How to create sequences out of a dataframe and put them in an array of arrays or a list?

For the input of: df = pd.DataFrame(np.array([[1, "A"], [2, "A"], [3, "B"], [4, "C"], [5, "D"], [6, "A"], [7, "B"], [8, "A"], [9, "C"], [10, "D"], [11, "A"], [12, "A"], [13, "B"], [14, "B"], [15, "D"], [16, "A"], [17, "B"], [18, "A"], [19, "C"], [20, "D"], [21, "A"], [22, "A"], [23, "A"], [24, "C"], [25, "D"], [26, "A"], [27, "C"], [28, "A"], [29, "C"], [30,…


String Matching a list of names

I’m working on my first big project trying to learn more about string matching. I am trying to match a list of names based on this article: https://bergvca.github.io/2017/10/14/super-fast-string-matching.html

    import pandas as pd
    import re
    from sklearn.feature_extraction.text import TfidfVectorizer

    names_short = pd.DataFrame(["gogle", "bing", "amazn", "facebook", "fcbook", "abbasasdfzz",
                                "zsdfzl", "gogle", "bing", "amazn", "facebook", "fcbook",
                                "abbasasdfzz", "zsdfzl", "google", "bing", "amazon", "facebook"],
                               columns=["name"])

    def ngrams(string, n=3):
        string = re.sub(r'[,-./]|\sBD', r'', string)
        ngrams = zip(*[string[i:] for…
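Since the excerpt cuts off partway through the `ngrams` helper, here is a minimal standard-library sketch of the general idea: split each name into character trigrams and pick the candidate with the most overlap. It is an illustration under assumptions, not the poster's code or the article's method. In particular, the Jaccard overlap is a simpler stand-in for the TF-IDF cosine similarity the article uses, and `jaccard`, `best_match`, and the `canonical` list are hypothetical names introduced only for this example.

```python
import re


def ngrams(string, n=3):
    # Remove the same punctuation pattern as the excerpt, then slide a
    # window of width n over the cleaned string.
    string = re.sub(r'[,-./]|\sBD', r'', string)
    return [string[i:i + n] for i in range(len(string) - n + 1)]


def jaccard(a, b, n=3):
    # Overlap of the two n-gram sets: |A & B| / |A | B|.
    # Assumption: used here instead of TF-IDF cosine similarity.
    set_a, set_b = set(ngrams(a, n)), set(ngrams(b, n))
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def best_match(name, candidates, n=3):
    # Return the candidate whose n-gram set overlaps most with `name`.
    return max(candidates, key=lambda c: jaccard(name, c, n))


# Hypothetical list of clean names to match against.
canonical = ["google", "bing", "amazon", "facebook"]

for name in ["gogle", "amazn", "fcbook"]:
    print(name, "->", best_match(name, canonical))
```

A pairwise comparison like this is fine for a handful of names, but it scales quadratically. The vectorized TF-IDF approach from the linked article is the better fit for large lists.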
