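Sample data to illustrate:

```python
import pandas as pd

animals = pd.DataFrame({'name': ['ostrich', 'parrot', 'platypus'],
                        'legs': [2, 2, 4],
                        'flight': [False, True, False],
                        'beak': [True, True, True],
                        'feathers': [True, True, False]})
```

| name     | legs | flight | beak | feathers |
|----------|------|--------|------|----------|
| ostrich  | 2    |        | ✔    | ✔        |
| parrot   | 2    | ✔      | ✔    | ✔        |
| platypus | 4    |        | ✔    |          |

What already works

Pandas makes it easy to compare an entire column (which is a series) against a value, and the result (a series of booleans) can be used to filter the dataframe with boolean indexing:

```python
bipeds = (animals.legs == 2)
print(animals[bipeds])
```

In my use case, each such condition is being parsed from a term in a text search string, so I need to construct them programmatically. (I'm aware of Pandas' `query`, but I need different functionality.) Writing a function to do this is pretty straightforward:

```python
def comp_search(df, column_name, comp, value):
    return getattr(df[column_name], f'__{comp}__')(value)

bipeds = comp_search(animals, 'legs', 'eq', 2)
```

Checking any given boolean column is as simple as, for instance, `animals[animals.feathers]`.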
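As a side note on building single-column conditions from search terms: pandas Series also expose named comparison methods (`eq`, `ne`, `lt`, `le`, `gt`, `ge`), so a `comp_search`-style helper can look those up directly instead of building a dunder name. A minimal sketch, assuming the comparison arrives as one of those six strings; `comp_search_named` and `ALLOWED_COMPS` are hypothetical names, and the whitelist is an added precaution because `comp` comes from user-typed search text:

```python
import pandas as pd

animals = pd.DataFrame({'name': ['ostrich', 'parrot', 'platypus'],
                        'legs': [2, 2, 4],
                        'flight': [False, True, False],
                        'beak': [True, True, True],
                        'feathers': [True, True, False]})

# Hypothetical whitelist: only pandas' six named comparison methods may be
# looked up, since `comp` is parsed from user-supplied search text.
ALLOWED_COMPS = {'eq', 'ne', 'lt', 'le', 'gt', 'ge'}

def comp_search_named(df, column_name, comp, value):
    if comp not in ALLOWED_COMPS:
        raise ValueError(f'unsupported comparison: {comp!r}')
    # Series.eq(), Series.gt(), etc. each return a boolean Series,
    # just like the corresponding ==, > operators.
    return getattr(df[column_name], comp)(value)

bipeds = comp_search_named(animals, 'legs', 'eq', 2)
print(animals[bipeds])
```

Without the whitelist, a search term such as `drop` would resolve to an unrelated Series method, which is why the check comes before the `getattr` call.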
What I'd like to do

I'd like to be able to perform set comparisons against the collection of boolean columns: finding, for instance, all animals that have at least a certain set of features, or less than a set, etc. Extrapolating from earlier, I can picture such a condition looking like this:

```python
set(df[features]) <= set(values)
```

And such a condition could hypothetically be built like so:

```python
def set_comp_search(df, column_names, comp, values):
    return getattr(set(df[column_names]), f'__{comp}__')(set(values))
```

Of course neither of these works, as `set()` of a dataframe creates an ordinary set of its column names, and `set()` of a series does the same for its values. (And you can't put unhashable items like Pandas series/dataframes into a set.)
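A quick check of what `set()` actually does with pandas objects shows why the hypothetical `set(df[features])` can't express a per-row comparison. A small sketch on the same sample data; the `features` list (the three boolean columns) and the variable names are illustrative:

```python
import pandas as pd

animals = pd.DataFrame({'name': ['ostrich', 'parrot', 'platypus'],
                        'legs': [2, 2, 4],
                        'flight': [False, True, False],
                        'beak': [True, True, True],
                        'feathers': [True, True, False]})
features = ['flight', 'beak', 'feathers']

# Iterating a DataFrame yields its column labels, so set() collects those.
column_set = set(animals[features])
# Iterating a Series yields its values, so set() collects the distinct values.
value_set = set(animals['flight'])

# A row is a Series, which is mutable and therefore unhashable,
# so it cannot be stored in a set.
try:
    {animals.iloc[0]}
    row_is_hashable = True
except TypeError:
    row_is_hashable = False

print(column_set, value_set, row_is_hashable)
```

Set ordering in the printed output may vary between runs.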
What technically works

I can convert each set comparison into an equivalent boolean expression. So if I want a condition to represent `set(animals[features]) > {'beak'}`, I can hardcode:

```python
more_than_beak = (animals.beak) & (animals.flight | animals.feathers)
```

Hardcoding a separate expression for every comparison gets really hard to read, but the conversion can be done programmatically:

```python
from functools import reduce
from operator import and_, or_

def all_(sequence):
    return reduce(and_, sequence)

def any_(sequence):
    return reduce(or_, sequence)

def set_comp_search(df, column_names, comp, values):
    column_names = set(column_names)
    values = set(values)
    other_column_names = column_names - values
    value_columns = [df[name] for name in values]
    other_columns = [df[name] for name in other_column_names]

    if comp == 'gt':
        # All the searched features, and at least one other
        return all_(value_columns) & any_(other_columns)
    elif comp == 'ge':
        # All the searched features
        return all_(value_columns)
    elif comp == 'eq':
        # All the searched features, and none other
        return all_(value_columns) & ~any_(other_columns)
    elif comp == 'le':
        # No other features
        return ~any_(other_columns)
    elif comp == 'lt':
        # Not all of the searched features, and none other
        return ~all_(value_columns) & ~any_(other_columns)

features = {'flight', 'beak', 'feathers'}
more_than_beak = set_comp_search(animals, features, 'gt', {'beak'})
# Converts to: (animals.beak) & (animals.flight | animals.feathers)
print(animals[more_than_beak])
```

```
      name  legs  flight  beak  feathers
0  ostrich     2   False  True      True
1   parrot     2    True  True      True
```
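The same case analysis can be expressed with NumPy row reductions instead of `functools.reduce`: turn the feature columns into one boolean matrix, then `all(axis=1)` over the searched columns and `any(axis=1)` over the rest give two arrays from which every comparison follows. A minimal sketch, assuming every name in `values` is also one of `column_names`; `set_comp_vectorized` is a hypothetical name, and `ne` is added for completeness:

```python
import numpy as np
import pandas as pd

animals = pd.DataFrame({'name': ['ostrich', 'parrot', 'platypus'],
                        'legs': [2, 2, 4],
                        'flight': [False, True, False],
                        'beak': [True, True, True],
                        'feathers': [True, True, False]})

def set_comp_vectorized(df, column_names, comp, values):
    # Assumption: `values` is a subset of `column_names`.
    column_names = list(column_names)
    values = set(values)
    matrix = df[column_names].to_numpy(dtype=bool)           # rows x features
    is_value = np.array([name in values for name in column_names], dtype=bool)
    # all()/any() over zero selected columns give True/False respectively,
    # so empty selections need no special case (unlike reduce()).
    has_all_values = matrix[:, is_value].all(axis=1)          # row contains values
    has_other = matrix[:, ~is_value].any(axis=1)              # row has an extra feature
    results = {
        'gt': has_all_values & has_other,
        'ge': has_all_values,
        'eq': has_all_values & ~has_other,
        'ne': ~(has_all_values & ~has_other),
        'le': ~has_other,
        'lt': ~has_all_values & ~has_other,
    }
    return pd.Series(results[comp], index=df.index)

features = ['flight', 'beak', 'feathers']
print(animals[set_comp_vectorized(animals, features, 'gt', {'beak'})])
```

The logic per comparison is the same as the `reduce`-based function; the difference is that the reductions run inside NumPy over the whole matrix rather than chaining one `&`/`|` per column.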
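Update

It occurred to me that I could use `apply` to convert each row to a set of the labels of its `True` columns, and compare directly against the resulting series of sets:

```python
def row_to_set(row):  # Could be a lambda
    return set(label for label, value in zip(row.index, row) if value)

def set_comp_search_2(df, column_names, comp, values):
    df_as_sets = df[column_names].apply(row_to_set, axis=1)
    return getattr(df_as_sets, f'__{comp}__')(set(values))
```

I like how clear, concise, and direct this is, and it works. Unfortunately, `apply`'s method of iteration is far, far slower than vectorized methods, so my question is: does Pandas provide a way to achieve set-like comparisons against collections of boolean columns? Is there a vectorized way to perform set conversion directly, something roughly equivalent to the `Series.str` methods? (There's no `DataFrame.set` module...) I feel like this should be possible, if only I knew the syntax or method Pandas uses for it. Or is the best option something like my above case-by-case boolean-conversion function?

(I've also looked at another question that sounds similar, but I don't think any of its solutions are applicable to the intended set-like behavior.)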
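One vectorized alternative to row-wise `apply` is to encode each row's feature set as an integer bitmask: set containment then reduces to bitwise arithmetic over whole arrays. This is a swapped-in technique, not a pandas feature. A minimal sketch under these assumptions: at most 63 feature columns (so masks fit in int64), and every name in `values` is one of `column_names`; `to_bitmask` and `set_comp_bitmask` are hypothetical helper names:

```python
import numpy as np
import pandas as pd

animals = pd.DataFrame({'name': ['ostrich', 'parrot', 'platypus'],
                        'legs': [2, 2, 4],
                        'flight': [False, True, False],
                        'beak': [True, True, True],
                        'feathers': [True, True, False]})

def to_bitmask(df, column_names):
    # Bit i of a row's mask is set when column_names[i] is True in that row.
    if len(column_names) > 63:
        raise ValueError('int64 masks support at most 63 columns')
    bools = df[list(column_names)].to_numpy(dtype=bool)
    weights = 2 ** np.arange(len(column_names), dtype=np.int64)
    return bools.astype(np.int64) @ weights

def set_comp_bitmask(df, column_names, comp, values):
    column_names = list(column_names)
    row_masks = to_bitmask(df, column_names)
    # list.index() raises ValueError if a searched value is not a column.
    target = sum(1 << column_names.index(name) for name in set(values))
    common = row_masks & target
    is_subset = common == row_masks      # row <= values
    is_superset = common == target       # row >= values
    differs = row_masks != target
    results = {
        'eq': ~differs,
        'ne': differs,
        'le': is_subset,
        'ge': is_superset,
        'lt': is_subset & differs,
        'gt': is_superset & differs,
    }
    return pd.Series(results[comp], index=df.index)

features = ['flight', 'beak', 'feathers']
print(animals[set_comp_bitmask(animals, features, 'le', {'beak', 'feathers'})])
```

The masks cost one matrix product up front; after that, every comparison is a few integer operations over the whole column with no per-row Python calls, which is where `apply` loses its time.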