Convert a PySpark Dataframe Column to a Python List depending on the value in another column

I have a dataframe "dfClean" with 2 columns:

+---+---+
|som|ano|
+---+---+
|  1|  1|
|  2|  0|
|  3|  1|
|  4|  1|
+---+---+

I need to create a Python list with those values in "som" that have 1 in the column "ano" on the same row.
So the expected output is:
pyLst = [1,3,4]

In Pandas I have used:
pyLst = dfClean.som[dfClean.ano == 1].tolist()

How can I do this in PySpark or in Scala, and which additional libraries do I need to import?
