Categories
Mastering Development

Pandas advanced column creation from another dataframe

I have a dataframe like below:

df_detail = 
    car_brand   car_type
0   Toyota  Sedan
1   Toyota  Truck
2   Honda   Truck
3   Mazda   Sedan
4   Mazda   Convertible

I want to create a summary dataframe like below:

df_summary=
ID car_brand count_Sedan count_Truck count_Convertible
0 Toyota     1           1           0
1 Honda      0           1           0
2 Mazda      1           0           1

Is there a way i can create the count_ columns using pandas?
I tried the following:

import pandas as pd
d = {'car_brand':['Toyota','Toyota','Honda','Mazda','Mazda'],'car_type':['Sedan','Truck','Truck','Sedan','Convertible']}
df_detail = pd.DataFrame(data=d)
df_summary = pd.DataFrame({'car_brand':[]})
df_summary['car_brand'] = df_detail['car_brand'].unique()
df_summary['count_Sedan']=df_detail[((df_detail['car_brand']==df_summary['car_brand']) & 
                                     (df_detail['car_type']=='Sedan'))].count()

I get the error:

ValueError: Can only compare identically-labeled Series objects

ANSWER:

df_detail.set_index('car_brand')['car_type'].str.get_dummies().sum(level=0).add_prefix('count_').reset_index()

Leave a Reply

Your email address will not be published. Required fields are marked *