Categories
Mastering Development

Fast way to remove array of specific row values from 2D numpy array

I have a 2D array like this: a = np.array([[25, 83, 18, 71], [75, 7, 0, 85], [25, 83, 18, 71], [25, 83, 18, 71], [75, 48, 8, 43], [ 7, 47, 96, 94], [ 7, 47, 96, 94], [56, 75, 50, 0], [19, 49, 92, 57], [52, 93, 58, 9]]) and I want to […]
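The excerpt is cut off before it states the goal, so the following is only a minimal sketch based on the post title: it assumes the aim is to drop every row of `a` that equals one of a given set of rows. The name `rows_to_remove` and its contents are hypothetical.

```python
import numpy as np

a = np.array([[25, 83, 18, 71], [75, 7, 0, 85], [25, 83, 18, 71],
              [25, 83, 18, 71], [75, 48, 8, 43], [7, 47, 96, 94],
              [7, 47, 96, 94], [56, 75, 50, 0], [19, 49, 92, 57],
              [52, 93, 58, 9]])

# Hypothetical rows to drop; the excerpt does not say which rows are meant.
rows_to_remove = np.array([[25, 83, 18, 71], [7, 47, 96, 94]])

# Broadcast (10, 1, 4) against (2, 4) to compare every row of `a`
# with every row of `rows_to_remove`, then reduce to one flag per row of `a`.
matches = (a[:, None, :] == rows_to_remove).all(axis=2).any(axis=1)

# Keep only the rows that matched none of the rows to remove.
result = a[~matches]
```

The broadcast comparison allocates a boolean array of shape (rows in `a`, rows to remove, columns), so it may need to be chunked for very large inputs.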

Categories
Mastering Development

Sort index of pivot dataframe in specific user defined order

This is my df:

ID  CATEG  LEVEL  COLS   VALUE  COMMENTS
1   A      PG     Apple  428    comment1
1   A      CD     Apple  175    comment1
1   C      PG     Apple  226    comment1
1   C      AB     Apple  884    comment1
1   C      CD     Apple  288    comment1
1   B      PG     Apple  712    comment1
1   B      AB     Apple  849    comment1
2   B      […]
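The excerpt ends before it shows the pivot or the desired order, so this is only a sketch of one common approach: build the pivot, then call `reindex` with an explicit list. It uses the complete rows visible above; pivoting on `LEVEL` and the order `PG, AB, CD` are both assumptions chosen for illustration.

```python
import pandas as pd

# The complete rows visible in the excerpt (ID 1 only).
df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 1, 1, 1],
    "CATEG": ["A", "A", "C", "C", "C", "B", "B"],
    "LEVEL": ["PG", "CD", "PG", "AB", "CD", "PG", "AB"],
    "COLS": ["Apple"] * 7,
    "VALUE": [428, 175, 226, 884, 288, 712, 849],
    "COMMENTS": ["comment1"] * 7,
})

# Hypothetical user-defined order for the pivot index.
level_order = ["PG", "AB", "CD"]

# pivot_table sorts the index alphabetically (AB, CD, PG) by default;
# reindex then rearranges the rows into the requested order.
pivot = df.pivot_table(index="LEVEL", columns="CATEG", values="VALUE")
pivot = pivot.reindex(level_order)
```

Labels listed in `level_order` but absent from the data would appear as all-NaN rows, which may or may not be wanted.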

Categories
Mastering Development

Overlap two shapefiles with geopandas in Python

I have these two shapefiles: the first with provinces (moz_admin: https://drive.google.com/file/d/1MYNkuKFXP0dt76G9OgeBotjF5UmiRqEl/view?usp=sharing) and the second with districts (moz_admin_district: https://drive.google.com/file/d/1idf5VKgN8PZgdoAcBVEcY9pDa1WYpg7-/view?usp=sharing). I need to join/merge these two shapefiles to return a map like the one below: [Image: Mozambique districts map] What I have done so far: import os import geopandas as gpd file = […]

Categories
Mastering Development

Iterate over rows and compare with another dataframe

I have two dataframes, df5 and df2. They look like the following: df2 df5 How can I get Python to check every row in df5 to see whether its combination of Tågnr and Datum exists in df2 and, if not, drop that row from df5? for i, row in […]
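A row-by-row loop is usually not needed for this kind of filter. As a hedged sketch, an inner merge on the two key columns keeps only the df5 rows whose (Tågnr, Datum) pair also appears in df2. The sample values and the `delay` column are invented for illustration; only the column names Tågnr and Datum come from the question.

```python
import pandas as pd

# Hypothetical sample data standing in for the frames in the question.
df2 = pd.DataFrame({
    "Tågnr": [101, 102],
    "Datum": ["2021-01-01", "2021-01-02"],
})
df5 = pd.DataFrame({
    "Tågnr": [101, 102, 103],
    "Datum": ["2021-01-01", "2021-01-05", "2021-01-01"],
    "delay": [3, 0, 7],
})

# De-duplicate the keys so the merge cannot multiply rows of df5.
keys = df2[["Tågnr", "Datum"]].drop_duplicates()

# The inner merge keeps only df5 rows whose key pair exists in df2.
filtered = df5.merge(keys, on=["Tågnr", "Datum"], how="inner")
```

Only the first df5 row survives here: train 102 exists in df2 but on a different date, and train 103 does not exist in df2 at all.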

Categories
Mastering Development

Pandas merge two time series dataframes based on time window (cut/bin/merge)

I have a df with 750k rows and 15 columns, indexed by a pd.Timestamp called ts. I process data with millisecond resolution in near real time. Now I would like to add statistics from df_stats, which were derived from a higher time resolution, to the big df as new columns. The df_stats has a time resolution of […]
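One possible approach, sketched under assumptions, is `pandas.merge_asof`, which attaches to each row of the left frame the most recent row of the right frame. The resolution of df_stats is cut off in the excerpt, so this example assumes one statistics row per second, stamped at the start of its window. The column names `value` and `mean_1s` are hypothetical.

```python
import pandas as pd

# Hypothetical millisecond-resolution data indexed by "ts".
ts_fast = pd.to_datetime([
    "2021-01-01 00:00:00.100", "2021-01-01 00:00:00.900",
    "2021-01-01 00:00:01.200", "2021-01-01 00:00:02.500",
]).astype("datetime64[ns]").rename("ts")
df = pd.DataFrame({"value": [1.0, 2.0, 3.0, 4.0]}, index=ts_fast)

# Hypothetical statistics, one row per second (window start).
ts_slow = pd.to_datetime([
    "2021-01-01 00:00:00", "2021-01-01 00:00:01", "2021-01-01 00:00:02",
]).astype("datetime64[ns]").rename("ts")
df_stats = pd.DataFrame({"mean_1s": [10.0, 20.0, 30.0]}, index=ts_slow)

# Both indexes must be sorted. direction="backward" picks, for each row of df,
# the last df_stats row whose timestamp is at or before that row's timestamp.
merged = pd.merge_asof(
    df, df_stats, left_index=True, right_index=True, direction="backward"
)
```

If the statistics were stamped at the end of each window instead, `direction="forward"` would be the matching choice.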

Categories
Development

Modify dataframe in function not propagated

I have a function to add extra columns to my dataframe, called as mpc(sym_time_data, 5, 30, 65). In the function, I loop through the arguments to create new columns, however, only the columns from the first run through are kept in the final dataframe – when I print out the columns at each stage, they […]

Categories
Development

Apache Spark: impact of repartitioning, sorting and caching on a join

I am exploring Spark’s behavior when joining a table to itself. I am using Databricks. My dummy scenario is:

1. Read an external table as dataframe A (underlying files are in delta format)
2. Define dataframe B as dataframe A with only certain columns selected
3. Join dataframes A and B on column1 and column2

(Yes, it doesn’t […]

Categories
Development

matplotlib.lineCollection from pandas dataframe. Slow performance of current iterrows solution

I have a large dataframe which contains coordinates with a value. I want to plot this in matplotlib with a different color for each value. I have a working solution now that plots this as a lineCollection. I am using iterrows as that is easy to understand for me, but it is very slow. I […]

Categories
Development

How to loop through csv files and add columns? [closed]

The background of this problem is that I have 30 csv files, Jan 1 – Jan 30. I wrote a function that cleans one dataframe at a time somewhat automatically. Now I need to write another function that processes all the dataframes together and adds a column, ‘city’, to all 30 of […]

Categories
CSV Development

how to save each dataframe with a new name or save a dataframe as a csv in a for loop

    for key in df:
        batch1 = key.merge(df_quiz1, how='left', left_on='studentName', right_on='Firstname')
        batch1 = batch1.merge(df_quiz2, how='left', left_on='studentName', right_on='Firstname')
        batch1.replace(to_replace="-", value='0')
        batch1.fillna(0, inplace=True)
        batch1.Quiz1_grade = batch1.Quiz1_grade.astype('int64')
        batch1.Quiz2_grade = batch1.Quiz2_grade.astype('int64')
        batch1['Quiz1_grade'] = batch1.Quiz1_grade * 10
        batch1['Quiz2_grade'] = batch1.Quiz2_grade * 10
        between50and60 = 0
        lessthan50 = 0
        between60and70 = 0
        greaterthan80 = 0
        between70and80 = 0
        for i in batch1.Quiz1_grade:
            if (i > 0) and (i < 50):
                lessthan50 += 1
            elif (i >= 50) and (i <= 60):
                between50and60 += 1
            elif (i > 60) and (i <= 70):
                between60and70 += 1
            elif (i > 70) and (i <= 80):
                between70and80 += 1
            elif i > 80:
                greaterthan80 += 1
            else:
                continue
        _between50and60 = 0
        _lessthan50 = 0
        _between60and70 = 0
        _greaterthan80 = 0
        _between70and80 = 0
        for i […]