Categories
Mastering Development

scraping multiple URLs with bs4

I am trying to compile patent files from the USPTO webpage with BeautifulSoup. df['link'] urls = df['link'].to_numpy() urls for i in urls: page = requests.get(i) ## storing the content of the page in a variable txt = page.text ## creating BeautifulSoup object soup = bs4.BeautifulSoup(txt, 'html.parser') soup However, it only prints one of the URLs, not all […]
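In that loop, `soup` is reassigned on every pass, so once the loop finishes only the last page is left, and a notebook cell only echoes its final expression. A minimal sketch of one fix, assuming the goal is to keep every parsed page, is to collect the soups in a list. The helper `scrape_all` and its `fetch` parameter are hypothetical; by default `fetch` wraps `requests.get(...).text`, and it can be replaced with a fake for testing.

```python
import bs4


def scrape_all(urls, fetch=None):
    """Hypothetical helper: fetch each URL and return one BeautifulSoup object per page."""
    if fetch is None:
        import requests  # imported lazily so a fake fetch can be used without requests

        def fetch(url):
            response = requests.get(url, timeout=30)
            response.raise_for_status()
            return response.text

    soups = []
    for url in urls:
        # Append instead of overwriting a single `soup` variable, so every page is kept.
        soups.append(bs4.BeautifulSoup(fetch(url), "html.parser"))
    return soups
```

Usage would look like `soups = scrape_all(df['link'].to_numpy())`, followed by `for soup in soups: print(soup.title)` to see every page rather than only the last one.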

Categories
Mastering Development

How to Put Categorical Data in Bins

I have the following categorical data: ['Self employed', 'Government Dependent', 'Formally employed Private', 'Informally employed', 'Formally employed Government', 'Farming and Fishing', 'Remittance Dependent', 'Other Income', "Don't Know/Refuse to answer", 'No Income'] How do I put them in bins such that ['Government Dependent', 'Formally employed Government', 'Formally employed Private'] = 0, ['Remittance Dependent', 'Informally employed'] = 1, etc.? […]
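One common approach is to invert the grouping into a `{category: bin}` dictionary and apply it with `Series.map`. The sketch below only encodes the two groups given in the question; the remaining groups ("etc.") are not specified, so any unlisted category comes out as NaN. The names `BIN_GROUPS`, `CATEGORY_TO_BIN` and `to_bins` are illustrative.

```python
import pandas as pd

# Bin assignments taken from the question; the remaining groups are not given,
# so any category not listed here maps to NaN.
BIN_GROUPS = {
    0: ["Government Dependent", "Formally employed Government", "Formally employed Private"],
    1: ["Remittance Dependent", "Informally employed"],
}

# Invert {bin: [categories]} into {category: bin} for Series.map.
CATEGORY_TO_BIN = {cat: b for b, cats in BIN_GROUPS.items() for cat in cats}


def to_bins(income):
    """Map each income category in a Series to its bin number (NaN if unassigned)."""
    return income.map(CATEGORY_TO_BIN)
```

Once every category has a bin, `.astype(int)` turns the result into integer codes. With the data in a DataFrame, this would be `df['bin'] = to_bins(df['income_source'])`, where `income_source` is a hypothetical column name.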

Categories
Development

how to take date/time value from user input (calendar selection) in Python

I'm new to Python and I'm trying to collect my sensor data into a 'totalList'. This is what I did: # Initialize data frame df1 = pd.read_csv("/Users/ME/Desktop/Frontend/sensor_points.csv", dtype=object) df = pd.concat([df1], axis=0) df["Date/Time"] = pd.to_datetime(df["Date/Time"], format="%Y-%m-%d %H:%M") df.index = df["Date/Time"] df.drop("Date/Time", axis=1, inplace=True) totalList = [] for month in df.groupby(df.index.month): dailyList = [] […]
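The excerpt stops before the user-input part, so this is only a sketch under assumptions: the calendar selection (or plain `input()`) is assumed to arrive as a "YYYY-MM-DD" string, and `df` is assumed to carry the `DatetimeIndex` built above. The hypothetical helper `rows_for_day` parses that string with `pd.to_datetime` and compares it with the index truncated to midnight via `normalize()`.

```python
import pandas as pd


def rows_for_day(df, selected):
    """Return the rows of a DatetimeIndex-ed DataFrame that fall on `selected`.

    `selected` is assumed to be a "YYYY-MM-DD" string, e.g. from a date picker or input().
    """
    day = pd.to_datetime(selected, format="%Y-%m-%d")
    # normalize() sets every timestamp to midnight, so it can be compared with `day`.
    return df[df.index.normalize() == day]
```

From a console, `selected = input("Date (YYYY-MM-DD): ")` followed by `rows_for_day(df, selected)` would return that day's readings; a GUI date picker would simply supply the same string.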

Categories
Development

Python Typecasting in a list

I have a list of mixed data types (strings and objects): list = ['Buffet', 'Buffet', 'Buffet', 'Buffet', 'A la Carte', 'A la Carte', 'Buffet', 'Buffet', 'Buffet', 'A la Carte', 'A la Carte', array(['A la Carte', 'Buffet'], dtype=object), 'A la Carte', 'Buffet', 'Buffet', …] I want to replace this object-type array item with just another string, for […]
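A list comprehension with an `isinstance(item, np.ndarray)` check replaces only the array items and leaves the strings untouched. The replacement string is cut off in the question, so "A la Carte/Buffet" below is a placeholder, and `replace_arrays` is a hypothetical helper. The sketch also avoids naming the variable `list`, which shadows the built-in.

```python
import numpy as np


def replace_arrays(items, replacement):
    """Return a copy of `items` with every numpy array swapped for `replacement`."""
    return [replacement if isinstance(item, np.ndarray) else item for item in items]


meal_types = ["Buffet", "A la Carte", np.array(["A la Carte", "Buffet"], dtype=object), "Buffet"]
# Placeholder value: use whatever string the array items should become.
cleaned = replace_arrays(meal_types, "A la Carte/Buffet")
```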

Categories
Development

Extract the metric measurable in the product title

My objective is to extract the metric measurable in the product title. Example: I have the following products with their titles: Product title A: "Milk 12KG 1Box" Product title B: "Apple 10Plus 256GB" Product title C: "Samsung 4G 3S" After splitting each product title on whitespace, I have this: import numpy as np arr = [np.array(['Milk', […]
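The question does not say which units count as "metric measurables", so this sketch assumes a hypothetical whitelist (`UNITS`) and keeps only tokens that are a number immediately followed by one of those units. Under that assumption "4G" and "3S" are not matched; add units to the list as needed. It works directly on the title string, so the numpy split is not required.

```python
import re

# Hypothetical whitelist of units; the question does not define which ones count.
UNITS = ["KG", "GB", "TB", "MB", "ML"]

# A number (optionally decimal) followed directly by a whitelisted unit, e.g. "12KG".
MEASURE_RE = re.compile(r"^(\d+(?:\.\d+)?)(" + "|".join(UNITS) + r")$", re.IGNORECASE)


def extract_measures(title):
    """Return the whitespace-separated tokens of `title` that look like measurements."""
    return [token for token in title.split() if MEASURE_RE.match(token)]
```

For example, `extract_measures("Milk 12KG 1Box")` keeps only "12KG", because "1Box" has no whitelisted unit.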

Categories
Development

Return rows for customers only where values in a certain column are either x or y

I have a list of customer emails and the status of their account at different dates. df = pd.DataFrame({'email': pd.Series(['john@email.com', 'john@email.com', 'mary@email.com', 'mary@email.com', 'patrick@email.com', 'patrick@email.com', 'foo@email.com', 'foo@email.com'], dtype='object', index=pd.RangeIndex(start=0, stop=8, step=1)), 'date_created': pd.Series(['18/04/2018', '19/04/2018', '18/04/2018', '18/05/2018', '12/05/2019', '15/05/2019', '12/08/2019', '15/08/2019'], dtype='object', index=pd.RangeIndex(start=0, stop=8, step=1)), 'status': pd.Series(['Account Open ', 'Account Closed', 'Lead', 'Account Open ', 'Account Open ', 'Account Closed', […]
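One reading of the title is "keep every row of a customer whose statuses are all either x or y". A sketch of that reading marks each row with `isin`, then uses `groupby(...).transform("all")` to spread the per-customer result back onto the rows. The statuses are stripped first because several of them carry a trailing space ("Account Open "). `customers_with_only` is a hypothetical helper, and the frame below reuses only the first six rows shown in the question.

```python
import pandas as pd


def customers_with_only(df, allowed, key="email", col="status"):
    """Keep every row of the customers whose statuses all belong to `allowed`."""
    # Strip whitespace so "Account Open " matches "Account Open".
    in_allowed = df[col].str.strip().isin(allowed)
    # True on a row only if every row of that customer is in `allowed`.
    keep = in_allowed.groupby(df[key]).transform("all")
    return df[keep]


df = pd.DataFrame({
    "email": ["john@email.com", "john@email.com", "mary@email.com",
              "mary@email.com", "patrick@email.com", "patrick@email.com"],
    "status": ["Account Open ", "Account Closed", "Lead",
               "Account Open ", "Account Open ", "Account Closed"],
})
result = customers_with_only(df, {"Account Open", "Account Closed"})
```

If "either x or y" instead means that at least one of the customer's rows matches, swapping `"all"` for `"any"` gives that behaviour.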

Categories
Development

Applying gradient styling to pandas DataFrame in multiple subsets

I want to apply color gradients (green to yellow to red, based on the values) to multiple subsections of a pandas DataFrame. In each of those subsections the values will be between 0 and 1. So far, what I have is: def applyMetricGradient(df, idx_pairs, low=0, high=0): def background_gradient(s, m, M, cmap='RdYlGn', low=0, […]

Categories
Development

Filtering time-localized index for hours interval in dataframe

I have a .csv like the following: ,columnA 2019-01-01 00:00:00-05:00,10 2019-01-01 00:05:00-05:00,10 2019-01-01 00:10:00-05:00,11 … 2019-10-31 23:45:00-05:00,10 2019-10-31 23:50:00-05:00,10 2019-10-31 23:55:00-05:00,12 … pd.read_csv('myfile.csv', index_col=0, parse_dates=True) Now I am trying to keep only the rows whose index falls between 09:00:00-05:00 and 15:00:00-05:00. How can I do that, given that the index is time-localized? […]
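`DataFrame.between_time` works on a timezone-aware `DatetimeIndex` and compares wall-clock times in the index's own timezone, so "09:00" here means 09:00-05:00. This sketch uses a few toy rows in the question's format in place of `myfile.csv`; both endpoints are included by default.

```python
import io

import pandas as pd

# Toy rows in the same format as the question's file (all at a -05:00 offset).
csv_text = """,columnA
2019-01-01 08:55:00-05:00,10
2019-01-01 09:00:00-05:00,10
2019-01-01 12:30:00-05:00,11
2019-01-01 15:00:00-05:00,12
2019-01-01 15:05:00-05:00,13
"""

df = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)
# Keep rows from 09:00 to 15:00 local time, inclusive at both ends.
trading_hours = df.between_time("09:00", "15:00")
```

With the real file, `df = pd.read_csv('myfile.csv', index_col=0, parse_dates=True)` followed by the same `between_time` call should apply unchanged, as long as every row shares one UTC offset so the index parses as timezone-aware.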

Categories
Development

BERT Embeddings Layer raises `TypeError: unsupported operand type(s) for +: 'NoneType' and 'int'` with BiLSTM

I'm having problems integrating a BERT Embeddings Layer into a BiLSTM model for a word sense disambiguation task. Environment: Windows 10, Python 3.6.4, TensorFlow 1.12, Keras 2.2.4, PyCharm Professional 2019.2; no virtual environments were used. The whole script: import os import yaml import numpy as np from argparse import ArgumentParser import tensorflow as tf import tensorflow_hub as hub from […]

Categories
Development

how to extract strong tags from a column in a dataframe and append or replace that cell?

I have a DataFrame with columns containing bold text that I need to extract. There are 53,000 rows, and 27 columns contain <strong> (bold) text. array(['Candidate initial submission', 'The Candidate Status has now been updated from <strong>CV Submitted</strong> and <strong>Feedback Pending</strong> to <strong>Client CV Review</strong> and <strong>Feedback Awaiting</strong>Candidate initial submission', 'The Candidate Status […]
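For flat, non-nested `<strong>…</strong>` pairs like the ones shown, `Series.str.findall` with a non-greedy regex pulls out every bold fragment, and `.str.join` turns the list back into one string per cell. Cells without bold text become an empty string. `extract_strong` and the ", " separator are illustrative choices; for nested tags or tags with attributes, an HTML parser such as BeautifulSoup would be safer than a regex.

```python
import pandas as pd

# Non-greedy match: captures the text inside each <strong>...</strong> pair separately.
STRONG_RE = r"<strong>(.*?)</strong>"


def extract_strong(series, sep=", "):
    """Return the text inside every <strong> pair in each cell, joined by `sep`."""
    return series.str.findall(STRONG_RE).str.join(sep)
```

To replace the cells across the 27 columns, `df[cols] = df[cols].apply(extract_strong)` would work, where `cols` is a hypothetical list of those column names; to append instead, assign the result to new columns.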