Categories
Artificial Intelligence (AI) Mastering Development

Help improving my “read_excel” in python. My code reads slowly

My first question here so please bare with me.

I’m trying to feed my neural network with training data read in from an excel file. It works perfectly fine when i have less than 50 rows in the sheet. But when i try with the real excel file containing almost 4.000 rows it suddenly takes forever. Although 4.000 is a lot i’m pretty sure my way of doing it is still very inefficient.

as you can see in the code below i’m using the read_excel over and over again in the loop.
I feel like there should be a way to only read the whole column 1 time and then work with it from there.

My goal is to read in 5 rows as the 1st input starting from row 0. then reading 5 rows in again but starting from row 1 and 5 rows again starting from row 3
So it’s like a window of 5 rows that is read and then moving the window by 1.
The output should allways be the 1 row after the window.

**Example:** if row 1-20 contains numbers 1-20 then:
input1 = [1,2,3,4,5] and output1 = 6
input2 = [2,3,4,5,6] and output2 = 7
...
input15 = [15,16,17,18,19] and output15 = 20

notice how inputs are lists and outputs are just numbers. So when i append those to the final input & output lists i end up with inputs being a list of lists and out being list of outputs

My code

from pandas import read_excel

# initialize final input & output lists. The contents of the temporary input & output lists
# are gonna be appended to these final lists
training_input = []
training_output = []

# excel relevant info
my_sheet = 'Junaid'
file_name = '../Documents/Junaid1.xlsx'

# initialize counters
loop_count = 0
row_counter = 0

for x in range(25):

    # load the excel file containing inputs & outputs
    # using parameters skiprows, nrows (number of rows) and index col
    df = read_excel(file_name, sheet_name = my_sheet, skiprows=row_counter, nrows=6, index_col=0)

    # initialize temporary input & output lists
    input_temp = []
    output_temp = []

    for y in df.index:
        # append the first 5 rows of the 6 to input list
        if loop_count < 5:
            input_temp.append(df.index[loop_count])
            loop_count += 1
        else:
            # append the 6th data to output list
            training_output.append(df.index[loop_count])

    training_input.append(input_temp)
    row_counter += 1
    loop_count = 0

Leave a Reply

Your email address will not be published. Required fields are marked *