Python Pandas in Workflow - Parsing Excel in Retool Storage

Goal: Parse an excel file uploaded to Retool storage eliminating some unnecessary columns / rows, transforming a date field, and returning the data to a table in my app.

Steps: I have a multi-page app that includes the ability for a user to upload an excel report from a customer for processing into a Postgres database (hosted in AWS). The button that captures the file has an event handler for uploaded to storage that calls the workflow I've built to handle the file parsing.

In the workflow, I have an inbound webhook setup, and the file is correctly identified post upload to Retool storage. The workflow is failing at the step it's attempting to read the uploaded file with the error "AttributeError: object has no attribute 'file'" even though the input clearly shows the file with the relevant data.

Details:
Here's the error:
image

Here is the code being executed in Python by the workflow step parseReport:

# Retool Workflow to process an uploaded Royaly Report with Python Pandas and return the data back to a table in the front end. 
import pandas as pd

# Read the Excel file, skipping unnecessary rows and setting the 18th row as the header. startTrigger is the name of the inbound webhook in Retool Workflow that is called to execute this code.
df=pd.read_excel( {{ startTrigger.file[0].fileId }},sheet_name='ReportForm',skiprows=17,header=0)

df.drop(columns=['A'], inplace=True)

# Convert the "Reporting Period (Month/Yr)" column to datetime, assuming it's in 'MM/YY' format
df['Reporting Period (Month/Yr)'] = pd.to_datetime(df['Reporting Period (Month/Yr)'], format='%m/%y')

# Set the day to the last day of the month
df['Reporting Period (Month/Yr)'] = df['Reporting Period (Month/Yr)'] + pd.offsets.MonthEnd(0)

# Convert the datetime back to the desired format 'MM/DD/YYYY'
df['Reporting Period (Month/Yr)'] = df['Reporting Period (Month/Yr)'].dt.strftime('%m/%d/%Y')

Screenshots:
Workflow Libraries:
image

parseReport input showing file information:
image

Retool Storage showing file:
image

I have tried using the file name as well as the fileID to no effect. I am using a workflow to do this processing, rather than "in app" because I would like to extend this functionality in the future with more automation around the inbound file processing.

I have a time-sensitive project tied to this function - I'd really appreciate some quick help / insight here. Thanks.

I think you need to reference {{startTrigger.data.file[0].fileId}}

The error you are seeing is basically saying "startTrigger" has no "file" property.

Thank you for the feedback; I think I already tried that but I will give it a shot and see if that fixes the issue.

Well now I'm getting a different error, so.. progress! :stuck_out_tongue_winking_eye:

Data tab of the workflow run log for the parseReport step:
image

@Adam_Cogswell it appears that the data in your input isn't quite the excel file just yet -- if line 5 is:

df=pd.read_excel( {{ startTrigger.file[0].fileId }},sheet_name='ReportForm',skiprows=17,header=0)

I am assuming that you need the actual file and not its ID (fileId). What is the form of the file data coming out of Retool storage? It is possible you'd need to open/read/parse the base64 data if that is what is coming through your webhook.

Hmmm, ok. I guess I thought passing the file through to Retool storage first would mean I could reference the actual file. Maybe that's what I'm missing here. To answer your question, the data comes back from the startTrigger in base64, yes, but also has the file name and file ID from retool storage (which is likely what's confusing me). Here's a screenshot of the Inputs tab for the parseReport step where the pandas code is being executed:

Hey @Adam_Cogswell -- I think you can just import base64 and then use

df=pd.read_excel(base64.b64decode({{startTrigger.data.file[0].base64Data}}), sheet_name='ReportForm', skiprows=17, header=0)

...which should satisfy the read_excel input requirements. Not certain about that last bit as-is, but the pandas docs have the relevant structures to use just in case it needs to be modified further before being usable.

1 Like

Unfortunately, that does not appear to work either. :frowning:

I did import the base64 library and tried making the "df" line as simple as possible, too, removing all parameters and just passing in the base64 data as you suggested. It still comes back with the "unhashable type: 'set' " error on line 6. Maybe also of note - this might just be a linting bug - it does not seem to like the file parameter for the data selector on startTrigger:

Really appreciate all your help so far @pyrrho. Maybe someone from the @retool_team can shed some light?

I spent some time this morning doing some more troubleshooting and attempting to make this work. Here's what I've tried:

  • Added a workflow step to get the file contents from retool storage (hopefully eliminating any issues with the upload or bad / missing file paths)
  • Added a slight delay to the invocation of the workflow call on the file upload button to be sure the file gets to it's destination (Retool storage) before the workflow runs
  • Changed tactics slightly to try decoding the raw base64Data from the file rather than use the pandas read_excel function

Here's the interim Retool Storage step:
image

Here is the updated python code that attempts to just decode the base64 data from the file in Retool Storage:

# Retool Workflow to process an uploaded Royaly Report with Python Pandas and return the data back to a table in the front end. 
import pandas as pd
import base64

# Read the base64 data from the file uploaded to Retool Storage into a pandas dataframe.
df=pd.DataFrame(base64.b64decode({{ storeData.data.base64Data }}))

It continues to fail at the point it's trying to decode the base64Data with the following error:
TypeError: unhashable type: 'set' (line 6)

Any chance I can get some help here? @Community-Support-Engineers

Is this something where you need to convert the base64 to a ByteArray for the DataFrame?

import io
import pandas as pd
import base64

b64decoded = base64.b64decode({{ storeData.data.base64Data }})
df=pd.DataFrame(io.StringIO(b64decoded.decode('utf-8')))
1 Like

I will give this a shot, but i was using Pandas because I thought it was supposed to handle the dataframe natively from the excel file. :person_shrugging: At this point, it can't hurt! Appreciate the feedback.

1 Like

Same error, "unhashable type: 'set'". I'm going to try a completely new file in a different format (csv) and see if I get better or different results. I've been testing with the same, production, file - maybe it's something in the sheet itself. Really appreciate you being willing to help @pyrrho. :slight_smile:

1 Like

Tried with a new, simple CSV and using the pandas read_csv function without any additional flags:

df=pd.read_csv({{ storeData.data.name }})

I got the same error:

TypeError: unhashable type: 'set'

Very strange.

My suggestion: Load this CSV into table data in your app and then send the table data to your workflow or stage it in a DB that you can call from the workflow.

I use a Retool DB and a SQL DB to read directly into a DataFrame after pulling the table data in a python workflow node like this:

image

If your startTrigger data is already coming parsed from a table then you shouldn't need to worry about the file type or encoding because it will already be a structured array for the DataFrame

1 Like

I don't know how that last reply got marked as a solution, but it's not. It does not address the errors I'm receiving or answer the questions I originally posed. That's a work around, at best, and it doesn't even apply here - I am not using a separate database to further manipulate a data set that should just work given the tools at hand. If this is a bug, then someone just needs to let me (and the community) know so we can find other solutions.

@Community-Support-Engineers - any insight here? Has anyone attempted to replicate this issue?