Workflow seems to time out when processing a larger amount of data

I have what seems like a pretty straightforward workflow with a moderate amount of data, but it eventually hangs the browser and never completes.

My workflow grabs about 20k rows from a Google Sheet, loops over that data to rename some of the field names, and then runs another loop to upsert the data into the default Postgres managed_db.

When I run the steps individually (via the run icon/button above each step), grabbing the data only takes a few seconds, but the second loop to rename fields gets stuck on the loading spinner, and eventually Chrome tells me the page appears to have hung and asks whether I want to wait or exit.

The loop in this step just calls a function:

// Map spreadsheet column headers to the field names used in Postgres
const columnNameMappings = {
  'Bundle ID': 'bundleId',
  'Campaign ID': 'campaignId',
  'Campaign Name': 'campaignName',
  // ... more column names
};

// Pull the Date field out so it can be reformatted separately
const { Date, ...rest } = row;

return {
  Date: moment(Date).format('YYYY-MM-DD'),
  // Rename every other field via the mapping, falling back to the original name
  ...Object.keys(rest).reduce((acc, field) => ({
    ...acc,
    [columnNameMappings[field] || field]: row[field]
  }), {})
};
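
For what it's worth, the same mapping could probably also be written without spreading the accumulator on every field (keeping the columnNameMappings and the destructuring above), though I haven't verified whether that changes anything at this scale:

const renamed = {};
for (const field of Object.keys(rest)) {
  // Same fallback behavior: use the mapped name if one exists, otherwise keep the original
  renamed[columnNameMappings[field] || field] = rest[field];
}

return {
  Date: moment(Date).format('YYYY-MM-DD'),
  ...renamed
};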

I ran this on a smaller table with a few rows and it worked fine. Is 20k rows just too much data to run in the browser? Each row has 12 fields, and when I download the sheet as a CSV, it's 2.8 MB.

Presumably when I set this up as an active workflow, it'll run on the Retool backend and might be more successful.

Hi @joostschuur, appreciate you sharing this here and I'll do my best to help out. Based on what you've shared, it seems the issue is your frontend memory filling up and crashing the page. The workaround I'd suggest is to split this into two workflows: an orchestrator that takes in the records from Google Sheets, does the loop, and simply fires off the second workflow, which would do the renaming and the upsert. Hope this helps, and isn't too cumbersome of a workaround. :sweat_smile: Please let me know if there's anything else I can help with!
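
For reference, here's a rough sketch of what the orchestrator's loop step could look like, assuming the second workflow is exposed via a webhook trigger. The URL, batch size, and block names below are placeholders rather than anything from your actual setup:

// Hypothetical sketch: forward the sheet rows to the second workflow in batches
// so each invocation only handles a manageable chunk. All names are placeholders.
const WEBHOOK_URL = 'https://<your-second-workflow-webhook-url>';
const BATCH_SIZE = 1000;

const rows = getSheetData.data; // assumed name of the Google Sheets query block

for (let i = 0; i < rows.length; i += BATCH_SIZE) {
  const batch = rows.slice(i, i + BATCH_SIZE);
  await fetch(WEBHOOK_URL, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ rows: batch }),
  });
}

return 'done';

The second workflow would then read its rows off the trigger payload and handle the renaming and the upsert there, so none of the heavy lifting happens in your browser.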


Thanks for that tip, Alina. From my point of view, this seemed like a pretty quick blocker to hit with Workflows, and the amount of data here didn't seem like that much. Then again, I've also run into problems parsing larger data sets in Node and had to take a streaming approach to processing them there.

Once a workflow is deployed and runs on a schedule, does the backend system that runs it have more resources available than when it runs during development in my browser?

At this point, I'm probably going to handle this part of the data pipeline with a serverless job of my own, just to have some more fine-tuned control over the process. I can certainly see Workflows being useful for other situations, though.

Yes, it does!