Error 139 and memory limits

Hi, I'm running into "Runtime error: exited workflow with code 139" with my workflow:

  1. Query database (~50k rows)
  2. Run a transform function on each row (JS code in a Loop block)
  3. Write transformed results to another table (GUI block using 'Bulk insert records').

I see a few other topics about this with no conclusion, but afaict this is related to memory usage in the JS code block. I thought using the built-in loop block (rather than a direct .map() on the query results) would help here, since I assumed it doesn't materialize the entire result set in memory at once, but it doesn't seem to change anything.

I found How to use the loop function to solve my data issue - #3 by Sander, which suggests chunking; I implemented chunking slightly differently and still ran into the same issues (my approach paginates the initial SQL query rather than using _.chunk directly on the result set).
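Roughly, the shape of what I tried is below (a simplified sketch with placeholder names like runSql and bulkInsert, not my actual blocks):

```js
// Simplified sketch of the paginated approach (placeholder helpers, not my actual blocks).
// Each page is fetched with LIMIT/OFFSET instead of materializing all 50k rows up front.
const PAGE_SIZE = 1000;

let offset = 0;
while (true) {
  // runSql is a placeholder for the SQL query block.
  const rows = await runSql(
    `SELECT col_a, col_b, col_c, col_d
       FROM source_table
      ORDER BY id
      LIMIT ${PAGE_SIZE} OFFSET ${offset}`
  );
  if (rows.length === 0) break;

  // transformRow is my per-row transform; bulkInsert stands in for 'Bulk insert records'.
  const transformed = await Promise.all(rows.map(transformRow));
  await bulkInsert(transformed);

  offset += PAGE_SIZE;
}
```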

Am I using the loop block incorrectly, or is there a fundamental limitation in Retool here?

Thanks!

Note that this workflow works perfectly on my staging database of ~500 rows.

Hello @RE2L!

Thank you for letting us know. Some other users have also been mysteriously getting the 139 error on their workflows. I will let the eng team know about your case and keep you updated.

Chunking is definitely the best practice for large amounts of data and can be very useful for reducing memory load and improving workflow performance.
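For example, with Lodash's _.chunk you can split a big result set into smaller batches and work through one batch at a time (illustrative sketch only; query1 and transformRow are placeholder names):

```js
// Illustrative sketch only: split the result set into batches of 500 rows.
// query1.data stands in for whatever your SQL block returns; transformRow is your per-row logic.
const batches = _.chunk(query1.data, 500);

const results = [];
for (const batch of batches) {
  // Work through one batch at a time rather than one giant 50k-row pass.
  for (const row of batch) {
    results.push(await transformRow(row));
  }
}
return results;
```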

It sounds like you are using the loop correctly, as the built-in loop block is best practice compared to using a .map() in a JS code block, but I would need more details/screenshots to fully confirm.

Retool enterprise users have been able to work with very large amounts of data, so I do not believe it is a fundamental limitation. Hopefully we can get to the bottom of the 139 error as a bug we can fix quickly.

We can also try to live troubleshoot during office hours!

Thank you for the quick response. I'd like to understand loop blocks and chunking a bit better. I was under the impression that the loop block was designed to stream individual rows from the input query rather than loading the entire result set into memory at once (as .map() would). If that's the case, why does chunking the input query results help? It just adds complexity to my workflow and I don't fully understand the point. And if the loop block doesn't stream its inputs, then chunking doesn't help either, because my workflow would still load all of the chunks into memory at once?

Under the assumption that this is related to memory limits, I simplified the code a bit and limited the amount of data I'm ingesting, but I'm still hitting limits before processing all 50k rows. These changes always work when I have fewer than 2,000 items, but on larger inputs I've been getting slightly different errors each run. In one run I got: "The workflow run exceeded the memory limit of around 1GB".

It's unclear why so much memory is being used, and I'm not sure how to debug this further. I may try to attend an office hours session, though the timing is tricky for east coast folks. Providing a few more details here in case it helps:

  1. My input is ~50k database rows. The table has many columns (including some very large text columns), but I'm selecting only 4 columns that should contain no more than 1KB of data total per row.
  2. My transform is implemented as a 'function', called using 'await' in the loop block (rough sketch after this list). It looks like the loop block lambda itself is also implemented as an async function.
  3. The transform itself doesn't do anything too crazy. It does run a few regexes against the input and generates a few strings as output for each row, but the per-row output should be O(100 bytes).
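For reference, a rough sketch of the transform (simplified, with made-up field names and regexes):

```js
// Rough sketch of the per-row transform (made-up field names and regexes).
// The real version is a workflow 'function' awaited from the loop block.
async function transformRow(row) {
  const idPattern = /\b[A-Z]{2}-\d{4}\b/g;           // stand-in for the real regexes
  const cleaned = (row.raw_text || "").replace(/\s+/g, " ").trim();

  return {
    source_id: row.id,
    matched_ids: (cleaned.match(idPattern) || []).join(","),
    summary: cleaned.slice(0, 100),                  // per-row output stays around O(100 bytes)
  };
}
```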

NVM, I misread the time on the office hours. Will do my best to join!

Hello @RE2L!

I heard from my coworkers that you were able to make it to office hours; I unfortunately was out of office and missed it :smiling_face_with_tear:

Let me know if you were able to get things working! You are correct that batching really doesn't do a whole lot: as soon as the SQL query runs, all the data is held in memory. The loop block under the hood is just a JavaScript for loop :sweat_smile:
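As a rough mental model (simplified, not the actual implementation), you can think of it as:

```js
// Rough mental model only, not the actual implementation.
// The loop block iterates over an array that is already fully materialized in memory,
// so it does not stream rows from the database.
const results = [];
for (const row of query1.data) {            // query1.data: the whole result set, already loaded
  results.push(await transformRow(row));    // your awaited function block
}
return results;                             // the outputs are held in memory too
```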

Using LIMIT and OFFSET to reduce the size of each load, and then re-invoking the workflow on the next range, is the best way to avoid memory issues or other timeout errors.
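Something along these lines (a rough sketch; the trigger payload, query, and re-invocation helper are placeholders for however your workflow is actually wired up):

```js
// Rough sketch of the "one page per run" pattern (placeholder names throughout).
const PAGE_SIZE = 1000;

// Offset handed to this run by whatever triggered it (e.g. a webhook payload); defaults to 0.
const offset = Number(triggerPayload.offset || 0);

// The SQL block for this run would look something like:
//   SELECT col_a, col_b, col_c, col_d
//   FROM source_table
//   ORDER BY id
//   LIMIT 1000 OFFSET {{ offset }}
const rows = pagedQuery.data;               // placeholder for that LIMIT/OFFSET query's result

const transformed = [];
for (const row of rows) {
  transformed.push(await transformRow(row));
}
await bulkInsert(transformed);              // placeholder for the 'Bulk insert records' block

// A full page means there may be more rows: kick off the next run with the next offset.
// triggerNextRun is a placeholder for however you re-invoke the workflow
// (e.g. an HTTP call to the workflow's own webhook trigger).
if (rows.length === PAGE_SIZE) {
  await triggerNextRun({ offset: offset + PAGE_SIZE });
}
```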

Even with simple transforms, the data is being duplicated and the memory footprint more or less doubles compared to the original input :melting_face: But as long as you can use SQL to shrink the number of rows per run, it should work smoothly!