I have a large ETL task that requires pulling, transforming, and pushing 1200+ batches of data.
I am trying to architect it with two workflows:
- The first workflow reacts to the trigger event and splits the data into batches, invoking the second workflow for each batch with offset and limit parameters.
- The second workflow does the actual pull, transform, and push.
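To make the fan-out step concrete, here is a minimal sketch of how the first workflow might compute the offset/limit pairs. The function name, row counts, and batch size are hypothetical, not from the original post:

```python
# Hypothetical sketch of the fan-out step: split a total row count into
# offset/limit pairs, one per invocation of the second workflow.

def make_batches(total_rows, batch_size):
    """Return a list of {"offset", "limit"} dicts covering total_rows."""
    batches = []
    offset = 0
    while offset < total_rows:
        batches.append({
            "offset": offset,
            "limit": min(batch_size, total_rows - offset),
        })
        offset += batch_size
    return batches

# e.g. 1,250,000 rows in batches of 1000 -> 1250 sub-workflow invocations
batches = make_batches(total_rows=1_250_000, batch_size=1000)
```

Each entry would then be passed as parameters when triggering the second workflow.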
I have no problems with the second workflow; it is limited in RAM, which is why I need 1200+ batches in the first place.
However, I have a huge problem with the first workflow.
I cannot schedule all 1200 batches at once (due to the 50 concurrent runs limit).
When scheduling batches (invocations of the second workflow) one by one with a 1-second delay (the cherry-picked optimal config), I don't have enough time to schedule all 1200 batches: the first workflow simply times out with "The workflow run exceeded the timeout of 900000 ms".
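One way to work around the dispatcher timing out is to have it process a fixed chunk of batches per run and then re-invoke itself with a cursor, so each chunk gets a fresh 15-minute budget. This is a hypothetical sketch, not an official Retool pattern; `trigger_workflow` and the URLs are stand-ins for whatever trigger mechanism is in use (e.g. an HTTP POST to a workflow webhook):

```python
import time

CHUNK_SIZE = 400     # batches per dispatcher run; ~400 s at a 1 s delay, under the 900 s limit
DELAY_SECONDS = 1    # the cherry-picked delay from the post
TOTAL_BATCHES = 1200
BATCH_ROWS = 1000    # assumed rows per batch

def trigger_workflow(url, payload):
    # Placeholder: in practice, an HTTP POST to the workflow's webhook.
    pass

def dispatch(start_index):
    """Schedule one chunk of batches, then hand off to a fresh run."""
    end = min(start_index + CHUNK_SIZE, TOTAL_BATCHES)
    for i in range(start_index, end):
        trigger_workflow(
            "https://example.invalid/second-workflow",
            {"offset": i * BATCH_ROWS, "limit": BATCH_ROWS},
        )
        time.sleep(DELAY_SECONDS)
    if end < TOTAL_BATCHES:
        # Re-invoke this same dispatcher workflow with the cursor, so the
        # next chunk runs in a new invocation with its own timeout budget.
        trigger_workflow("https://example.invalid/dispatcher", {"start_index": end})
```

The trade-off is extra workflow runs and a cursor parameter on the dispatcher, but no single run ever needs to schedule all 1200 batches.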
What the hell, Retool devs? It's simply impossible to run a heavy workflow! It forces me to manually duplicate "scheduling" blocks just so I can get all my sub-workflows scheduled. ...
I know you use AWS Lambdas, which have a hard 15-minute execution limit, but at least you could make "loops" run each iteration in a new Lambda invocation.
> you could make "loops" run each iteration in a new lambda invocation.
We're actually working on rolling this functionality out in the next couple of weeks. If you're open to beta testing it, send me an email (dmitriy at retool) and I'll connect you with the engineer that's working on it!
Query blocks have a maximum timeout of 10 minutes on Cloud, and self-hosted accounts can increase that up to 40 minutes. The entire workflow still has a limit of just over a day (30 hours).
Does the new feature Dmitriy mentioned sound like it might work for you @mawdo81?
Thanks @Tess, but I'm not sure we're on the same page here:
- My query blocks run easily within 10 min; that isn't the issue.
- My iteration block is the one that is timing out at 15 minutes overall,
- and this is despite each iteration being well within 10 min (usually a matter of seconds, or 1–2 minutes).

So my issue isn't that each iteration exceeds 15 minutes, but that all the iterations together do.
I appreciate the opportunity to join the beta, but tbh this wasn't an ongoing issue for me. It was an initial set-up thing, so I just added logging and re-ran from a new start point a number of times until the work was done. Thankfully the called workflow doesn't time out at 15 min; it continues until it's finished (presumably with a 10 or 15 min timeout per block).
Maybe, if this is indeed the current behaviour, the documentation should be updated to say "Each workflow step times out after 15 min, with the exception of query blocks, which have a max timeout of 10 min"?
Thanks for all the clarification! I did some research on this internally. There's a 15-minute limit in the code executor. It sounds like we're planning to make this limit configurable (so you could increase it) on self-hosted instances. We also have plans to update the docs soon with clearer timeout info!