I have a workflow that is triggered on a schedule. It runs an ETL job on telemetry data at 3am and always fails after around 15 minutes.
If I kick it off via the workflow editor later in the morning, it finishes successfully (the workflow effectively picks up where it left off and processes the remaining data). This manual run takes around 5 minutes, and there are no issues.
My suspicion is that the workflow is hitting a timeout. BUT, there's nothing in the log about that, and from what I've read the timeout should be at least several hours, given that this is a scheduled workflow with no webhook responses etc.
I have a global error handler sending me an email, but all I have is the workflow context (see below). Important to note that there is a loop in the workflow, which iterates over a workflow function. It's during this loop that the workflow fails (as this is where all the work is done), but I can't "see" what's going on in there as I can't put an error handler into the function itself.
What is the loop doing, generally speaking (or specifically if you can!)? Is it calling an API or something else that might time out or be out of service at 3am for maintenance (or whatever)? If you move the schedule forward or back an hour and it works, that would suggest the problem is external to the workflow (something the loop depends on).
"Retool limits the length of time a workflow run can remain in execution before it is automatically terminated.
For asynchronous workflow runs, the timeout is 30 hours. For asynchronous workflows with User Task or Wait blocks, the timeout is 60 days.
For synchronous workflows runs, the timeout is 15 minutes up to executing the first webhook Response block. The remainder of the workflow follows the asynchronous timeout."
Hi @maxamillian,
A couple of thoughts. Could you try, even temporarily, putting the functions in a JS block? That would give you the visibility you need if it still fails at around 15 minutes.
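To illustrate the kind of visibility a JS block could give you, here is a minimal sketch. It assumes the per-item work can be expressed as a plain function; `processItem` and `runWithDiagnostics` are hypothetical names, not Retool APIs, and the item shape is made up for the example.

```javascript
// Hypothetical stand-in for the work your workflow function does on one item.
// Replace the body with the real per-item logic.
function processItem(item) {
  if (item.bad) throw new Error(`cannot process item ${item.id}`);
  return { id: item.id, ok: true };
}

// Runs the handler over every item. Failures are recorded with their index
// and the elapsed time instead of aborting the loop with no trace.
// `now` is injectable so the timing can be controlled when testing.
function runWithDiagnostics(items, handler, now = Date.now) {
  const startedAt = now();
  const results = [];
  const failures = [];
  items.forEach((item, index) => {
    try {
      results.push(handler(item));
    } catch (err) {
      failures.push({
        index,
        message: err.message,
        elapsedMs: now() - startedAt,
      });
    }
  });
  return { results, failures, totalMs: now() - startedAt };
}
```

Returning the `failures` array from the block (or including it in your email) would show which iteration failed and how far into the run it happened.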
Another idea would be to break it into a couple of workflows, or to limit your loop so that the workflow finishes (partially) in, say, 10 minutes and is triggered again a little later.
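A rough sketch of the "stop after a time budget" idea, assuming the loop body can run inside a JS block. `processWithinBudget`, `handler`, and the budget value are hypothetical; how you persist the remaining items for the next run is up to you.

```javascript
// Processes items in order until the time budget is used up, then stops
// and reports what is left so a later run can resume from there.
// `now` is injectable so the clock can be faked when testing.
function processWithinBudget(items, handler, budgetMs, now = Date.now) {
  const deadline = now() + budgetMs;
  let processed = 0;
  for (const item of items) {
    // Check the clock before starting each item, never mid-item.
    if (now() >= deadline) break;
    handler(item);
    processed += 1;
  }
  return { processed, remaining: items.slice(processed) };
}

// Example budget: 10 minutes, comfortably under a 15-minute limit.
const TEN_MINUTES_MS = 10 * 60 * 1000;
```

Note that the check only happens between items, so a single slow item can still overrun the budget; leave some headroom.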
Can you please share a screenshot or two of your workflow and any functions?
Hi @maxamillian,
If you send me the json for your workflow, I can try to help you further debug this.
However, you should know that all loops are subject to the synchronous execution timeout, which is 15 minutes.
@maxamillian,
Yes, the time limit applies to the loop regardless. So if you go with a parallel execution, the timeout applies to the whole block and not just each iteration as it does with sequential.
@maxamillian,
Here is a possible workaround: You could extract the blocks from your function into a workflow and use Run workflow from workflow resource as your loop lambda. Each workflow would be scheduled asynchronously and could run over multiple activities. However, you wouldn't be able to return a result or wait until all iterations are completed (so remove your response block from the end of this workflow). You could, instead, send your email from within the second workflow. Set your loop up to run in batch mode.
Let me know if this works for you and solves your timeout issue.
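If it helps to picture the batching, here is a generic sketch of splitting a list into fixed-size batches. This is plain JavaScript, not how Retool implements batch mode; `toBatches` and the batch size of 50 are assumptions for illustration only.

```javascript
// Splits a list into consecutive batches of at most `size` items.
function toBatches(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Hypothetical example: 800 item ids split into batches of 50.
const ids = Array.from({ length: 800 }, (_, i) => i + 1);
const batches = toBatches(ids, 50);
```

Each batch could then be handed to one iteration, which keeps the number of triggered runs well below the number of items.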
I've gone with increasing the max DB connections and a batch loop, and that's working well now: around 9 minutes, so below the timeout! That may grow over time though... Regarding having a separate workflow for the loop: I was initially doing this, but was wary of running too many workflows (and having to pay for extra workflows for the month). It's around 800 iterations every day. In any case, problem solved, for now... Thank you for your help!