We have a workflow that runs once per day and brings data from a source to a destination.
We chose to loop over the entities and upsert them to the destination, since the same entity can be brought up multiple days in a row.
At some point we started noticing that the workflow fails on some days, but we're not sure why, as no error messages or anything similar are printed.
We set up 3 retry attempts per iteration and 3 retry attempts for the entire loop and that still didn't resolve it.
Would appreciate any thoughts or ideas around how we can find out the reason & prevent it from failing again.
Update
Solved for me: The workflow had a branch which sends a response to the App. The problem was that when the workflow ran a branch that didn't have a response, the App would wait for one and end with a timeout, because no response ever came. Solved by adding a response for all cases.
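To illustrate the "response for all cases" idea outside of the workflow editor, here is a minimal Python sketch. The function name `handle_request`, the `action` field, and the response shape are all hypothetical and only stand in for whatever the workflow branches actually do; the point is just that no path falls through without returning something to the caller.

```python
from typing import Any, Dict


def handle_request(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical handler in which every branch returns a response."""
    if payload.get("action") == "sync":
        # Branch that already had a response.
        return {"status": 200, "body": {"message": "sync started"}}

    # Fallback branch: the kind of path that previously returned nothing,
    # leaving the caller waiting until it timed out.
    return {"status": 200, "body": {"message": "nothing to do"}}
```

With this shape, a caller always receives a response, whichever branch runs.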
Is there a chance that using a 3rd party service might help get some more logging info? Session replay integration
That's an example of how something similar could be added to a custom component for logging purposes. I'm still on vacation though, so I'm unable to check whether there's a way to implement something similar in a workflow instead.
This issue still happens to us unfortunately. We tried many different approaches but we can't understand what's the reason for failure, as it's not a timeout or an error in the process.
It just "drops" the process in the middle, kind of.
These are our recent runs - as you can see, it failed 3 times in a row, so we went ahead and manually triggered it - where it again failed twice, but on the third attempt it suddenly succeeded.
The data itself is identical in the last 3 triggers so we can't see any reason why it would fail the first 2 times.
The first run failed after 56.87 seconds, the second failed after 43.256 seconds, and the successful run took 101 seconds in total - so we can clearly see that the runs are dropped in the middle of the process...
I'm currently facing the same issue with workflows (specifically when I have AI blocks).
When I run blocks individually, everything works fine and nothing times out.
However, when I try running the whole workflow (from the app and the workflow bar), it just fails. I tried adding global error handling to debug the issue, but nothing shows up.
Yep, this is exactly what I'm facing, and I found no possible way of debugging it or understanding why it fails.
The fact that the workflow is unstable is bad on its own; having no way to debug it or understand what makes it fail is terrible.
I hope someone from the support team somehow gets to this topic, since I had no luck when contacting them by mail.
@Shai-D I noticed that AI blocks are the ones failing without reason. I was able to partially achieve my goal by using pure Python along with the OpenAI library. The downside is that you won't be able to leverage the Vector store.
It could be a pain in the butt, but you could use the OpenAI Assistants API to create an Assistant with access to the 'Knowledge Retrieval' and 'Function Calling' tools to achieve the same thing. Since Retool Vectors uses the Embeddings API in the background, your results shouldn't be much different, if at all, so you can easily switch back to the Retool AI resource if you ever decide to.
Unfortunately I am not even using any AI blocks or anything like that in my workflow. It's a purely deterministic workflow, and it's not even that complex.
Yuke has reached out to me by email. We will try to debug and understand what's going on there, and I guess one of us will update here if we have anything smart to add.
So apparently the issue was that the workflow ran out of memory and crashed.
We optimized the workflow to use less memory and to generate fewer logs, and it now works.
Basically, in the past we were iterating over ~10k records, upserting them in a loop one by one.
Now we are batching them, 10 batches of 500 records at once, which makes it more efficient.
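For anyone who wants to try the same change, here is a rough Python sketch of the batching idea, not the exact code from our workflow. The batch size of 500 comes from the post above; `chunked`, `upsert_in_batches`, and the `upsert_batch` callback are hypothetical names, and `upsert_batch` stands in for whatever bulk upsert your destination supports.

```python
from typing import Callable, Dict, Iterator, Sequence

Record = Dict[str, object]


def chunked(records: Sequence[Record], batch_size: int) -> Iterator[Sequence[Record]]:
    """Yield consecutive slices of `records`, each at most `batch_size` long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


def upsert_in_batches(
    records: Sequence[Record],
    upsert_batch: Callable[[Sequence[Record]], None],
    batch_size: int = 500,
) -> int:
    """Call `upsert_batch` once per batch and return the number of batches sent.

    `upsert_batch` is an assumption: replace it with a single bulk upsert
    call against the real destination.
    """
    batches_sent = 0
    for batch in chunked(records, batch_size):
        upsert_batch(batch)
        batches_sent += 1
    return batches_sent
```

The design point is that the loop now makes one call per batch instead of one call per record, so there are far fewer iterations and far less per-iteration logging to keep around.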