Unknown cause of failures

  1. My goal: Understand why the workflow failed

  2. Issue: 2 separate failures occurred
    - one failed but there is no sign of failure (looks like the workflow completed successfully. Error handler was not run)

         - the other one failed and error handler ran but no other blocks ran
    

  1. Steps I've taken to troubleshoot: Could not find anything in the forums
  2. Additional info: Self-hosted

Same here! Watching..

Same here, also self hosted. Did you find anything to solve the issue @Nathan_Wong?

No, still waiting for Retool support to reply to this post

The same problem here.

Hi everyone,

Let me check with the workflows team to see what other options are available for troubleshooting this issue. Apologies for the inconvenience :sweat:

For users on the cloud, if you can share workflow ids and workflow run ids we can check our logs to see if there are more details we can find to clear things up.

For self hosted users, can you share how your deployment is set up? The answer to that will determine which steps from the set up docs here I would recommend for you to double check that you have set up correctly.

If you can share a dump of your workflows container logs that I can share with our team we can better help to troubleshoot!

@Jack_T w-id 2b48aad6-0414-4487-a2cb-b49ee903edec
w-run-id 0197c599-0c99-7542-adda-910a1b5abea8

Hi @helio.costa thank you for sending me those. It looks like the workflows ran successfully from the server logs, which leads me to believe this is a frontend issue where the workflow logs in the app are incorrectly displaying a failed run when they work properly.

We also just pushed an patch to cloud that should have fixed this issue with workflows. If this is still happening for anyone let me know!

Sometimes it still happens. Others take ages to load the log.

When you say it still happens, you are saying that randomly some workflows will show up as failing to run in the run logs but on inspection, all the blocks have completed successfully?

Are you able to add a response block to catch any errors?

The logs should appear once the workflow finishes, and once they are visible you should be able to see how much time each block too up. Is it taking longer for the logs to loan than the total time of all the steps in the flow?

like, this one, doesn't show anything. It's very random.

Thanks for sharing the screenshot @helio.costa,

That is very odd :thinking: can you share with me the workflow ID and run ID for when the logs are empty?

One possible cause is memory limits, any chance this workflow is highly memory intensive?

Another possible reason is Webhook Return Blocks ​ since Workflows with webhook return blocks don't upload logs until completion (for lower latency). If the workflow crashes before completing, logs won't be uploaded and will never be accessible.

1db18498-532d-464e-b73d-85155801d65a
runId=019807f8-bef7-7228-87b3-cee918a932dc

Thank you for sending me that @helio.costa!

Just got word back from our workflow engineers, they believe the issue is related to a pr that was a recently merged.

"We now skip loading logs from s3 blocks where the total size is > 20MB because it’s slow and uses resources on our side + it’ll likely crash on their side"

From the engineer looking at your logs, "And because it looks like all their logic is in 1 subflow (which I believe is stored in 1 s3 block) nothing ends up showing up".

The engineers are now working on adding in a warning message to let users know when this is occurring, so that the app will tell you when logs are being truncated.