Workflows consistently crashing while editing

Want to start by saying I love the product.

However, I am currently blocked because the page consistently crashes whenever I try to edit a workflow. The workflow runs successfully via its daily cron job, but when editing I am unable to run through all of the existing blocks in order to begin working on a new one. I have experienced latency and occasional crashes while editing workflows over the past month or two, but I'd typically be able to run through each block after a few attempts. Recently, that is no longer the case.

I've tried running in incognito and clearing cookies. I'm working on a tryretool domain in case that's relevant, as I know there have been some tryretool-specific issues in the past. Really appreciate any help here, as I am completely blocked at the moment.

Hi @Louie_Ayre,

Can you share a bit more information about the issue you're experiencing? Screenshots / screen recordings would be helpful. Do the issues occur after a specific type of action or can you not access the editor at all?

Eric

Have you tried pressing F12 to open the browser dev tools and going to the Network tab? Once you're on the tab, refresh the page and do something to crash it. If you keep the dev tools open you should see the Console; if you start scrolling around in it, anything red is bad. There are times when Retool's own stuff will show up in red there (nothing you can do about that, it's on their end). You can also check the network requests right above the console. Scrolling up, if you see anything in red it could be a failed query or some JS throwing an error that's making Retool crash.

I've had this problem with a module that refused to display any components. They were listed in the Component Tree, but nothing would render. It was a query running on load that was causing the issue; the query was valid, but somehow Retool evaluated it as a circular reference/call. I could only figure it out with the browser dev tools, though.

A couple of tips:

  • In the console, you can click the blue text to the right of any output line and it will show you the code.
  • In the network view, if you click on a network request you can look at the headers and payload to help figure out which query broke things.
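
If scrolling for red lines gets tedious, you can also paste something like this into the Console to log errors and unhandled promise rejections as they happen. This is just generic browser API usage (nothing Retool-specific), so treat it as a rough helper rather than an official debugging tool:

```js
// Paste into the dev tools Console: logs uncaught errors and failed promises
// as they occur, instead of scrolling back through the output afterwards.
window.addEventListener('error', (e) =>
  console.warn('uncaught error:', e.message, 'at', e.filename + ':' + e.lineno)
);
window.addEventListener('unhandledrejection', (e) =>
  console.warn('unhandled promise rejection:', e.reason)
);

// Quick overview of every network request the page has made so far.
// (The Network tab is still the place to check status codes and payloads.)
performance.getEntriesByType('resource').forEach((r) =>
  console.log(r.initiatorType, r.name, Math.round(r.duration) + 'ms')
);
```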

Thank you @bobthebear for the incredibly helpful comment. I'm getting the following error upon opening the page:

(node) warning: possible EventEmitter memory leak detected. 11 listeners added. Use emitter.setMaxListeners() to increase limit.

@ehe I'll post screenshots / screen recordings later today!

So if it's a memory leak, we need to figure out whether it's from a query running, JS somewhere, or a freak backend bug.

The EventEmitter stuff is from Retool, unless you're adding event listeners yourself, using Redis or WebSockets (custom components), or maybe using .bind(window) somewhere.

In general, this error is usually thrown when an event has too many listeners registered to it (the max is 10). If you are using listeners, you could hit this by never removing them, or because something is stuck in a loop continuously adding an event listener... like a query being constantly called because its Disable condition isn't correct: query1 adds a listener, then calls query2, which sets a variable; query1 triggers on that variable but should be disabled based on its value. If that Disable setting is wrong, you get stuck in an infinite loop of doom and you'll reach 11 listeners on one event REALLY quick.
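
To illustrate what that looks like, here's a plain Node.js sketch (not Retool's code, and the query/event names are made up) that reproduces the warning the moment an 11th listener lands on the same event:

```js
// Minimal sketch of how the warning gets triggered with a plain Node.js
// EventEmitter. "variableChanged" and runQuery1 are hypothetical names.
const { EventEmitter } = require('events');

const emitter = new EventEmitter(); // warns once a single event has more than 10 listeners

function runQuery1() {
  // BUG: a fresh listener is registered on every run and never removed,
  // so a query that keeps re-triggering piles listeners onto one event.
  emitter.on('variableChanged', () => console.log('query1 re-triggered'));
}

for (let i = 0; i < 11; i++) runQuery1();
// -> warning: possible EventEmitter memory leak detected. 11 listeners added...

// Fixes: remove the listener when you're done (emitter.off / removeListener),
// register it exactly once with emitter.once(), or, if many listeners are
// genuinely expected, raise the cap with emitter.setMaxListeners(n).
```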

Edit: I forgot about eval(); it could potentially cause this also. If your JS is running in 'strict mode' then this wouldn't apply, but if it isn't, then depending on the variables in your eval it could possibly cause this too? It's a bit of a guess; I've always avoided eval so I haven't used it much.

Thanks so much @bobthebear for this context. We are not adding any EventListeners on our end, or using any of the related functionality that you mentioned.

In fact, @ehe I was just able to reproduce this error in a fresh workflow without including any of our own code or even referencing any of our resources. While adding Resource Query / Code blocks and leaving them with their default contents, the same error consistently comes up upon adding the 4th block. I have been able to replicate this in both Chrome and Safari.

Please see the attached screen recording.

Thanks for sharing that video. The EventEmitter leak does look problematic for sure, and we will take a closer look at that. Does the error prevent you from performing any actions in the editor? You mentioned that you experienced the editor crashing before - are you able to reproduce that consistently?

@ehe no problem! While no specific action is prevented in and of itself, the page crashes after a short period of interaction. We are consistently unable to run through each block of code without the page crashing. For some workflows, Chrome gives us the option to wait for the page to respond, which sometimes lets us eventually run through each block. But for other workflows the crash occurs without any prompt to wait for the page to respond, which is a complete blocker in terms of making updates.

Is it fair to say that you are experiencing crashes after running resource query blocks? What type of resource query are you running and what type of data are you expecting back?

Maybe it would be faster if I could inspect the workflow myself. Would you mind inviting eric@retool.com to your organization and pointing me to the problematic workflow?

it gets weirder imo lol.

Click and drag the Resource Query block and drop it in, but don't connect it to anything. Do this 3 more times so you have 4 total (which is where it starts dropping that error). Notice that this time it didn't show up. Refresh the page... now the error shows. Hit the back button to navigate back to the workflows page, then click to edit the same workflow: no error. Refresh again, though, and there's the error.

I went back and made a new workflow and then followed your video. If you don't refresh at any point, the error never shows up. The second you hit refresh, though, you'll see that error. To make it go away, I've found the only thing to do is hit the back button and then forward (or just click the workflow again).

So What's Going On?
Since addEventListener is used, not React's onEventName prop stuff, a wild guess would be that somewhere there is a componentDidMount() that adds event listeners, whose matching componentWillUnmount() is missing a removeEventListener. If the element/object isn't actually destroyed by a page refresh after skipping removeEventListener, then when componentDidMount() gets called again this problem could occur. A page navigation would definitely destroy the element and the JS environment... With React, though, a page refresh isn't necessarily the same as a page reload. A page reload will always destroy the JS environment and components, but with a refresh it's possible to persist state even afterwards. That's the only explanation I can think of for what's going on, and for why the listeners are destroyed on navigation but not on refresh (especially since with plain JS a page refresh would always destroy the environment and components, so the difference has to be in how React works).
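
The pattern I'm guessing at would look roughly like this (a hedged sketch only - the component name and the resize event are made up, this is not Retool's actual code):

```js
// Hypothetical React class component illustrating the leak pattern described
// above: a listener added on mount whose cleanup never runs.
import React from 'react';

class SomePanel extends React.Component {
  handleResize = () => {
    /* ...re-layout the panel... */
  };

  componentDidMount() {
    // A new listener is registered every time the component mounts...
    window.addEventListener('resize', this.handleResize);
  }

  componentWillUnmount() {
    // ...so if this cleanup is missing, or never runs because the component's
    // state survives a refresh, listeners pile up until the emitter warns.
    window.removeEventListener('resize', this.handleResize);
  }

  render() {
    return null;
  }
}

export default SomePanel;
```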

Edit:

It's rather difficult to figure out what the code is for, but I can make another guess based off the other switch cases: console, dom, xhr, fetch, history, error, unhandledrejection. It's the 'console' and 'history' cases that give hints. The normal console, and the browser dev tools window as a whole, doesn't have anything named history (logs, sure, but not history), so it's not the dev tools console being referred to for redirecting output. There is somewhere else with both a console and a history label, and even an area for xhr/fetch results: the run history window. It would need all of those switch cases. Also, conveniently, it would be nice if that window persisted state between refreshes... oh, what do you know, it does lol.
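
Pure guesswork, but instrumentation like that might look something like the sketch below (the function and field names are made up, not Retool's code). The relevant part is that hooks like these, added on every mount and never torn down, would pile up listeners exactly the way described above:

```js
// Hypothetical sketch of per-channel instrumentation feeding a run history panel.
function instrument(channel, record) {
  switch (channel) {
    case 'console': {
      const original = console.log;
      console.log = (...args) => {
        record({ channel, args });       // mirror console output into the panel
        original.apply(console, args);
      };
      break;
    }
    case 'error':
      window.addEventListener('error', (e) =>
        record({ channel, message: e.message })
      );
      break;
    case 'unhandledrejection':
      window.addEventListener('unhandledrejection', (e) =>
        record({ channel, reason: e.reason })
      );
      break;
    // ...'dom', 'xhr', 'fetch', and 'history' would follow the same pattern.
  }
}
```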

Hopefully this helps you @ehe, or whoever ends up trying to fix this.

@ehe We'll go ahead and send an invite!

For some more context, our workflows are composed of PostgreSQL resource query blocks and JS code blocks. Leading up to a crash we experience latency, and I'm pretty sure I've seen crashes occur during / immediately after running either type of block.

I've renamed the workflows:

Workflow #1 - this workflow runs successfully via a cron job on a daily basis. After a brief period of editing, we begin to experience latency. Soon after, Chrome prompts us to wait for the page or exit it. After choosing to wait repeatedly, we are sometimes eventually able to run through each block of the workflow.

Workflow #2 - this workflow is a work in progress and is not yet triggered on a regular basis. This workflow seems to crash more quickly, and we are not prompted with the option to wait for the page, so we are completely blocked here.

Following up here: we resolved this issue offline. If anybody is experiencing similar issues, we are working on features to virtualize the workflows editor to improve performance. Minimizing the size of your blocks and relying on the block code editor's scroll bar instead can also help.

@ehe is there any update on the work to improve the workflow editor performance? My editor lags pretty heavily when working with larger files (~20MB of data, moving to a dataframe for manipulation, then importing to a DB) and will regularly crash with an "out of memory" error.

Are there any blocks in particular that you notice start to lag the editor when there is a lot of data in them? Also, what is the name of your organization? I can opt you in to the experimental virtualization feature to see if that helps. Feel free to DM me if that's better.

Thanks! Sent a DM with the details.

Hey @ehe, I have a similar problem with handling bigger files in workflows. Especially when handing over base64 strings, they tend to blow up. I would also be interested in the virtualization feature.

Also, one more question:
Does self-hosting Retool allow increasing the workflow memory limit of 250 MB?

@jpmin as an update regarding your question for self-hosted Retool workflows: I learned from our Workflows team that if you upgrade to the latest Stable release (v3.33), going forward there will by default be no memory or CPU limits for running workflows in the worker. If you want to enforce some limits, you can set WORKFLOW_MONITOR_PROCESS_ENABLED to true. The default limits will then be 1.5GB (1536MiB) of memory and 8 CPUs, which can be customized by setting WORKFLOW_MEMORY_LIMIT_MIBS and WORKFLOW_CPU_LIMIT respectively.
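
For reference, in an env-file style configuration that would look roughly like this (a sketch only - where exactly these variables go depends on how your self-hosted deployment is set up):

```
# Applies to the self-hosted workflows worker (v3.33+), where limits are off by default.
WORKFLOW_MONITOR_PROCESS_ENABLED=true   # opt back in to enforcing limits
WORKFLOW_MEMORY_LIMIT_MIBS=1536         # memory cap in MiB (1536 is the default when enabled)
WORKFLOW_CPU_LIMIT=8                    # CPU cap (8 is the default when enabled)
```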

In general, I know the team is also working on adding some more documentation in the near future detailing what limits are in place and what's configurable!

Hey @kbn,

Thanks! This is great news. Unfortunately, I've run into the problem that I can't sign up with Temporal. I posted about it here: Self-hosted: Failed to enroll with Temporal

Any idea what the problem might be?