Retool self hosted - Keep getting "JavaScript heap out of memory"

Idan_Shavit · August 29, 2023, 11:40am

Hey, we are running the latest version of Retool self hosted (3.8.4), deployed with Helm via Terraform, and it seems that every time we run a PostgreSQL query we get a
FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory, even though the query takes less than 2 seconds to return all results (135 rows).

This is preventing us from using the product, as our Retool pod keeps crashing because of it.

Please help!

Punka · August 31, 2023, 2:24pm

How much RAM have you provided to Retool?
Also check the logs: they show heap usage while idle.

Noa_Kurman · September 6, 2023, 11:44am

Hi @Punka! I work with @Idan_Shavit, so I'll answer.
The server has 8 GB of RAM, but we also noticed the CPU is spiking.
We created the instance based on Retool's on-prem tutorial (and the requirements there).
We drilled down and noticed the problem only occurs when entering "edit app" mode. We don't even have to do anything or actually edit for the spike to start: as soon as we click "edit app" we see spikes in all resources until the server crashes.
We don't see anything helpful in the logs, but we do see that the process taking all the resources is /usr/local/bin/node /retool_backend/bundle/main.js. We're using RDS for all DBs (both the internal Retool DB and our own DB that Retool works with).

Any help would be appreciated. We tried every deployment we could, including EKS and a dedicated EC2 instance. We also tried running the internal DB as a Postgres pod instead of our own RDS. Nothing works and the issue keeps happening.

Punka · September 11, 2023, 3:21pm

Hello @Noa_Kurman,
I truly don't know how to help; 8 GB should be sufficient.
Is it possible that some app reloads continuously and eats the RAM and CPU?
Or do you have the problem with every app?
We have a very similar setup (EKS + RDS, previously EC2 + RDS) and it works like a charm.
By the way, could it also be an RDS problem? I mean the instance size for RDS.

Noa_Kurman · September 12, 2023, 2:07pm

Thank you so much for the reply @Punka.
The issue happens even with apps that don't have anything in them yet. Today we managed to edit one of the apps with no issues, but then moved on to edit another app and Retool started acting up again.
I considered an RDS issue, since we do see a heavy query every time this happens:
WITH tables AS ( SELECT table_name FROM information_schema.tables WHERE table_schema != ? AND table_schema != ? ....
But our DB is a t3 instance and doesn't seem to have any issues with anything other than Retool.

If you have any more suggestions or ideas we'll be glad to hear

Thanks again!

Punka · September 12, 2023, 8:14pm

What about the RDS metrics? t3 instances are burstable; is it possible the CPU credits have run out?
Try switching to the next size up (from t3.large to t3.xlarge, for example) and see if the issue is still there.
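
For anyone wanting to check the credit theory before resizing, here is a minimal sketch in Python. It only builds the request for the CloudWatch CPUCreditBalance metric (namespace AWS/RDS) and inspects the returned datapoints. The instance identifier, the 6-hour window and the threshold of 1.0 credit are illustrative assumptions, not values from this thread.

```python
from datetime import datetime, timedelta, timezone


def credit_balance_request(db_instance_id, hours=6, now=None):
    """Build keyword arguments for CloudWatch get_metric_statistics.

    db_instance_id is a placeholder for your own RDS instance identifier.
    """
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/RDS",
        "MetricName": "CPUCreditBalance",
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": db_instance_id}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": 300,  # 5-minute datapoints
        "Statistics": ["Minimum"],
    }


def credits_exhausted(datapoints, threshold=1.0):
    """Return True if any datapoint's minimum balance is at or below threshold.

    threshold is an arbitrary cut-off chosen for this sketch.
    """
    return any(point["Minimum"] <= threshold for point in datapoints)
```

If boto3 is installed and AWS credentials are configured, the request can be passed as boto3.client("cloudwatch").get_metric_statistics(**credit_balance_request("example-db")) and the "Datapoints" list of the response handed to credits_exhausted.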

Noa_Kurman · September 13, 2023, 8:30am

Actually, I just tried running this query manually, and it does take too long and puts a heavy load on the DB server.
We have a lot of tables, so Retool's query against our information schema might not be suitable for our use case. I wonder how to handle it from here on. Of course, I'm still not sure why a long-running query causes such high load on the Retool server; I guess it's related to their query limit.
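
A rough way to reproduce this kind of measurement outside Retool is sketched below in Python. It is not Retool's introspection query (the thread only shows that one truncated); it is a simpler count of tables per schema, timed through any DB-API connection. The helper name time_query and the use of a driver such as psycopg2 are assumptions for illustration.

```python
import time

# Not Retool's own query: a simpler count of tables per schema,
# meant only to gauge how large the catalog is.
TABLES_PER_SCHEMA = """
SELECT table_schema, COUNT(*) AS table_count
FROM information_schema.tables
GROUP BY table_schema
ORDER BY table_count DESC
"""


def time_query(conn, sql):
    """Run sql on a DB-API connection and return (rows, elapsed_seconds)."""
    cur = conn.cursor()
    start = time.perf_counter()
    cur.execute(sql)
    rows = cur.fetchall()  # fetch everything so transfer time is included
    elapsed = time.perf_counter() - start
    cur.close()
    return rows, elapsed
```

With a PostgreSQL driver installed, something like rows, seconds = time_query(conn, TABLES_PER_SCHEMA) on a connection to the affected database should show how many tables each schema holds and how long the catalog takes to answer.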

Noa_Kurman · September 13, 2023, 12:54pm

Just wanted to update here (and I also feel like I owe @Punka an update): we solved the issue by reducing the limit of the schema query, using the DATABASE_SCHEMA_QUERY_LIMIT variable.
It seems to have been caused by the number of schemas we have in our DB, and by the fact that querying the information schema is not always best practice and is not very efficient.
The query took too long and caused the db-connector to die, but not before killing the server. After we increased the server resources, the connector died without taking the server down with it, and then we could debug the issue a bit better.

Thanks again @Punka !

Punka · September 13, 2023, 4:35pm

Thanks for the update! Hope your solution helps someone.