Sudden constant 502s/504s when trying to access self-hosted Retool

We are running a self-hosted instance of Retool with several apps on it. We haven't done any development on it in some time and have not changed our hosting setup at all. The setup is exactly as described here

We've tried restarting all of the ECS tasks and have tried different browsers and machines, but we keep getting 502s and sometimes 504s. There are no apparent errors in the logs for any of the ECS tasks or for the RDS instance. We've encountered this before, and the errors seemed to clear by themselves after a few hours.

Any idea what could be causing this and how to fix it long term?

Thank you!

We've noticed that tasks under the RetoolECSservice keep restarting and have been doing so for about 3 hours now, with the following error message in the logs:

{"code":null,"kernelOutput":"Command failed: dmesg -T| grep -E -i -m1 -B100 'killed process'\ndmesg: read kernel buffer failed: Operation not permitted\n","level":"info","message":"[Master] Worker 41 died (code null, signal SIGKILL). 1 workers left","pid":41,"signal":"SIGKILL","timestamp":"2024-06-19T13:27:58.581Z","workersLeft":1}
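To spot this pattern quickly in a long log stream, the entries can be filtered for worker deaths caused by SIGKILL. The following is a minimal sketch, assuming each log line is a standalone JSON object shaped like the entry above; the function name `find_sigkilled_workers` is hypothetical and not part of Retool. Note that Retool's own attempt to confirm an OOM kill via `dmesg` failed with "Operation not permitted", so the `signal` field is the main clue left in the log.

```python
import json
from typing import Iterable

# The log entry quoted above, kept verbatim (raw string so the JSON "\n" escapes survive).
LOG_LINE = r'''{"code":null,"kernelOutput":"Command failed: dmesg -T| grep -E -i -m1 -B100 'killed process'\ndmesg: read kernel buffer failed: Operation not permitted\n","level":"info","message":"[Master] Worker 41 died (code null, signal SIGKILL). 1 workers left","pid":41,"signal":"SIGKILL","timestamp":"2024-06-19T13:27:58.581Z","workersLeft":1}'''


def find_sigkilled_workers(log_lines: Iterable[str]) -> list:
    """Return parsed records for workers that died from SIGKILL (hypothetical helper)."""
    hits = []
    for line in log_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip plain-text lines mixed into the log stream
        if not isinstance(record, dict):
            continue  # skip valid JSON that is not an object (e.g. arrays)
        # A worker that exits with no exit code but a SIGKILL signal is the
        # pattern commonly left behind when a container runs out of memory.
        if record.get("signal") == "SIGKILL" and "died" in str(record.get("message", "")):
            hits.append(record)
    return hits


if __name__ == "__main__":
    for rec in find_sigkilled_workers([LOG_LINE, "plain text line"]):
        print(rec["timestamp"], rec["message"], "workersLeft =", rec["workersLeft"])
```

A steady stream of these entries, with `workersLeft` dropping, is a strong hint that the task is hitting its memory limit rather than failing for an application-level reason.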

Hi @Vlad_Ionescu, that error indicates that your Retool deployment has insufficient memory: the worker was terminated with SIGKILL, which typically happens when a container exceeds its memory limit. To resolve this, you'll need to increase the memory allocations for the tasks. Can you share the version of Retool you're running and your current resource allocations?

Hi, thanks, this worked!

If anyone else is affected by this issue: we used to run the main tasks service with 2048 CPU units (2 vCPU) and 4096 MiB of memory, and the job runner service with 1024 CPU units (1 vCPU) and 2048 MiB of memory. We doubled the allocations for both services, and that has sorted the issue out for now.
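Before applying a doubled size, it can help to confirm the new CPU/memory pair is one ECS will accept. The sketch below assumes the tasks run on AWS Fargate (the thread does not say whether Fargate or EC2 launch type is used) and encodes Fargate's task-level CPU/memory pairings as I understand them; check the current AWS documentation before relying on the table. The names `TaskSize` and `is_valid_fargate_size` and the service labels are hypothetical.

```python
from dataclasses import dataclass

# Assumption: Fargate launch type. CPU in ECS units (1024 = 1 vCPU), memory in MiB.
# For each CPU value: (min memory, max memory, step). 256 CPU is handled separately.
FARGATE_MEMORY_RULES = {
    512: (1024, 4096, 1024),
    1024: (2048, 8192, 1024),
    2048: (4096, 16384, 1024),
    4096: (8192, 30720, 1024),
    8192: (16384, 61440, 4096),
    16384: (32768, 122880, 8192),
}


@dataclass(frozen=True)
class TaskSize:
    cpu: int     # ECS CPU units
    memory: int  # MiB

    def scaled(self, factor: int) -> "TaskSize":
        """Multiply both CPU and memory by the same factor."""
        return TaskSize(self.cpu * factor, self.memory * factor)

    def describe(self) -> str:
        return f"{self.cpu / 1024:g} vCPU / {self.memory / 1024:g} GiB"


def is_valid_fargate_size(size: TaskSize) -> bool:
    """Check a CPU/memory pair against the (assumed) Fargate pairing table."""
    if size.cpu == 256:
        return size.memory in (512, 1024, 2048)
    rule = FARGATE_MEMORY_RULES.get(size.cpu)
    if rule is None:
        return False
    lo, hi, step = rule
    return lo <= size.memory <= hi and (size.memory - lo) % step == 0


if __name__ == "__main__":
    before = {
        "main": TaskSize(cpu=2048, memory=4096),
        "jobs-runner": TaskSize(cpu=1024, memory=2048),
    }
    for name, size in before.items():
        after = size.scaled(2)
        print(f"{name}: {size.describe()} -> {after.describe()}, "
              f"valid on Fargate: {is_valid_fargate_size(after)}")
```

Both doubled sizes from the post (4 vCPU / 8 GiB and 2 vCPU / 4 GiB) fall within the assumed Fargate table, so the change maps onto a plain task-definition update.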
