500 when creating a new commit or branch

I have a self-hosted Retool instance deployed on a GKE cluster. The deployment is working fine. However, when a developer creates a PR or commit, the instance crashes for about a minute and then returns to a normal state. I've seen this page describing it, but it doesn't show exactly how to solve it. Any idea how we can avoid it?

What’s likely going on is that when a PR or commit gets merged, Retool pulls in the changes and temporarily restarts or locks up part of the app during the sync process.

These might help:

  1. Scale your Retool web pods to at least 2 replicas. This is the biggest one. When there’s only one pod, any reload or processing (like Git sync) can take it down for a bit. With two or more, traffic can be routed to the healthy pod while the other reloads.
  2. Check that you're not mounting the Git repo directly inside the container, since Retool handles Git sync internally. If you’re doing anything custom with volumes or mounts, it could be interfering.
  3. If you're on Retool Enterprise, you can run Git syncing on a separate worker, which keeps the main web server unaffected during pulls.
  4. Also worth checking: make sure your CPU/memory limits aren’t too tight, and that you’ve got proper readiness probes set up so a pod isn’t marked healthy before it’s actually ready to serve.
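
As a rough illustration of items 1 and 4, here is a minimal Deployment sketch with two replicas, a readiness probe, and explicit resource settings. Everything specific in it is an assumption to adapt to your own setup: the `retool-web` name and labels are hypothetical, the image tag is only an example, port 3000 and the `/api/checkHealth` path should be checked against your Retool version and Helm values, and the CPU/memory numbers are placeholders rather than recommendations.

```yaml
# Sketch only: names, image tag, port, probe path, and resource
# values are assumptions, not a verified Retool configuration.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: retool-web                 # hypothetical name
spec:
  replicas: 2                      # at least two pods so one can serve while the other reloads
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0            # never drop below the desired replica count during a rollout
      maxSurge: 1
  selector:
    matchLabels:
      app: retool-web
  template:
    metadata:
      labels:
        app: retool-web
    spec:
      containers:
        - name: retool
          image: tryretool/backend:3.196.10-stable   # example tag; pin your own version
          ports:
            - containerPort: 3000                    # assumed backend port
          readinessProbe:
            httpGet:
              path: /api/checkHealth                 # assumed health endpoint
              port: 3000
            initialDelaySeconds: 30
            periodSeconds: 10
            failureThreshold: 3                      # pod leaves the Service after 3 failed checks
          resources:
            requests:
              cpu: "1"                               # placeholder values; size to your workload
              memory: 2Gi
            limits:
              memory: 4Gi
```

With a readiness probe in place, a pod that stalls stops receiving traffic from the Service until it passes the check again, which is what lets the second replica absorb requests in the meantime.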

Hi Anggita, thanks for the answer.

I've reviewed the deployment against your suggestions, and all of the items appear to be fine. However, I've seen in the logs that Retool sometimes fails to pull data from GitHub, and it looks like this is what causes the issue:

  • upstream connect error or disconnect/reset before headers. retried and the latest reset reason: remote connection failure, transport failure reason: delayed connect error: Connection refused

Is there anything that can be done to avoid the outage when the GitHub connection fails? Thanks!

Thanks for reaching out, @Marcus_Santos. Can you let me know which version of Retool you're currently running? Does this happen consistently, every time your instance communicates with GitHub? And does the commit and/or PR still go through, even if your instance crashes?

You mention that you've reviewed the logs, as well. Do you mind sharing the relevant portions? Feel free to DM me. The jobs-runner typically handles all source control, so I would expect to find the relevant logs there.

Hi @Darren, sorry for the late response. In the end, I updated Retool from 3.148.5-stable to 3.196.10-stable, and it started working fine. Thank you so much for the help.
