I have a self-hosted Retool instance deployed on a GKE cluster. The deployment is working fine. However, when a developer creates a PR or commit, it crashes for about a minute and then returns to a normal state. I've seen this page describing it, but it doesn't show exactly how to solve it. Any idea how we can avoid it?
What’s likely going on is that when a PR or commit gets merged, Retool pulls in the changes and temporarily restarts or locks up part of the app during the sync process.
These might help:
- Scale your Retool web pods to at least 2 replicas. This is the biggest one. When there’s only one pod, any reload or processing (like Git sync) can take it down for a bit. With two or more, traffic can be routed to the healthy pod while the other reloads.
- Check that you're not mounting the Git repo directly inside the container. Retool handles Git sync internally, so if you’re doing anything custom with volumes or mounts, it could be interfering.
- If you're on Retool Enterprise, you can run Git syncing on a separate worker, which keeps the main web server unaffected during pulls.
- Also worth checking: make sure your CPU/memory limits aren’t too tight, and that you’ve got proper readiness probes set up so a pod isn’t marked healthy before it’s actually ready to serve.
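To make the replica and readiness-probe points concrete, here is a rough sketch that builds a Kubernetes strategic-merge patch as JSON. The container name `retool`, the health path `/api/checkHealth`, and port `3000` are assumptions on my part, so check them against your own manifest before using any of this.

```python
import json


def build_patch(container_name: str, replicas: int = 2,
                health_path: str = "/api/checkHealth", port: int = 3000) -> dict:
    """Build a strategic-merge patch: extra replicas plus a readiness probe.

    container_name, health_path and port are assumed values; replace them
    with whatever your Deployment actually uses.
    """
    if replicas < 2:
        raise ValueError("use at least 2 replicas so one pod can reload safely")
    return {
        "spec": {
            "replicas": replicas,
            "template": {
                "spec": {
                    "containers": [
                        {
                            # Containers are merged by name in a strategic-merge patch.
                            "name": container_name,
                            "readinessProbe": {
                                "httpGet": {"path": health_path, "port": port},
                                "periodSeconds": 10,
                                "failureThreshold": 3,
                            },
                        }
                    ]
                }
            },
        }
    }


if __name__ == "__main__":
    # Print the patch so it can be saved to a file and reviewed first.
    print(json.dumps(build_patch("retool"), indent=2))
```

You could save the output as `patch.json` and apply it with `kubectl patch deployment <your-deployment> --patch-file patch.json`, where `<your-deployment>` is a placeholder for your Retool web deployment. If you deploy with Helm, setting the equivalent values in your chart is probably cleaner than patching by hand.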
Hi Anggita, thanks for the answer.
I've reviewed the deployment based on your suggestions, and all of those items appear to be fine. However, I've seen in the logs that Retool sometimes fails to pull data from GitHub, and it looks like this is what causes the issue:
- upstream connect error or disconnect/reset before headers. retried and the latest reset reason: remote connection failure, transport failure reason: delayed connect error: Connection refused
Is there anything that can be done to avoid the outage when the GitHub connection fails? Thanks!