Crash Loop in Retool Workflows Worker (Temporal Connection / DNS Issue)

In our self-hosted (on-premise) Retool deployment, the workflows worker container intermittently enters a crash loop. It does not recover on its own: we have to remove the container manually and re-run docker compose up -d.

Environment

  • Retool self-hosted (on-premise, Docker Compose)

  • Temporal-based workflows enabled

  • External PostgreSQL database

Logs

/retool_backend/node_modules/.pnpm/@temporalio+worker@1.11.6_@swc+helpers@0.5.3_metro@0.80.9_encoding@0.1.13_metro-minify-terser@0.80.9_/node_modules/@temporalio/worker/lib/connection.js:58
                throw new core_bridge_1.TransportError(err.message);
                      ^

TransportError: tonic::transport::Error(Transport, ConnectError(ConnectError("dns error", Custom { kind: Uncategorized, error: "failed to lookup address information: Temporary failure in name resolution" })))
    at NativeConnection.connect (/retool_backend/node_modules/.pnpm/@temporalio+worker@1.11.6_@swc+helpers@0.5.3_metro@0.80.9_encoding@0.1.13_metro-minify-terser@0.80.9_/node_modules/@temporalio/worker/lib/connection.js:58:23)
    at async $W (/retool_backend/bundle/main.js:2959:17389)
    at async uts (/retool_backend/bundle/main.js:11238:9746)
    at async Object.GsT (/retool_backend/bundle/main.js:11238:14445)

Node.js v20.18.1
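
Diagnostic sketch

The "Temporary failure in name resolution" text in the trace above is the message glibc reports for a transient getaddrinfo failure (EAI_AGAIN), so one way to narrow this down is to check, from inside the worker container, whether the Temporal hostname resolves at all and whether it recovers after a few retries. The following is only a minimal sketch of such a check, not something taken from Retool or Temporal. The hostname "temporal" and port 7233 are assumptions (7233 is Temporal's default frontend gRPC port); replace them with whatever the worker is actually configured to connect to. The function name resolve_with_retry and its parameters are hypothetical.

```python
import socket
import time


def resolve_with_retry(host, port, attempts=5, delay=2.0,
                       resolver=socket.getaddrinfo, sleep=time.sleep):
    """Resolve `host` up to `attempts` times and return the sorted unique IPs.

    `resolver` and `sleep` are injectable so the function can be exercised
    without real DNS; by default it uses the system resolver.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            infos = resolver(host, port, type=socket.SOCK_STREAM)
            # Each entry is (family, type, proto, canonname, sockaddr);
            # sockaddr[0] is the IP address for both IPv4 and IPv6.
            return sorted({info[4][0] for info in infos})
        except socket.gaierror as err:
            last_error = err
            print(f"attempt {attempt}/{attempts} failed: {err}")
            if attempt < attempts:
                sleep(delay)
    # Every attempt failed: surface the last resolver error to the caller.
    raise last_error


if __name__ == "__main__":
    # ASSUMPTION: "temporal" and 7233 are placeholders for the host and port
    # the workflows worker is configured to use in this deployment.
    print(resolve_with_retry("temporal", 7233))
```

If this script fails in the same window in which the worker is crash-looping and succeeds at other times, that would point at the container's DNS resolution rather than at Temporal itself. If it succeeds while the worker is still failing, the hostname the worker uses is probably different from the one tested.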