We are seeing an intermittent issue with our self-hosted (on-premise) Retool deployment: the workflows worker container occasionally enters a crash loop. It does not recover on its own, so we have to intervene manually by removing the container and re-running docker compose up -d.
Environment
- Retool self-hosted (on-premise, Docker Compose)
- Temporal-based workflows enabled
- External PostgreSQL database
Logs:
/retool_backend/node_modules/.pnpm/@temporalio+worker@1.11.6_@swc+helpers@0.5.3_metro@0.80.9_encoding@0.1.13_metro-minify-terser@0.80.9_/node_modules/@temporalio/worker/lib/connection.js:58
throw new core_bridge_1.TransportError(err.message);
^
TransportError: tonic::transport::Error(Transport, ConnectError(ConnectError("dns error", Custom { kind: Uncategorized, error: "failed to lookup address information: Temporary failure in name resolution" })))
    at NativeConnection.connect (/retool_backend/node_modules/.pnpm/@temporalio+worker@1.11.6_@swc+helpers@0.5.3_metro@0.80.9_encoding@0.1.13_metro-minify-terser@0.80.9_/node_modules/@temporalio/worker/lib/connection.js:58:23)
    at async $W (/retool_backend/bundle/main.js:2959:17389)
    at async uts (/retool_backend/bundle/main.js:11238:9746)
    at async Object.GsT (/retool_backend/bundle/main.js:11238:14445)
Node.js v20.18.1
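
The TransportError above is a DNS lookup failure ("Temporary failure in name resolution"), so it may help to check whether the Temporal host name resolves from inside the worker container while the crash loop is happening. Below is a minimal diagnostic sketch in Python, not part of Retool or the Temporal SDK. The function name wait_for_dns, the host name "temporal" and the retry settings are assumptions for illustration; 7233 is Temporal's default frontend port, but your deployment may use a different host and port.

```python
import socket
import time


def wait_for_dns(host, port, attempts=5, base_delay=1.0,
                 resolver=socket.getaddrinfo, sleep=time.sleep):
    """Resolve `host`, retrying with exponential backoff on DNS errors.

    Returns a sorted list of resolved IP addresses, or re-raises the last
    socket.gaierror once all attempts are used up. `resolver` and `sleep`
    are injectable only so the function can be exercised without a network.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            infos = resolver(host, port, proto=socket.IPPROTO_TCP)
            # Each entry is (family, type, proto, canonname, sockaddr);
            # sockaddr[0] is the IP address for both IPv4 and IPv6.
            return sorted({info[4][0] for info in infos})
        except socket.gaierror as err:
            last_error = err
            if attempt < attempts - 1:
                # Wait 1s, 2s, 4s, ... between attempts (with base_delay=1.0).
                sleep(base_delay * 2 ** attempt)
    raise last_error


# Example call (hypothetical host name, adjust to your compose service name):
# print(wait_for_dns("temporal", 7233))
```

If the lookup succeeds after a few retries, the failure is probably transient, for example the worker starting before Docker's DNS is ready. If it never succeeds, the service name or network configuration is the more likely suspect.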