Code executor fails DNS queries when executed in privileged mode

Hi there :slight_smile:
I noticed a weird thing when setting up a brand new Retool v3.85 instance. I used to have an old Retool instance with a couple of workflows, and instead of upgrading it I decided to install a new one on a new Kubernetes cluster (1.30).
I also use a self hosted instance of Temporal.

At first, my tasks making API requests to a web service always returned "internal error". The retool-api logs showed this error:
{"error":{"message":"FetchError: request to http://retool-workflow-backend/api/workflow/runQueryForSandboxed failed, reason: getaddrinfo EAI_AGAIN retool-workflow-backend","stacktrace":"FetchError: request to http://retool-workflow-backend/api/workflow/runQueryForSandboxed failed, reason: getaddrinfo EAI_AGAIN retool-workflow-backend\n at ClientRequest.<anonymous> (/retool/node_modules/.pnpm/node-fetch@2.7.0_encoding@0.1.13/node_modules/node-fetch/lib/index.js:1501:11)\n at ClientRequest.emit (node:events:517:28)\n at Socket.socketErrorListener (node:_http_client:501:9)\n at Socket.emit (node:events:517:28)\n at emitErrorNT (node:internal/streams/destroy:151:8)\n at emitErrorCloseNT (node:internal/streams/destroy:116:3)\n at process.processTicksAndRejections (node:internal/process/task_queues:82:21)","type":"FetchError"},"jobId":"6ab5b8b6-fc7f-40d5-b744-670469e35453","level":"error","message":"Error executing block:","orgId":1,"pid":50,"requestId":"50467154-b8c7-41b3-a0c1-9af87fcc8f55","sequelizeCount":6,"timestamp":"2024-09-06T15:20:49.135Z"}

It took me a while to understand that it was actually the retool-code-executor instance that was producing the error, and that the API was only displaying it (the logs on retool-code-executor showed nothing wrong).

So, the executor could not access the workflow backend because it could not get its address from the node-local DNS service. This was weird, because any other pod could resolve retool-workflow-backend...
I saw that the pod was running in privileged mode, and decided to try something.
Running the executor in unprivileged mode fixes the issue: the executor can now get the backend address and query it correctly.
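For anyone who wants to reproduce the check, here is a minimal sketch of the kind of lookup that fails. It assumes a Python interpreter is available wherever it is run (for example inside the executor pod); `check_dns` is a hypothetical helper name, and the only hostname taken from the setup above is `retool-workflow-backend`.

```python
import socket


def check_dns(hostname, port=80):
    """Resolve hostname and return (True, addresses) or (False, error text).

    A resolver failure such as EAI_AGAIN surfaces here as socket.gaierror.
    """
    try:
        infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror as exc:
        return False, str(exc)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address.
    addresses = sorted({info[4][0] for info in infos})
    return True, addresses


if __name__ == "__main__":
    # Service name from the error message above.
    ok, detail = check_dns("retool-workflow-backend")
    print("resolved:" if ok else "failed:", detail)
```

Running it once with the pod in privileged mode and once in unprivileged mode should show whether the difference is in name resolution itself rather than in the HTTP request.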

I just wanted to tell you about it. In the end, I don't need privileged mode on the executor. But if you have heard of a workaround, or find one, I'd be happy to know about it :slight_smile:

Thanks :wink:


Hi @Emeric_LEBON - welcome to the community! :wave: And thanks for letting us know about this. Did you deploy with Helm or with raw manifests? And did you use any of the Retool templates?

It probably goes without saying but this definitely shouldn't be the case! There are several scenarios that require code-executor to run as a privileged container so I'm definitely interested in tracking down what might be the issue here.

Hi Darren, thanks for that welcome, and I'm happy to help :slight_smile:

I deployed with Helm. The chart version is 6.2.8 and I deployed Retool v3.85.0.
The executor service image registry and tag are tryretool/code-executor-service:3.85.0-edge.
I haven't used any Retool template yet. All I have is a simple workflow making API queries to a VMware vSphere instance.

I have a node-local DNS setup, deployed via Helm, using the image registry.k8s.io/dns/k8s-dns-node-cache:1.23.1.
The logs seem to indicate that the DNS queries don't even reach the DNS service when code-executor runs in privileged mode.
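One thing that might help narrow this down is comparing the resolver configuration the executor sees in each mode. The sketch below is only an illustration, assuming Python is available in the container and that the resolver config is at the usual Linux path `/etc/resolv.conf`; `parse_resolv_conf` is a hypothetical helper, not something Retool ships.

```python
from pathlib import Path


def parse_resolv_conf(text):
    """Extract nameserver, search and options entries from resolv.conf text."""
    config = {"nameserver": [], "search": [], "options": []}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line[0] in "#;":
            continue
        key, *values = line.split()
        if key == "nameserver" and values:
            config["nameserver"].append(values[0])
        elif key == "search":
            # Simplification: a later search line replaces an earlier one.
            config["search"] = values
        elif key == "options":
            config["options"].extend(values)
    return config


if __name__ == "__main__":
    path = Path("/etc/resolv.conf")  # assumed location
    if path.exists():
        print(parse_resolv_conf(path.read_text()))
    else:
        print(f"{path} not found")
```

If the `nameserver` entries differ between the privileged and unprivileged runs, that would point at the pod's DNS configuration rather than at the DNS service.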
