Workflow Worker Failing: "Failed to fetch from pypi" for refreshLibraryRegistry Activity

Hi Retool Community,

We are encountering persistent errors in our Retool instance's workflow-worker logs (we believe we are running an on-premise setup based on file paths like /retool_backend/).

It's important to note that this instance runs in a network-restricted, mostly air-gapped environment. We have carefully configured our network egress rules (firewall/proxy) to allow connections specifically to the domains listed in the official Retool network requirements documentation (https://docs.retool.com/self-hosted/reference/requirements#network-requirements).

Despite having these allowances in place, the refreshLibraryRegistry activity is failing continuously. We observe this for workflows like refresh-library-registry-cron and refresh-library-registry-on-startup. The attempt count for this activity is extremely high (e.g., 2900+, 3100+), suggesting it's stuck in a rapid retry loop.

The error logged every time is "Error fetching from pypi", which stems from the underlying error "Failed to fetch from pypi".

Here are some example log entries:

// Info message right before the error
workflow-worker {"level":"info","message":"Fetching library registry from pypi API","pid":19,"requestId":"1695a1d7-eae3-429f-8c16-ff6004f835f6","timestamp":"2025-03-31T01:15:56.703Z"}

// Error log with stack trace
workflow-worker {"error":{"message":"Failed to fetch from pypi","name":"Error","stack":"Error: Failed to fetch from pypi\n    at yfI (/retool_backend/bundle/main.js:10203:21646)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n    // ... (rest of stack trace) ...\n    at async /retool_backend/node_modules/.pnpm/@temporalio+worker@1.11.6_...@1.13_metro-minify-terser@0.80.9_/node_modules/@temporalio/worker/lib/worker.js:648:30"},"level":"error","message":"Error fetching from pypi","pid":19,"requestId":"1695a1d7-eae3-429f-8c16-ff6004f835f6","timestamp":"2025-03-31T01:15:56.713Z"}

// Activity failure warning log
workflow-worker {"activityId":"1","activityType":"refreshLibraryRegistry","attempt":2902,"durationMs":10,"errorMsg":"Error","isLocal":false,"label":"activity","level":"warn","message":"Activity failed (Attempt 2902) - refreshLibraryRegistry", ...}

// Worker reporting activity failure
workflow-worker {"activityId":"1","activityType":"refreshLibraryRegistry","attempt":2902,"durationMs":10,"error":{},"isLocal":false,"label":"worker","level":"warn","message":"Activity failed", ...}
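For anyone triaging similar output, here is a minimal Python sketch that summarizes the highest attempt count per activity. It assumes each log line is the container name followed by a single JSON object, as in the entries above. The sample lines are trimmed, hypothetical stand-ins for real log output, and `parse_line` and `max_attempts` are illustrative helper names, not part of Retool or Temporal.

```python
import json
from collections import defaultdict


def parse_line(line):
    """Return the JSON payload of a 'container-name {json}' log line, or None."""
    start = line.find("{")
    if start == -1:
        return None
    try:
        return json.loads(line[start:])
    except json.JSONDecodeError:
        return None  # truncated or non-JSON line


def max_attempts(lines):
    """Map each activityType to the highest 'attempt' value seen."""
    highest = defaultdict(int)
    for line in lines:
        entry = parse_line(line)
        if entry and "activityType" in entry and "attempt" in entry:
            name = entry["activityType"]
            highest[name] = max(highest[name], entry["attempt"])
    return dict(highest)


# Trimmed, hypothetical sample lines in the same shape as the logs above.
sample = [
    'workflow-worker {"activityType":"refreshLibraryRegistry","attempt":2902,"level":"warn"}',
    'workflow-worker {"activityType":"refreshLibraryRegistry","attempt":3100,"level":"warn"}',
    'workflow-worker {"level":"info","message":"Fetching library registry from pypi API"}',
]

print(max_attempts(sample))  # {'refreshLibraryRegistry': 3100}
```

Lines that are not valid JSON (for example, entries truncated with "...") are skipped rather than raising.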

It appears the workflow-worker process (running in /retool_backend/) is unable to connect to or successfully fetch data from the PyPI API (pypi.org), even though we have allowed the domains specified in the Retool documentation.

Is it possible that pypi.org, files.pythonhosted.org, or other related domains are required for this specific activity but are not explicitly listed in the general Retool network requirements?

We would appreciate any guidance on why this connection might still be failing despite allowing the documented Retool domains, or pointers on how to troubleshoot this further.

Thanks in advance!

environments:

  • self-hosted
  • retool 3.148.0-stable
  • kubernetes 1.28

Hello @_Kevin,

Apologies for the issue.

Let me double-check with our workflow engineering team to confirm. My hunch is that your guess is correct, and that pypi.org, files.pythonhosted.org, and possibly other related domains are required for this specific activity but are not explicitly listed in the general Retool network requirements.

I'll ask the team to confirm this and to provide the full list of domains PyPI needs, so that you can add them to your setup.

I also just found our docs on setting up a private PyPI repository, in case you would rather keep package access local and avoid opening egress to any domains outside your network.

I have some other contextual questions to better understand your deployment setup.

  • Do you have a code-executor?

  • What cloud provider are you using?

  • What deployment type are you using? (docker-compose on a VM, k8s/Helm, ECS on EC2/Fargate, etc. If Helm, which version of the Helm chart?)

  • Could you share the relevant deployment files for that deployment type? (e.g. for Helm, your values.yml; for Docker, your docker-compose.yml and docker.env)

  • How is Temporal configured? (Retool-managed, self-hosted, etc.)

  • Could you share the container logs from your workflows containers? (workflow-worker, workflow-backend, code-executor if enabled, and temporal if using a local cluster)

These will help us better understand your setup and work out how to get your deployment to reach PyPI and get past this error.

Hey @_Kevin - I did some additional digging here and can confirm that the https://pypi.org/ domain should be added to the egress rules that you've already configured. This seems to be an oversight in our documentation, which I'll flag for the owning team.
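After updating the egress rules, one way to sanity-check them is a small probe like the sketch below. It assumes you can run Python 3 somewhere that is subject to the same egress rules as the workflow-worker (the worker image itself may not ship Python). The exact URLs Retool requests are not stated in this thread, so `https://pypi.org/simple/` is an assumption, and files.pythonhosted.org is included only because it was raised earlier; it has not been confirmed as required. `http_probe` and `check_egress` are illustrative names.

```python
import urllib.error
import urllib.request

# pypi.org was confirmed in this thread; the specific paths are assumptions.
URLS = ["https://pypi.org/simple/", "https://files.pythonhosted.org/"]


def http_probe(url, timeout=5):
    """Return the HTTP status code; raise on connection-level failure."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # an HTTP response arrived, so something answered


def check_egress(urls, probe=http_probe):
    """Map each URL to a short reachable/unreachable description."""
    results = {}
    for url in urls:
        try:
            results[url] = f"reachable (HTTP {probe(url)})"
        except OSError as err:  # URLError, timeouts, DNS and TLS failures
            results[url] = f"blocked or unreachable ({err})"
    return results


# Example usage (performs real network requests):
# for url, outcome in check_egress(URLS).items():
#     print(url, "->", outcome)
```

Note that `urllib` picks up proxy settings such as `HTTPS_PROXY` from the environment, and that an HTTP status like 403 can come from the proxy itself rather than from PyPI, so a "reachable" result with an error status still deserves a closer look.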

Let us know if you have any additional questions!