Jobs runner doesn't run

Hi, I've deployed Retool with Helm and now I'm testing the scalability by configuring it to deploy 2 replicas.
There's a new Retool instance called retool-jobs-runner running in parallel, but it's always crashing.
Here are the logs:
wait-for-it.sh: waiting for :5432 without a timeout

wait-for-it.sh: :5432 is available after 0 seconds

not untarring the bundle

Warning: POSTGRES_SSL_REJECT_UNAUTHORIZED is currently set to 'false'. This will default to 'true' in a future version of Retool, which may break connections to databases with self-signed SSL/TLS certificates. To prepare for this change, either explicitly set POSTGRES_SSL_REJECT_UNAUTHORIZED=false or configure a custom certificate chain by setting POSTGRES_CUSTOM_SSL_CERT_PATH & POSTGRES_CUSTOM_SSL_CA_FILE_NAME (and optionally POSTGRES_CUSTOM_SSL_CERT_FILE_NAME & POSTGRES_CUSTOM_SSL_KEY_FILE_NAME) — see Environment Variables.

Database migrations are up to date.

[pid: 19] [JOBS_RUNNER] Attempting to start jobs runner.

[pid: 19] [JOBS_RUNNER] Jobs runner started.

[pid: 19] [GIT_SYNC] Git syncing and Source Control are not available on SSOP plans, so exiting job

(node:19) [DEP0148] DeprecationWarning: Use of deprecated folder mapping "./" in the "exports" field module resolution of the package at /snapshot/retool_development/node_modules/@tryretool/common/package.json.

Update this package.json to use a subpath pattern like "./*".

(Use retool_backend --trace-deprecation ... to show where the warning was created)

Any ideas?

Laurent.

Hey @Icaille! Happy to help with this. What version of Retool are you currently running? If it’s pinned to ‘latest’, could you try using an actual version number (e.g. 2.100.6)?

Hi, thank you for your reply, the version deployed was 2.100.2.

Now the version deployed is 2.100.6 but I still have the same issue.

Cool, thank you for confirming. In your file where you set the version variable, does it say …VERSION: 2.100.6… explicitly? Did you manually upgrade to 2.100.6 by setting that variable?

I've set the version in the Docker image tag parameter in the Helm chart values.


Great, thank you for confirming. If you restart your containers and check your logs, do you still see the same error messages on refresh? And are you able to verify which containers are up and running?
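For anyone following along on Kubernetes, here is a rough sketch of how one might check which pods are up. It assumes kubectl is pointed at the right cluster; the NAMESPACE default and the helper function names are placeholders, not something from the Retool chart.

```shell
# Assumption: Retool is installed in this namespace; override NAMESPACE if not.
NAMESPACE="${NAMESPACE:-default}"

# List every pod in the namespace with its status and restart count.
list_pods() {
  kubectl get pods -n "$NAMESPACE" -o wide
}

# Show events and the last container state for one pod (pod name passed as $1).
describe_pod() {
  kubectl describe pod "$1" -n "$NAMESPACE"
}
```

Running `list_pods` first and then `describe_pod "$POD"` (with POD set to the name of the failing pod) should show whether the pod is in a restart loop and what Kubernetes recorded as the reason.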

Hi Victoria, restarting the containers shows the same error again. And yes, I'm able to verify which containers are up and running; the one that doesn't work is retool-jobs-runner.

This container has only 'JOBS_RUNNER' in the SERVICE_TYPE environment variable.

You can find the deployment YAML in your Retool Helm chart, located here: retool-helm/deployment_jobs.yaml at main · tryretool/retool-helm · GitHub

Here are the logs again:

wait-for-it.sh: waiting for :5432 without a timeout

wait-for-it.sh: :5432 is available after 0 seconds

not untarring the bundle

Warning: POSTGRES_SSL_REJECT_UNAUTHORIZED is currently set to 'false'. This will default to 'true' in a future version of Retool, which may break connections to databases with self-signed SSL/TLS certificates. To prepare for this change, either explicitly set POSTGRES_SSL_REJECT_UNAUTHORIZED=false or configure a custom certificate chain by setting POSTGRES_CUSTOM_SSL_CERT_PATH & POSTGRES_CUSTOM_SSL_CA_FILE_NAME (and optionally POSTGRES_CUSTOM_SSL_CERT_FILE_NAME & POSTGRES_CUSTOM_SSL_KEY_FILE_NAME) — see Environment Variables.

Database migrations are up to date.

[pid: 19] [JOBS_RUNNER] Attempting to start jobs runner.

[pid: 19] [JOBS_RUNNER] Jobs runner started.

[pid: 19] [GIT_SYNC] Git syncing and Source Control are not available on SSOP plans, so exiting job

(node:19) [DEP0148] DeprecationWarning: Use of deprecated folder mapping "./" in the "exports" field module resolution of the package at /snapshot/retool_development/node_modules/@tryretool/common/package.json.

Update this package.json to use a subpath pattern like "./*".

(Use retool_backend --trace-deprecation ... to show where the warning was created)


Thank you for sending that! Just to confirm, is your Retool instance otherwise running as expected?

I think the jobs-runner container is mainly used for source control, which doesn't seem to be available on your plan, so you may not actually need it.

Hi, thank you for your reply.
I confirm that:

  • with only one instance, there are no issues
  • with 2 instances, a third instance called retool-jobs-runner is created, and it doesn't work.

I tried to find a way to disable source control, but with no success so far. Do you have any idea how to do it?

I tried:

DISABLE_GIT_SYNCING: "true"
DISABLE_PROTECTED_APPS_SYNCING: "true"
VERSION_CONTROL_LOCKED: "false"

Here is a snippet of my values.yaml file:

retool:
  replicaCount: 2
  image:
    # https://docs.retool.com/docs/self-hosted-release-notes
    tag: 2.100.6
  env:
    DISABLE_MEMORY_AND_CPU_USAGE_LOGGING: "true"
    LOG_LEVEL: "error"
    DISABLE_INTERCOM: "true"
    LOG_AUDIT_EVENTS: "false"
    DBCONNECTOR_QUERY_TIMEOUT_MS: "600000"
    DISABLE_GIT_SYNCING: "true"
    DISABLE_PROTECTED_APPS_SYNCING: "true"
    HIDE_PROD_AND_STAGING_TOGGLES: "true"
    VERSION_CONTROL_LOCKED: "false"
    POSTGRES_SSL_REJECT_UNAUTHORIZED: "false"

@victoria Hi Victoria, any news by chance?

Hey @Icaille! So sorry for the delay here.

The jobs runner container only starts running with the second instance? If you look at all running containers for the first instance, what containers do you see?

Also, are there any other errors in either the api container or the jobs-runner container in the second instance?

Also also, any browser console logs?
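On the question of container errors, a possible way to pull them out with kubectl is sketched below. The NAMESPACE default and the function names are placeholders, and the exit-code query assumes the pod runs a single container (index 0).

```shell
# Assumption: Retool is installed in this namespace; override NAMESPACE if not.
NAMESPACE="${NAMESPACE:-default}"

# Print the logs of the previous (crashed) run of a pod's container.
previous_logs() {
  kubectl logs "$1" -n "$NAMESPACE" --previous
}

# Print the exit code and reason of the container's last termination.
# Assumes a single-container pod, hence containerStatuses[0].
last_exit() {
  kubectl get pod "$1" -n "$NAMESPACE" \
    -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}{" "}{.status.containerStatuses[0].lastState.terminated.reason}{"\n"}'
}
```

Calling `previous_logs "$POD"` and `last_exit "$POD"` for the jobs-runner pod would help tell a clean exit (for example, the process ending after the git sync job quits) apart from an actual error.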

When specifying more than 1 replica, an extra one called jobs-runner is created.
The deployment configuration can be found here: retool-helm/deployment_jobs.yaml at e8f15d7fe96894688ab9eddddeafa110fcafd8b6 · tryretool/retool-helm · GitHub

This container is failing while the others have no issues.
On the main containers, the service type list will be different as we can see in this configuration too => retool-helm/deployment_backend.yaml at e8f15d7fe96894688ab9eddddeafa110fcafd8b6 · tryretool/retool-helm · GitHub

So to summarize, there are no errors on the api containers, only on the jobs-runner container.
And no browser logs as the api containers work.

Oh! Okay, so to clarify, your instances are both usable, you just see a failing container in your second instance? Is that causing any other problems?