My pod starts running, and after some time it fails with "Liveness probe failed" and "Readiness probe failed" errors.
Log:
Normal Created 6m26s (x2 over 8m26s) kubelet Created container retool
Normal Started 6m26s (x2 over 8m26s) kubelet Started container retool
Warning Unhealthy 4m57s (x6 over 7m17s) kubelet Liveness probe failed: Get "http://...:**/api/checkHealth": dial tcp ....: connect: connection refused
Warning Unhealthy 3m23s (x16 over 7m17s) kubelet Readiness probe failed: Get "http://...**:/api/checkHealth": dial tcp 10.196.1.53:**: connect: connection refused
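A note on the error itself: "connection refused" means nothing was listening on the probe port at that moment, which typically points to the container still starting up (or having crashed) when the kubelet checked. If the chart exposes probe settings, a sketch along these lines can give the app more startup headroom. These are standard Kubernetes probe fields, but the delay and threshold values are illustrative guesses, and port 3000 is an assumption here:

```yaml
# Sketch only -- values are guesses, tune to your actual startup time.
livenessProbe:
  httpGet:
    path: /api/checkHealth
    port: 3000               # assumption: the port the app listens on
  initialDelaySeconds: 120   # wait before the first liveness check
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 6        # several consecutive failures before a restart
readinessProbe:
  httpGet:
    path: /api/checkHealth
    port: 3000
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 5
```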
Hey @shubham_Lohar! Happy to help, I just have a few questions for you
- Is this your first time deploying or was this working before?
- How did you deploy/did you follow any docs in particular?
- And what version are you running?
Thank you!
Hi! @Shubham_Lohar did you find a solution to this?
I am facing exactly the same problem:
Normal Started 17m (x3 over 32m) kubelet Started container retool
Warning Unhealthy 12m (x22 over 20m) kubelet Readiness probe failed: Get "http://10.10.10.53:3000/api/checkHealth": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
The pod works for a while, until the liveness probe fails and it is restarted. However, while it is running, some calls to /api/checkHealth do resolve OK:
{"level":"info","message":{"http":{"method":"GET","status_code":200,"url_base":"https://10.10.10.53:3000","url_path":"/api/checkHealth"},"type":"REQUEST_FINISH"},"pid":50,"requestId":"513d7d16-23bf-4388-9de3-9fd743ab65b6","timestamp":"2023-11-29T15:44:47.351Z"}
The container does not log any crash or error.
Hi Eduard! Happy to help here.
I have the same questions for you
- Is this your first time deploying or was this working before?
- How did you deploy/did you follow any docs in particular?
- And what version are you running?
Hi Victoria!
- It's our first deployment.
- Followed the official docs on how to install in Kubernetes using Helm
- Installed Helm chart version retool-6.0.4 which installed image tryretool/backend:3.12.5
I want to note that:
Retool works OK (no errors connecting to the database, or anything else) UNTIL the liveness probe fails and the pod is restarted, which happens quite frequently.
When the liveness probe fails, the error shown by Kubernetes is always "Client.Timeout exceeded while awaiting headers"
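"Client.Timeout exceeded while awaiting headers" is a different failure mode from "connection refused": the probe does connect, but no HTTP response arrives within the probe's timeoutSeconds, which defaults to 1 second in Kubernetes. A busy or CPU-starved Node process can exceed that while still being healthy. A minimal sketch of loosening just those knobs (standard probe fields, illustrative values):

```yaml
# Sketch only: tolerate a slow-but-alive process instead of restarting it.
livenessProbe:
  httpGet:
    path: /api/checkHealth
    port: 3000
  timeoutSeconds: 10     # Kubernetes default is 1s, easy to exceed under load
  failureThreshold: 6    # require sustained failure before a restart
```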
Thanks
Are you able to grab more of the container logs?
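For example, these standard kubectl commands (substitute your pod name, and add -n if you deployed to a namespace) capture what usually matters here:

```shell
# Logs from the previous container instance -- crash output from before a
# liveness restart only shows up with --previous.
kubectl logs <pod-name> --previous

# Event history, including the exact probe failure reasons and timings.
kubectl describe pod <pod-name>

# Follow the live log while waiting for the next probe failure.
kubectl logs -f <pod-name>
```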
And could you double-check that there is only one jobs-runner service running? It should be separate from the other services (db_connector, main_backend)! You can run one deployment with
SERVICE_TYPE=JOBS_RUNNER
and then run the rest of it with
SERVICE_TYPE="MAIN_BACKEND, DB_CONNECTOR, DB_SSH_CONNECTOR, WORKFLOW_WORKER"
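As a sketch of that split (the exact Helm values keys depend on the chart, so treat the structure as illustrative), the goal is two separate deployments whose containers receive different SERVICE_TYPE values:

```yaml
# Illustrative only -- map these onto your chart's values layout.
# Deployment 1: the jobs runner alone
env:
  - name: SERVICE_TYPE
    value: "JOBS_RUNNER"
---
# Deployment 2: everything else
env:
  - name: SERVICE_TYPE
    value: "MAIN_BACKEND,DB_CONNECTOR,DB_SSH_CONNECTOR,WORKFLOW_WORKER"
```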
Hi Victoria!
I have only one pod running:
NAME READY STATUS RESTARTS AGE
retool-676854c859-wftct 1/1 Running 4 (39h ago) 40h
Looking at the logs, I see the following:
wait-for-it.sh: retool-sqlproxy:5432 is available after 0 seconds
not untarring the bundle
{"level":"info","message":"[process service types] MAIN_BACKEND, DB_CONNECTOR, DB_SSH_CONNECTOR, JOBS_RUNNER","timestamp":"2023-11-29T15:40:49.521Z"}
Database migrations are up to date.
....
{"level":"info","message":"Jobs runner started.","pid":19,"source":"JOBS_RUNNER","timestamp":"2023-11-29T15:40:57.594Z"}
Not sure if this answers your question.
For the container logs: should I be searching for something in particular?
Additional Info
The retool pod is scheduled on a node with:
Allocatable:
cpu: 3860m
ephemeral-storage: 119703055367
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 12880728Ki
pods: 110
Current usage of the node is:
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
node-00001 207m 5% 4205Mi 33%
And the currently allocated resources of the node (from the resource requests of all pods on the node) are:
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 310m (8%) 500m (12%)
memory 782Mi (6%) 2780Mi (22%)
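One observation on these numbers: the node itself has plenty of headroom, but the total CPU limit across all pods is only 500m, which suggests the retool container's own limit is quite small. A CPU-throttled Node process can take longer than the probe timeout to answer /api/checkHealth even though it is healthy, which would match the "Client.Timeout exceeded" failures. If the chart exposes resources, a sketch like this (the values are illustrative, not a recommendation) removes that bottleneck:

```yaml
# Sketch only -- size to your workload.
resources:
  requests:
    cpu: "1"
    memory: 2Gi
  limits:
    cpu: "2"       # a low CPU limit throttles the process and slows health checks
    memory: 4Gi
```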
Regards!