My pod starts running, and after some time it fails with "Liveness probe failed" and "Readiness probe failed" errors.
Log:
Normal Created 6m26s (x2 over 8m26s) kubelet Created container retool
Normal Started 6m26s (x2 over 8m26s) kubelet Started container retool
Warning Unhealthy 4m57s (x6 over 7m17s) kubelet Liveness probe failed: Get "http://...:**/api/checkHealth": dial tcp ....: connect: connection refused
Warning Unhealthy 3m23s (x16 over 7m17s) kubelet Readiness probe failed: Get "http://...**:/api/checkHealth": dial tcp 10.196.1.53:**: connect: connection refused
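A note on the error itself: "connection refused" means nothing was listening on the probe port at that moment, which typically points to the container still starting up (or having crashed) when the kubelet checked. If the chart exposes probe settings, a sketch along these lines can give the app more startup headroom. These are standard Kubernetes probe fields, but the delay and threshold values are illustrative guesses, and port 3000 is an assumption here:

```yaml
# Sketch only -- values are guesses, tune to your actual startup time.
livenessProbe:
  httpGet:
    path: /api/checkHealth
    port: 3000               # assumption: the port the app listens on
  initialDelaySeconds: 120   # wait before the first liveness check
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 6        # several consecutive failures before a restart
readinessProbe:
  httpGet:
    path: /api/checkHealth
    port: 3000
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 5
```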
Hey @shubham_Lohar! Happy to help, I just have a few questions for you
- Is this your first time deploying or was this working before?
- How did you deploy/did you follow any docs in particular?
- And what version are you running?
Thank you!
Hi! @Shubham_Lohar did you find a solution to this?
I am facing exactly the same problem:
Normal Started 17m (x3 over 32m) kubelet Started container retool
Warning Unhealthy 12m (x22 over 20m) kubelet Readiness probe failed: Get "http://10.10.10.53:3000/api/checkHealth": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
The pod works for a while, until the liveness probe fails and it is restarted. However, while it is running, some calls to /api/checkHealth do resolve OK:
{"level":"info","message":{"http":{"method":"GET","status_code":200,"url_base":"https://10.10.10.53:3000","url_path":"/api/checkHealth"},"type":"REQUEST_FINISH"},"pid":50,"requestId":"513d7d16-23bf-4388-9de3-9fd743ab65b6","timestamp":"2023-11-29T15:44:47.351Z"}
The container does not log any crash or error.
Hi Eduard! Happy to help here.
I have the same questions for you
- Is this your first time deploying or was this working before?
- How did you deploy/did you follow any docs in particular?
- And what version are you running?
Hi Victoria!
- It's our first deployment.
- Followed the official docs on how to install in Kubernetes using Helm
- Installed Helm chart version retool-6.0.4 which installed image tryretool/backend:3.12.5
I want to note that:
Retool works OK (no errors connecting to the database, or anything else) UNTIL the liveness probe fails and the pod is restarted, which happens quite frequently.
When the liveness probe fails, the error shown by Kubernetes is always "Client.Timeout exceeded while awaiting headers"
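"Client.Timeout exceeded while awaiting headers" is a different failure mode from "connection refused": the probe does connect, but no HTTP response arrives within the probe's timeoutSeconds, which defaults to 1 second in Kubernetes. A busy or CPU-starved Node process can exceed that while still being healthy. A minimal sketch of loosening just those knobs (standard probe fields, illustrative values):

```yaml
# Sketch only: tolerate a slow-but-alive process instead of restarting it.
livenessProbe:
  httpGet:
    path: /api/checkHealth
    port: 3000
  timeoutSeconds: 10     # Kubernetes default is 1s, easy to exceed under load
  failureThreshold: 6    # require sustained failure before a restart
```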
Thanks
Are you able to grab more of the container logs?
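For example, these standard kubectl commands (substitute your pod name, and add -n if you deployed to a namespace) capture what usually matters here:

```shell
# Logs from the previous container instance -- crash output from before a
# liveness restart only shows up with --previous.
kubectl logs <pod-name> --previous

# Event history, including the exact probe failure reasons and timings.
kubectl describe pod <pod-name>

# Follow the live log while waiting for the next probe failure.
kubectl logs -f <pod-name>
```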
And could you double-check that there is only one jobs-runner service running? It should be separate from the other services (db_connector, main_backend)! You can run one deployment with
SERVICE_TYPE=JOBS_RUNNER
and then run the rest of it with
SERVICE_TYPE="MAIN_BACKEND, DB_CONNECTOR, DB_SSH_CONNECTOR, WORKFLOW_WORKER"
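As a sketch of that split (the exact Helm values keys depend on the chart, so treat the structure as illustrative), the goal is two separate deployments whose containers receive different SERVICE_TYPE values:

```yaml
# Illustrative only -- map these onto your chart's values layout.
# Deployment 1: the jobs runner alone
env:
  - name: SERVICE_TYPE
    value: "JOBS_RUNNER"
---
# Deployment 2: everything else
env:
  - name: SERVICE_TYPE
    value: "MAIN_BACKEND,DB_CONNECTOR,DB_SSH_CONNECTOR,WORKFLOW_WORKER"
```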
Hi Victoria!
I have only one pod running:
NAME READY STATUS RESTARTS AGE
retool-676854c859-wftct 1/1 Running 4 (39h ago) 40h
Looking at the logs, I see the following:
wait-for-it.sh: retool-sqlproxy:5432 is available after 0 seconds
not untarring the bundle
{"level":"info","message":"[process service types] MAIN_BACKEND, DB_CONNECTOR, DB_SSH_CONNECTOR, JOBS_RUNNER","timestamp":"2023-11-29T15:40:49.521Z"}
Database migrations are up to date.
....
{"level":"info","message":"Jobs runner started.","pid":19,"source":"JOBS_RUNNER","timestamp":"2023-11-29T15:40:57.594Z"}
Not sure if this answers your question.
For the container logs: should I be searching for something in particular?
Additional Info
The retool pod is scheduled on a node with:
Allocatable:
cpu: 3860m
ephemeral-storage: 119703055367
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 12880728Ki
pods: 110
Current usage of the node is:
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
node-00001 207m 5% 4205Mi 33%
And the currently allocated resources of the node (from the resource requests of all pods on the node) are:
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 310m (8%) 500m (12%)
memory 782Mi (6%) 2780Mi (22%)
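One observation on these numbers: the node itself has plenty of headroom, but the total CPU limit across all pods is only 500m, which suggests the retool container's own limit is quite small. A CPU-throttled Node process can take longer than the probe timeout to answer /api/checkHealth even though it is healthy, which would match the "Client.Timeout exceeded" failures. If the chart exposes resources, a sketch like this (the values are illustrative, not a recommendation) removes that bottleneck:

```yaml
# Sketch only -- size to your workload.
resources:
  requests:
    cpu: "1"
    memory: 2Gi
  limits:
    cpu: "2"       # a low CPU limit throttles the process and slows health checks
    memory: 4Gi
```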
Regards!