Agent "Total runtime" and "Estimated cost" keep increasing when the agent containers are down

  1. My goal:
    Test Retool Agents self-hosted and fully understand the billing model.

  2. Issue:

    1. We had an issue with the underlying containers for Retool Agents (agent-worker and agent-eval-worker)
    2. So effectively, no agents could run
    3. Any agents that were triggered during the container downtime got stuck and were displayed as “Thinking”. Manually terminating them did not stop them immediately.
    4. The reported numbers for Agent “Total runtime” and “Estimated cost” kept increasing until the underlying containers came back.
  3. Retool version & hosting setup (Docker, K8s, cloud provider, etc.):
    Retool self-hosted 3.253.4 (2025-Q3 stable) on GKE via Retool Helm chart

  4. Error message(s) or screenshots:
    Token usage 0, Tool calls 0 → the agent did not do any work
    Estimated cost $4.34 / Total runtime 52m 3s
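For reference, here is a quick back-of-the-envelope check of the numbers above. It is only a sketch: it assumes that the estimated cost is derived from runtime alone (which the zero token usage suggests but which is not confirmed), and `parse_runtime_seconds` is a hypothetical helper written for this post, not part of Retool.

```python
def parse_runtime_seconds(text: str) -> int:
    """Convert a runtime string such as '52m 3s' into seconds."""
    units = {"h": 3600, "m": 60, "s": 1}
    total = 0
    for part in text.split():
        # Each part is a number followed by a single unit letter, e.g. '52m'.
        total += int(part[:-1]) * units[part[-1]]
    return total


# Values copied from the stuck agent run reported above.
runtime_s = parse_runtime_seconds("52m 3s")
estimated_cost = 4.34

# Assumption: cost scales linearly with runtime and nothing else.
implied_hourly_rate = estimated_cost / (runtime_s / 3600)
print(f"{runtime_s} s of runtime, about ${implied_hourly_rate:.2f} per hour")
```

Under that assumption the run works out to roughly $5 per hour of wall-clock time, even though the agent consumed no tokens and made no tool calls.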

There are two improvements that I’d like to request here:

When the underlying infrastructure for agents is not ready:

  1. New agent runs should not start and an error message should be displayed to the user.
  2. For existing agent runs, “Total runtime” and “Estimated cost” should stop increasing.
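
To make the two requests concrete, here is a minimal sketch of the behavior I have in mind. Everything in it is hypothetical: `AgentRun`, `start_run`, `tick`, and `WorkersUnavailableError` are names invented for illustration and say nothing about how Retool actually implements agent runs or billing.

```python
from dataclasses import dataclass


class WorkersUnavailableError(RuntimeError):
    """Raised when a run is requested while no agent worker is ready."""


@dataclass
class AgentRun:
    last_tick: float               # timestamp (seconds) of the previous accounting tick
    billable_seconds: float = 0.0  # runtime that counts toward the estimated cost


def start_run(workers_ready: bool, now: float) -> AgentRun:
    """Request 1: refuse to start a run when the workers are not ready."""
    if not workers_ready:
        raise WorkersUnavailableError(
            "Agent workers are not ready; the run was not started."
        )
    return AgentRun(last_tick=now)


def tick(run: AgentRun, workers_ready: bool, now: float) -> None:
    """Request 2: only count the elapsed interval while workers are ready."""
    if workers_ready:
        run.billable_seconds += now - run.last_tick
    run.last_tick = now


# Usage: 60 s of work, a 50 min outage, then another 60 s of work.
run = start_run(workers_ready=True, now=0.0)
tick(run, workers_ready=True, now=60.0)
tick(run, workers_ready=False, now=3060.0)  # outage: nothing is added
tick(run, workers_ready=True, now=3120.0)
print(run.billable_seconds)
```

In this sketch only the two healthy minutes count toward runtime, and a run requested during the outage fails immediately with a clear error instead of sitting in “Thinking”.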