Viewing a workflow results in non-optimized query and 100% CPU usage of our database server and crashes instance

markmarkoh · November 10, 2025, 7:21pm

My goal: View and edit a Workflow
Issue: Page spins forever and the entire Retool instance is non-responsive
Retool version & hosting setup (Docker, K8s, cloud provider, etc.): Docker, 3.284.5
Database is a Postgres server hosted on AWS RDS in the same VPC

After doing an upgrade to 3.284.5, and trying to view a workflow, my entire Retool instance slows down and eventually times out on every request.

Using the RDS tools, I was able to track down a single query that never completes but eats up 100% of my RDS server capacity:

SELECT * FROM workflow_run AS wr1 WHERE "workflowId" IN ('d591ecb0-d9f3-4e2d-8632-13718db9be91', '6316b538-9bd2-4daf-8baf-998d528ec9ac') AND id IN ( SELECT id FROM workflow_run AS wr2 WHERE wr1."workflowId" = wr2."workflowId" ORDER BY "createdAt" DESC LIMIT 10 ) ORDER BY "workflowId", "createdAt" DESC LIMIT 20

Connecting to the DB, I can’t even run EXPLAIN ANALYZE on that, it just sits and takes forever until I cancel it.

workflow_run has 201k rows in it. My retool instance is set to have a 30 day retention window of logs, but I’m not sure if that is related to this particular table.

Slightly adjusting the query by limiting the results in the first SELECT and running it in PSQL shows me a big performance issue:

EXPLAIN ANALYZE with wr1 as ( SELECT * FROM workflow_run AS wr1 WHERE "workflowId" IN ('d591ecb0-d9f3-4e2d-8632-13718db9be91', '6316b538-9bd2-4daf-8baf-998d528ec9ac') LIMIT 10 ) select * from wr1 WHERE id IN ( SELECT id FROM workflow_run AS wr2 WHERE wr1."workflowId" = wr2."workflowId" ORDER BY "createdAt" DESC LIMIT 10 ) ORDER BY "workflowId", "createdAt" DESC LIMIT 20

That takes about 400ms to even show me the plan. When I change the first “LIMIT 10” to “LIMIT 500”, it takes 8 seconds to show me the plan.

Without the limit I imposed for testing, that first SELECT returns 20k rows.

EXPLAIN ANALYZE tells me the table isn’t indexed properly.

Copying the database and testing it, I was able to repro the issue, and fix it by dropping one index and adding a new index:

Do not do this on prod:

DROP INDEX CONCURRENTLY workflow_run_createdAt_index;
CREATE INDEX CONCURRENTLY idx_workflowid_createdat_desc
ON workflow_run("workflowId", "createdAt" DESC);

After doing this and rerunning my EXPLAIN ANALYZE, everything is super snappy and runs in less than 15 milliseconds.

I’m concerned that during my upgrade, a database migration failed to run that would have that index.

I followed these upgrade steps, one by one, making sure everything looked normal before completing the next one.

Jack_T · December 4, 2025, 10:37pm

Hi @markmarkoh,

Great detective work on using EXPLAIN ANALYZE to catch the cause of the slowness being due to the indexing on the table.

Has that solved the issue now or are you still experiencing slowness?

It sounds like exactly what you suspect happened, some part of the migration failed, causing the indexing issue on the table.

The jobs-runner container would contain logs for anything that fails during a migration.

Jack_T · January 23, 2026, 12:05am

Hi @markmarkoh,

Just wanted to check if you were able to successfully migrate on your second attempt or if you have some error messages you could share with me to help you resolve this!

Topic		Replies	Views
Workflow Failing Without Errors or Explanation 💬 Workflows	14	369	September 18, 2025
Runtime error: exited workflow with code 139 💬 Workflows	20	11820	January 16, 2025
RDS instance storage filling up massively 💬 Self Hosted Retool sql , table	11	159	September 22, 2025
All Retool db queries timing out 💬 App Building	23	442	December 23, 2024
Postgres connection error: not enough connections 💬 Workflows resource-connection	1	32	November 13, 2024

Viewing a workflow results in non-optimized query and 100% CPU usage of our database server and crashes instance

Related topics