All Retool DB queries timing out

I have an app where all of my Retool queries (Postgres) are timing out right now. This happens occasionally and usually lasts around 5 minutes. This time I tried accessing the DB directly via pgAdmin and ran a query without issue.

Is there somewhere I can see if an app or a service is temporarily down?

During development it's not a big issue, but a client mentioned the same issue was happening for about 20 minutes.

Hello @Shawn_Optipath!

I apologize for this issue; there are many possible culprits that could be causing outages like this :sweat:

You can check the status of our cloud servers on our official website here. This is our first line of alerting customers to any unexpected outages on our side.

If you or your clients are self-hosted, unfortunately there is not much we can do. Twenty minutes is very concerning if the incident was not also reported on our website. If you have any further details on the account that experienced this issue, along with the date and time, we can check whether this was a reported incident or whether we need to look at their cloud server logs.

We are always striving to get as close to 100% uptime as possible. We appreciate your patience; our software is unfortunately not yet perfect, and incidents can pop up occasionally :sweat_smile:

Hi @Jack_T,
Thank you for getting back to me.
This client is on the Retool Business cloud. This is happening again right now, about the third time since my last post. It has been down for about 10 minutes and still is.

The client is a snow removal company, and Retool will become a critical tool for them within the next 2 weeks. Can you or someone in support please DM me? I will share the client details to help troubleshoot.


@Shawn_Optipath,

No problem, just sent you a DM.

The best option for troubleshooting would be a live session with our team during office hours tomorrow or Thursday.

Hopefully we can get this sorted as quickly as possible!

Interesting. I just set up a new resource and connected to the Retool DB in the Retool UI, and I got much better results (215ms vs 38.97s!).

Can someone explain this and possibly help troubleshoot? I assume this is actually on the Retool side of things.

Retool Database as a Resource:
[screenshot]

PostgreSQL as a Resource:
[screenshot]

If this is the way to go, then so be it. I will have to update about 100 queries, so I would love help confirming this before I begin, if possible.
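Before rewriting ~100 queries, it may help to confirm the gap is consistent rather than a one-off run. A minimal sketch, assuming you can call the same query repeatedly from a script: it times any zero-argument callable several times and reports the median. The function name `median_latency_ms` is hypothetical and not part of Retool or any database driver.

```python
import statistics
import time
from typing import Callable


def median_latency_ms(run_query: Callable[[], object], runs: int = 5) -> float:
    """Call run_query `runs` times and return the median wall-clock time in milliseconds."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query()  # e.g. execute the same SELECT and fetch its rows
        samples.append((time.perf_counter() - start) * 1000.0)
    # The median is less sensitive than the mean to a single slow outlier run
    return statistics.median(samples)
```

With psycopg2, for instance, `run_query` could be a small function that calls `cur.execute(sql)` and `cur.fetchall()` on a cursor opened with the Retool DB connection string. Note this only measures the direct connection path; the Retool-side numbers still come from the query's timing breakdown in the Retool UI.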

Hello @Shawn_Optipath,

It is good to see that we at least have an alternative option, moving databases to Retool DB, to get back to reasonable performance.

But I want to see if we can troubleshoot and figure out why that PostgreSQL resource is so slow.

Since they are on cloud, I was able to pull up their org. Let me see if I can find any details in their logs.

If you can give me more details on where and how the PostgreSQL DB is hosted, that will definitely help. As you mentioned before, you are able to run the same query against the DB outside of Retool, which does narrow down what could be causing this issue.

Hi @Jack_T,
I queried the same DB using two different approaches. There is only one DB for this client: the Retool DB that comes with the account.

Even the window in the screenshot below was either unresponsive or extremely slow.
[screenshot]

Does that make sense?

Ahhh ok, interesting. So the table lives in the Retool DB (which is a PostgreSQL database wrapped in the Retool DB resource and table UI), and the Retool DB resource is super slow, but a PostgreSQL resource using a connection string was much faster, correct?

Could you share a screenshot where you mouse over the query's time so it shows a breakdown of how long each step of the query took?

I circled the spot to mouse over in the screenshot below.

Also, if you click the FX button next to Resource in the query panel, open it, and DM me the Resource ID, I should be able to check the logs in Datadog :saluting_face:

Ah one more thing!

Most importantly, if you can run the slow query, open your browser's network tab, and share the x-request-id with me, I can pinpoint that query run in Datadog to help narrow things down and get a direct look at the request info to see if I can figure out what is going on.

Yes, correct.

The issues went away after about an hour last night, so I can no longer send you a screenshot of the time breakdown for the 38.97s query result.

Here are the Resource IDs:

Retool DB resource UI: b50d1c26-b2e9-4d18-b61f-781779149457

PostgreSQL resource using a connection string: 7e377721-7e9e-458e-97f5-b4e3c1de5881

Here is the x-request-id:
[screenshot]

I didn't note the time when the query was running slowly yesterday, but given the timestamps on this post, I'm guessing around 4:30 pm PST.


Ok, perfect, thank you. Let me check this out.

Also, I believe you are at the top of the office hours queue now, so we can look at the app and troubleshoot :+1:

Ok, thanks. I can jump in shortly.

Thanks again, @Jack_T. Do let me know if you find anything.

I'll keep monitoring this end. I will also see if I can break down a more labor-intensive workflow or two.


It was great to meet you in office hours @Shawn_Optipath!

I will report back to this thread with more details when I hear back from the engineer who is investigating this issue.

For any forum users who find this thread: if issues do pop up with Retool DB and you are cloud-hosted, migrating to an external database is the best option for full control over availability, uptime, log monitoring, and direct access to your data tables :sweat_smile:


Great meeting you too @Jack_T!

It's happening again right now. I cannot access any of the resources via the Retool UI. See below. Not sure what to do; this is becoming too frequent. Also, I know for a fact I am the only one working in the DB right now, and an extremely low volume of query data is going through.

Let me know if it is better to start a new thread or DM you.

Here's what the app looks like after a refresh right now :sweat_smile:

Ahhhhhh I am so sorry, that is not ideal.

I checked our uptime page, and it doesn't appear that we are having an outage or incident on our end :sweat:

Are you able to open your browser's network tab to see if there are any details on the request to Retool DB that the web page is making?

We are investigating what could be preventing Retool DB from being accessible on your end. Oddly, no other users are reporting outages.

I will let you know if we can figure out what is causing this. Hopefully it returns to normal soon, and you will be able to migrate to Neon/PostgreSQL to avoid this happening in the future :crossed_fingers:

Let me know if this is still ongoing or if there is a time range we can look at for your Retool instance to try to triage!

Next time it happens I can look into the network tab. It's running fine right now.

Looking back at my original post and the one from yesterday, it seems to happen just after 4 pm PST, unless that is a coincidence. :flushed:
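One way to check whether the 4 pm pattern is real is to tally each episode's start time by Pacific hour of day. A minimal sketch, assuming start times are recorded as ISO-8601 timestamps with a UTC offset or trailing "Z" (for example, copied from alert emails or notes); the function name `incidents_by_pacific_hour` is hypothetical and not part of any Retool API.

```python
from collections import Counter
from datetime import datetime
from zoneinfo import ZoneInfo

# America/Los_Angeles follows PST in winter and PDT in summer
PACIFIC = ZoneInfo("America/Los_Angeles")


def incidents_by_pacific_hour(timestamps_utc: list[str]) -> Counter:
    """Tally incident start times by Pacific-time hour of day (0-23).

    Each timestamp must be ISO-8601 with an explicit UTC offset or a trailing "Z".
    """
    counts: Counter = Counter()
    for ts in timestamps_utc:
        if ts.endswith("Z"):
            # fromisoformat() in Python versions before 3.11 rejects a bare "Z"
            ts = ts[:-1] + "+00:00"
        moment = datetime.fromisoformat(ts)
        if moment.tzinfo is None:
            # A naive timestamp would be silently treated as machine-local time
            raise ValueError(f"timestamp has no UTC offset: {ts}")
        counts[moment.astimezone(PACIFIC).hour] += 1
    return counts
```

If most counts land at hour 16, that supports the pattern; counts spread across many hours would point toward coincidence.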

I'll be around today at that time to see if anything starts slowing down.

This morning I had a Zoom call with Neon to discuss migration and to help diagnose any bottlenecks. They mentioned that Retool throttles each DB account to 0.25 vCPU and 1 GB. This could be at least part of the issue. Is there an option to increase this, even occasionally?

For the moment the client has decided not to migrate directly to Neon.

We have, however, decided to switch to PostgreSQL resource connections to the Retool DB, as these didn't take a performance hit when the slowdown was happening with the Retool DB resource.