We have several apps/dashboards that have been up and running for a fairly long time (some over a year, others a few months), and we have noticed a significant degradation in query performance since last Friday (Aug 19th, 2022).
SQL that previously ran in a second or two now hits the automatic timeout limit of 120 seconds. The database structure and volume of data have not changed significantly, the SQL is still valid, and the queries run in milliseconds via another SQL client, yet somehow these queries are not completing when run by Retool.
We've added caching to the query definitions in Retool, and also delays after page load, but these only mitigate the issue rather than address it.
It feels like something has changed internally in Retool that is now causing serious performance issues for us, to the point where we are unsure how to proceed.
Has anything changed internally recently that could be causing this performance drop? Could it be network latency? For information, our data is hosted in the EU; presumably Retool's servers are in the US?
We're really hoping there is something that can be done to resolve this issue to save us having to move platforms for our internal BI reporting.
Hey @james.heywood, thanks for flagging this!
It's been brought to the attention of our dev team and they'll be looking into seeing if they can find anything that might be causing this on our end.
In the meantime, for some additional context:
- Are those queries consistently hitting the 120s timeout?
- Are you seeing it on specific queries to your SQL DB or does it seem to be happening for any query on that resource?
- Are you able to successfully query other resources?
Thanks for your response. To answer your questions and provide some more context:
- It's not always the same queries; it varies, but when not returning from cache, about 1/4 to 1/3 of the queries for a given app time out at 120 seconds.
- As indicated above, we have added caching to all queries, using the default 300 seconds.
- We have also added delays to all queries below the fold so they load in stages on page load, starting at 500 ms and ranging up to 2000 ms.
- This helps a little with the end-user experience, but doesn't actually address the issue, and we still see some tables/charts not returning data.
- When one or more tables/charts fail to return data, we have instructed users to reload just that data using the reload icon in the app component; this seems to work.
- It feels like too many requests from our apps to the Retool backend might be causing a bottleneck, which then causes the timeouts. As stated, though, we haven't changed the number of tables/charts or queries in each app, so if something has changed it feels like it may be internal to Retool?
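
For anyone applying the same staged-loading mitigation, the delay schedule described above (500 ms rising to 2000 ms) can be sketched as below. This is only an illustration in Python, not Retool code or a Retool API: the function name `staggered_delays` and its parameters are hypothetical, and the even spacing between delays is an assumption, since the thread only gives the start and end values. The delays themselves are configured on each query, as described above.

```python
from __future__ import annotations


def staggered_delays(n_queries: int, first_ms: int = 500, last_ms: int = 2000) -> list[int]:
    """Return one start delay (in ms) per query, evenly spaced from first_ms to last_ms.

    Hypothetical helper: the 500 ms and 2000 ms defaults come from the thread,
    the even spacing is an assumption.
    """
    if n_queries <= 0:
        return []
    if n_queries == 1:
        return [first_ms]
    # Divide the delay window into equal steps so requests do not all start at once.
    step = (last_ms - first_ms) / (n_queries - 1)
    return [round(first_ms + i * step) for i in range(n_queries)]


if __name__ == "__main__":
    # Four below-the-fold queries spread across the 500-2000 ms window.
    print(staggered_delays(4))
```

Spreading the start times this way only reduces how many requests are in flight at the same moment; as noted above, it does not change how long any individual query takes.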
Thanks for that context!
Our devs have pushed some changes as a result of looking into potential issues. Are you seeing any difference in the behavior of your queries today?
Brilliant! Yes, our apps are much quicker now with no timeouts. Thank you so much for helping out!