OpenAPI Specification Caching Issue in Published Retool Apps
Goal
We're trying to reliably access new OpenAPI operations in published Retool apps. The expected behavior is that when we add a new operation to a query and publish our app, the published version should consistently recognize and execute that operation. Currently, we're experiencing a critical issue where published apps intermittently fail to recognize operations that were properly configured before publishing, returning "operation not found" errors for several hours after publishing.
Steps
- We've observed this issue in our production environment when deploying new versions of our Golang Huma service running on GCP.
- To systematically investigate and reproduce the issue, we've created a test service that dynamically adds and removes endpoints on a fixed schedule:
- Test service hosted at: https://dynamic-retool.tylerstiene.ca
- OpenAPI spec available at: https://dynamic-retool.tylerstiene.ca/openapi.json
- Source code: [GitHub Repository Link, if public]
- The test service adds one new endpoint every hour and maintains a rolling 7-day window (168 hours), removing older endpoints.
- Reproduction steps:
- Add https://dynamic-retool.tylerstiene.ca/openapi.json as an OpenAPI resource in Retool
- Create a query component that uses one of the operations
- Wait for a new endpoint to be added (happens hourly)
- Update the query to use the new operation (may require refreshing the resource first). Possibly a separate bug: refreshing the schema on the query does not always add the new operation even when the schema shows it (additional screenshot attached).
- Test the query in the editor to confirm it works
- Publish the app
- Access the published version of the app
- Try to execute the query with the new operation
- Observe inconsistent behavior: sometimes it works, sometimes it fails with "operation not found"
- This inconsistent behavior can persist for several hours after publishing
- By the next day, the issue typically resolves itself
- We've captured video evidence of this issue occurring in real-time with a published app.
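For anyone reproducing this, the test service's schedule can be sketched as below. This is a minimal illustration, not the service's actual code: it assumes endpoints are created at the top of each UTC hour and dropped once they are 168 hours old, and the function name `active_endpoint_hours` is hypothetical. It returns only timestamps because the endpoint naming scheme is not described here.

```python
from datetime import datetime, timedelta, timezone

WINDOW_HOURS = 168  # rolling 7-day window described in the steps above


def active_endpoint_hours(now: datetime, window_hours: int = WINDOW_HOURS) -> list[datetime]:
    """Return the hourly UTC timestamps whose endpoints should exist at `now`.

    Assumptions (not confirmed by the service): one endpoint appears at the
    top of each hour, and an endpoint is removed once it is `window_hours` old.
    `now` should be timezone-aware; naive datetimes are treated as local time.
    """
    # Truncate to the top of the current hour: this is the newest endpoint.
    newest = now.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)
    # Walk backwards one hour at a time, newest first.
    return [newest - timedelta(hours=i) for i in range(window_hours)]
```

The first element is the hour whose endpoint was added most recently, which is the operation to pick when updating the query in the reproduction steps.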
Details
Our investigation focuses specifically on the published app experience with dynamically changing OpenAPI specs:
- Network monitoring and production logs show some critical differences in behavior:
- Retool editor: Fetches the OpenAPI spec often when editing.
- Published app: Does not refetch the spec on operation calls, leading us to believe it is using a cached version.
- Log analysis and testing reveal a troubling pattern:
- Operations that work perfectly in the editor may fail in the published app
- The published app can alternate between successful calls and "operation not found" errors
- This behavior can persist for hours after publishing
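To illustrate the suspected failure mode, here is a small sketch that compares a freshly fetched OpenAPI document against an older copy and reports which operations the older copy lacks. It assumes operations are looked up by `operationId`, which is our guess at how the published app resolves them; the helper names `operation_ids` and `missing_from_cached` are hypothetical, and nothing here reflects Retool's actual internals.

```python
# HTTP method keys allowed on an OpenAPI 3.x Path Item Object.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def operation_ids(spec: dict) -> set[str]:
    """Collect every operationId declared under `paths` in a parsed spec."""
    ids = set()
    for path_item in spec.get("paths", {}).values():
        for method, operation in path_item.items():
            # Skip non-operation keys such as "parameters" or "summary".
            if method in HTTP_METHODS and isinstance(operation, dict):
                if "operationId" in operation:
                    ids.add(operation["operationId"])
    return ids


def missing_from_cached(fresh_spec: dict, cached_spec: dict) -> set[str]:
    """Operations present in the fresh spec but absent from the cached one."""
    return operation_ids(fresh_spec) - operation_ids(cached_spec)
```

If the published app were resolving calls against a stale copy of the spec, any operation returned by `missing_from_cached` would be a candidate for the "operation not found" error.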
Key characteristics of the issue:
- Operations configured and verified in the editor fail in published apps
- Inconsistent behavior: sometimes works, sometimes fails with the exact same configuration
- The issue is time-dependent, typically resolving itself by the next day
- This causes production outages when we deploy new API operations and update our Retool apps
Our demo service (https://dynamic-retool.tylerstiene.ca) purposely:
- Sets proper cache control headers (Cache-Control: no-cache, no-store, must-revalidate)
- Logs all spec requests for monitoring purposes
- Generates predictable, time-based endpoints to help reproduce the issue
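As a quick sanity check on the header above, the following sketch parses a `Cache-Control` value and reports whether it forbids storing the response. The helper names `cache_directives` and `forbids_caching` are hypothetical, and the check is deliberately simple: it only looks for `no-store` and does not model full HTTP caching semantics.

```python
def cache_directives(header_value: str) -> set[str]:
    """Split a Cache-Control header into lowercase directive names."""
    return {
        part.split("=", 1)[0].strip().lower()  # drop any "=value" argument
        for part in header_value.split(",")
        if part.strip()
    }


def forbids_caching(header_value: str) -> bool:
    """True if the header tells caches not to store the response at all."""
    return "no-store" in cache_directives(header_value)
```

With the value our demo service sends, `forbids_caching("no-cache, no-store, must-revalidate")` returns `True`, so a client that honours the header should not be serving the spec from a cache.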
Screenshots
As a new user, I am unable to attach files. Screen captures are here: https://drive.google.com/drive/folders/1kSI4c6GmZ94eAfDrqcPphQ3aFIM8amBY?usp=drive_link
- Demo app showing the query working, followed by a 400 error.
- Video showing the query working and then failing in the published app.
- Screenshot showing the schema list failing to update in the query editor even though the schema shows the new operation (probably a separate bug).
I've recorded a video demonstration that clearly shows this issue occurring in real-time with a published app, which I can share upon request. The video shows how the same operation works in the editor but fails intermittently in the published app after publishing.
App JSON
All you need is a single one-button app hooked up to a query that uses the new operation. Here's a minimal example of the app JSON export.
Impact
This issue is causing significant problems for our development workflow and user experience:
- Production Outages: When we deploy new API operations and update our Retool apps to use them, published apps fail unpredictably for hours.
- Developer Frustration: The operations work perfectly in the editor, pass all tests, but then fail in production with no clear pattern.
- User Confusion: End users receive "operation not found" errors for features that should be working, with no apparent solution other than waiting.
- Deployment Delays: We're forced to deploy backend API changes at least 24 hours before updating Retool apps to ensure stability.