OpenAPI Intermittent Operation Not Found - Possible Caching Issue

Tyler_Stiene · March 31, 2025, 5:34pm

OpenAPI Specification Caching Issue in Published Retool Apps

Goal

We're trying to reliably access new OpenAPI operations in published Retool apps. The expected behavior is that when we add a new operation to a query and publish our app, the published version should consistently recognize and execute that operation. Currently, we're experiencing a critical issue where published apps intermittently fail to recognize operations that were properly configured before publishing, returning "operation not found" errors for several hours after publishing.

Steps

We've observed this issue in our production environment when deploying new versions of our Golang Huma service running on GCP.
To systematically investigate and reproduce the issue, we've created a test service that dynamically adds and removes endpoints on a fixed schedule:
- Test service hosted at: https://dynamic-retool.tylerstiene.ca
- OpenAPI spec available at: https://dynamic-retool.tylerstiene.ca/openapi.json
- Source code: [GitHub Repository Link, if public]
The test service adds one new endpoint every hour and maintains a rolling 7-day window (168 hours), removing older endpoints.
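
For anyone trying to picture the schedule, here is a rough sketch of the rolling window described above. It is written in Python for brevity and is not the test service's actual code; the function name and the `/endpoint-YYYYMMDD-HH` path format are hypothetical placeholders.

```python
from datetime import datetime, timedelta, timezone

WINDOW_HOURS = 168  # 7 days, one endpoint per hour


def active_endpoints(now: datetime, window_hours: int = WINDOW_HOURS) -> list[str]:
    """Return the paths that should exist at `now`, oldest first.

    The path format is an assumption made for illustration only.
    `now` is expected to be a timezone-aware datetime.
    """
    # Truncate to the top of the current hour: one endpoint per hourly slot.
    current_hour = now.astimezone(timezone.utc).replace(
        minute=0, second=0, microsecond=0
    )
    paths = []
    # Walk from the oldest slot still inside the window up to the current hour.
    for age in range(window_hours - 1, -1, -1):
        slot = current_hour - timedelta(hours=age)
        paths.append(slot.strftime("/endpoint-%Y%m%d-%H"))
    return paths
```

Each hour the newest path appears at the end of the list and the oldest one drops off the front, so the spec always lists 168 endpoints.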
Reproduction steps:
- Add https://dynamic-retool.tylerstiene.ca/openapi.json as an OpenAPI resource in Retool
- Create a query component that uses one of the operations
- Wait for a new endpoint to be added (happens hourly)
- Update the query to use the new operation (this may require refreshing the resource first). Possibly a separate bug: refreshing the schema on the query does not always add the new operation, even when the schema shows it (additional screenshot attached).
- Test the query in the editor to confirm it works
- Publish the app
- Access the published version of the app
- Try to execute the query with the new operation
- Observe inconsistent behavior: sometimes it works, sometimes it fails with "operation not found"
- This inconsistent behavior can persist for several hours after publishing
- By the next day, the issue typically resolves itself
We've captured video evidence of this issue occurring in real-time with a published app.
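
To rule out the service itself while reproducing, it can help to confirm that the live spec advertises the operation at the moment the published app fails. The following is a minimal sketch using only the Python standard library; `operation_ids` and `fetch_spec` are helper names made up for this post, and it assumes the operations carry an `operationId` field, which is optional in OpenAPI.

```python
import json
from urllib.request import urlopen

# HTTP method keys that can hold an operation inside an OpenAPI path item.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def operation_ids(spec: dict) -> set[str]:
    """Collect every operationId declared under `paths` in a parsed spec."""
    ids = set()
    for path_item in spec.get("paths", {}).values():
        for method, operation in path_item.items():
            # Skip non-operation keys such as "parameters" or "summary".
            if method not in HTTP_METHODS or not isinstance(operation, dict):
                continue
            if "operationId" in operation:
                ids.add(operation["operationId"])
    return ids


def fetch_spec(url: str) -> dict:
    """Download and parse an OpenAPI JSON document."""
    with urlopen(url, timeout=10) as response:
        return json.load(response)
```

For example, `"some-new-operation" in operation_ids(fetch_spec("https://dynamic-retool.tylerstiene.ca/openapi.json"))` (with a real operation name substituted) shows whether the service currently exposes the operation. It says nothing about what Retool has cached, only that the source spec is correct at that moment.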

Details

Our investigation focuses specifically on the published app experience with dynamically changing OpenAPI specs:

Network monitoring and production logs show some critical differences in behavior:

Retool editor: Fetches the OpenAPI spec often when editing.
Published app: Does not refetch the spec on operation calls, leading us to believe it is using a cached version.

Log analysis and testing reveal a troubling pattern:

Operations that work perfectly in the editor may fail in the published app
The published app can alternate between successful calls and "operation not found" errors
This behavior can persist for hours after publishing

Key characteristics of the issue:
- Operations configured and verified in the editor fail in published apps
- Inconsistent behavior: sometimes works, sometimes fails with the exact same configuration
- The issue is time-dependent, typically resolving itself by the next day
- This causes production outages when we deploy new API operations and update our Retool apps
Our demo service (https://dynamic-retool.tylerstiene.ca) purposely:
- Sets proper cache control headers (Cache-Control: no-cache, no-store, must-revalidate)
- Logs all spec requests for monitoring purposes
- Generates predictable, time-based endpoints to help reproduce the issue
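
As an illustration of the first two points, here is a minimal standard-library sketch of a spec endpoint that sends those cache headers and logs each request. This is not the demo service's code (the real service is written in Go); `build_spec`, `SpecHandler`, and `serve` are hypothetical names, and the spec body is a stub.

```python
import json
import logging
from http.server import BaseHTTPRequestHandler, HTTPServer

# Headers sent with every spec response so clients should not cache it.
NO_CACHE_HEADERS = {
    "Cache-Control": "no-cache, no-store, must-revalidate",
    "Content-Type": "application/json",
}


def build_spec() -> dict:
    """Stub spec; a real service would generate its current operations here."""
    return {
        "openapi": "3.1.0",
        "info": {"title": "demo", "version": "0"},
        "paths": {},
    }


class SpecHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/openapi.json":
            self.send_error(404)
            return
        # Log every spec fetch so refetches (or their absence) are visible.
        logging.info(
            "spec requested by %s (%s)",
            self.client_address[0],
            self.headers.get("User-Agent"),
        )
        body = json.dumps(build_spec()).encode("utf-8")
        self.send_response(200)
        for name, value in NO_CACHE_HEADERS.items():
            self.send_header(name, value)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Run the sketch locally; blocks until interrupted."""
    logging.basicConfig(level=logging.INFO)
    HTTPServer(("127.0.0.1", port), SpecHandler).serve_forever()
```

With request logging like this in place, a gap in the log while the published app is making operation calls is what suggests the spec is being served from a cache somewhere other than the origin.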

Screenshots

As a new user, I am unable to attach files. Screen captures are here: https://drive.google.com/drive/folders/1kSI4c6GmZ94eAfDrqcPphQ3aFIM8amBY?usp=drive_link

Demo app showing the query working, followed by a 400 error.
Video showing the query working and then failing in the published app.
Screenshot showing the schema list failing to update in the query editor even though the schema shows the new operation (probably a separate bug).

I've recorded a video demonstration that clearly shows this issue occurring in real-time with a published app, which I can share upon request. The video shows how the same operation works in the editor but fails intermittently in the published app after publishing.

App JSON

All you need is a one-button app hooked up to a query that uses the new operation. Here’s a minimal example of the app JSON export.

Impact

This issue is causing significant problems for our development workflow and user experience:

Production Outages: When we deploy new API operations and update our Retool apps to use them, published apps fail unpredictably for hours.
Developer Frustration: The operations work perfectly in the editor, pass all tests, but then fail in production with no clear pattern.
User Confusion: End users receive "operation not found" errors for features that should be working, with no apparent solution other than waiting.
Deployment Delays: We're forced to deploy backend API changes at least 24 hours before updating Retool apps to ensure stability.