RetoolRPC polling architecture doesn't scale with multiple resources — 503 errors and exponential backoff

We're using retoolrpc@0.1.7 (Node.js) to connect our backend to multiple Retool apps. Each app has its own RPC resource, and we currently run 13 resources in a single Node.js process deployed on Kubernetes.

Each RPC resource creates its own agent that continuously polls popQuery every 1 second. With 13 resources across 2 pods, we're making 26 HTTP requests per second to Retool's server — just for polling, even with zero user activity. That's over 2 million polling calls per day.

As we add more Retool apps (and therefore more RPC resources), this grows linearly. At our previous count of 25 resources (before consolidation), we were hitting 50 requests/sec or 4.3 million calls/day.
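The arithmetic is simple: baseline load is resources × pods ÷ polling interval. A throwaway helper (ours, not part of retoolrpc) makes the numbers above explicit:

```typescript
// Hypothetical helper (not part of retoolrpc) to estimate baseline
// popQuery load: one request per resource, per pod, per interval.
function pollingLoad(resources: number, pods: number, intervalSec: number) {
  const perSecond = (resources * pods) / intervalSec;
  return { perSecond, perDay: perSecond * 86_400 };
}

console.log(pollingLoad(13, 2, 1)); // current: 26 req/s, ~2.25M req/day
console.log(pollingLoad(25, 2, 1)); // before consolidation: 50 req/s, 4.32M req/day
```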

Symptoms we're seeing

  1. 503 (Service Unavailable) responses from the popQuery endpoint — Retool's server appears to be rate-limiting or struggling with the polling volume from our agents.

  2. Exponential backoff spiral — When popQuery returns a 503, the library's loopWithBackoff catches the error and enters exponential backoff (50ms → 100ms → ... up to 10 minutes). During backoff, the agent stops polling and any RPC calls from Retool go undelivered.

  3. pollingTimeoutMs (default 5s) compounds the problem — The popQuery call is a long-poll that intentionally holds the connection open. The default 5s timeout is shorter than the typical hold time, causing false timeouts that trigger additional backoff on top of the 503-induced backoff. We resolved this specific issue by increasing pollingTimeoutMs to 30s, but the 503s remain.

  4. Intermittent 120s RPC timeouts — Users intermittently see RPC calls hang and fail after the 120s timeout because the call never reaches our server. These failures correlate with agents being stuck in backoff from the 503/timeout errors above.
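To illustrate symptom 2, here is an approximation of the doubling schedule described above. This sketches the behavior we observe, not retoolrpc's actual loopWithBackoff code:

```typescript
// Approximation of the backoff schedule in symptom 2: the delay doubles
// per consecutive failure, from 50 ms up to a 10-minute cap.
// (A sketch of observed behavior, not retoolrpc's actual implementation.)
function backoffDelayMs(consecutiveFailures: number): number {
  const baseMs = 50;
  const capMs = 10 * 60 * 1000; // 10 minutes
  return Math.min(baseMs * 2 ** (consecutiveFailures - 1), capMs);
}
```

A burst of 503s escalates quickly: by the 11th consecutive failure the agent is already sleeping ~51 seconds between polls, and it reaches the 10-minute cap a few failures later. Any RPC issued during that window is never picked up.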

What we've done on our side

  • Increased pollingTimeoutMs from 5s to 30s — eliminated false timeout errors

  • Removed unused legacy resources — reduced from 25 to 13 agents

  • Planning to increase pollingIntervalMs from 1s to 3s

These mitigations help, but the fundamental issue remains: every new RPC resource adds another always-on polling loop, and the architecture doesn't scale beyond a certain number of resources.
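For reference, here is roughly how the first and third mitigations look as agent configuration (constructor shape as we use it with retoolrpc@0.1.7; the host and resource ID below are placeholders):

```typescript
import { RetoolRPC } from 'retoolrpc';

// One agent per resource; only the two polling options changed.
const rpc = new RetoolRPC({
  apiToken: process.env.RETOOL_API_TOKEN!,
  host: 'https://our-org.retool.com', // placeholder
  resourceId: 'resource-id-goes-here', // placeholder
  pollingIntervalMs: 3000, // up from the 1000 ms default
  pollingTimeoutMs: 30000, // up from the 5000 ms default
});
```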

Questions for the Retool team

  1. Is there a rate limit on the popQuery endpoint? If so, what is it (per API token, per account, per IP)? Knowing the limit would help us plan capacity.

  2. Is there a push-based delivery model planned (WebSocket, webhooks, or server-sent events), where Retool notifies our server when a query arrives rather than our server polling every second? This would eliminate the scaling issue entirely.

  3. Can a single agent poll for multiple resource IDs in one request? Currently each resource requires its own popQuery call. If the API supported batching (e.g., "give me the next query for any of these 13 resources"), we could reduce 13 polls/sec to 1.

  4. Is there guidance on the recommended maximum number of RPC resources per server/account? We couldn't find scaling documentation for RPC.

If you have any other suggestions on which resource type would suit our requirements, please let us know.
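To make question 3 concrete, here is a purely hypothetical sketch of the batching we have in mind, from the agent's side. `fetchNextQuery` stands in for a batched popQuery endpoint that does not exist today; all names are illustrative:

```typescript
// Purely hypothetical: one poll loop serving many resources.
// `fetchNextQuery` stands in for a batched popQuery endpoint
// ("give me the next pending query for ANY of these resources").
type PendingQuery = { resourceId: string; queryUuid: string };
type BatchPoller = (resourceIds: string[]) => Promise<PendingQuery | null>;

async function pollOnce(
  resourceIds: string[],
  fetchNextQuery: BatchPoller,
  dispatch: (query: PendingQuery) => void,
): Promise<boolean> {
  // One long-poll request covers all N resources instead of N requests.
  const next = await fetchNextQuery(resourceIds);
  if (next === null) return false; // long-poll expired with no pending work
  dispatch(next); // hand off to our handler for next.resourceId
  return true;
}
```

With something like this, our 13 per-pod polling loops would collapse into one, cutting baseline load by 13x per pod.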