RPC Agent: “Error running RPC agent Error: Server error when fetching query: 503” (Same version as older post)

1) My goal:
To run the RetoolRPC agent without encountering recurring errors when fetching queries from Retool.

2) Issue:
I’m receiving this error dozens (sometimes hundreds) of times per hour during normal operation:

Error running RPC agent Error: Server error when fetching query: 503. Retrying...
0|RetoolRPC  |     at RetoolRPC.fetchQueryAndExecute (file:///home/raul/RetoolRPC/node_modules/retoolrpc/dist/rpc-f78d3d6d.mjs:482:19)
0|RetoolRPC  |     at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
0|RetoolRPC  |     at async loopWithBackoff (file:///home/raul/RetoolRPC/node_modules/retoolrpc/dist/rpc-f78d3d6d.mjs:229:28)

The polling timeout error (5000ms) still happens occasionally, but the 503 is far more frequent — it's spamming logs at an unreasonable rate.

3) Steps I've taken to troubleshoot:

  • Verified this is not a fresh install; I'm using the same version mentioned in this related post.
  • No modifications have been made to the retoolrpc package — it’s running with all default settings.
  • Network seems functional: The RetoolRPC agent is still able to successfully trigger scripts in my codebase, despite the recurring errors.
  • Confirmed with IT that all recommended IPs have been temporarily whitelisted:
    35.90.103.132-135
    44.208.168.68-71

4) Additional info:

  • Environment: Cloud Retool
  • RetoolRPC version: Same as used in this post
  • Running Method: same behavior whether I use node main.js directly or run it through pm2.
  • Network: To my knowledge, there's no proxy or firewall intentionally blocking outbound HTTP/S traffic
  • Traceroute results:
    I ran traceroutes to all the relevant IPs; all stall beyond mid-hop (most end with only * responses)
  • Retool connectivity: The script is still able to communicate with Retool normally outside of the RPC error, which makes me question if it's truly a connectivity issue.

Since it still works, I think you're right here.

This looks like an error from loopWithBackoff, which would make sense: if each iteration tries to execute some RPC function and fails, you get the error logged, but instead of bubbling up it is ignored, the loop waits (the "with backoff" part), and then it tries again. If that's the case, it may not actually fail until loopWithBackoff hits its retry limit, which seems to rarely happen since it keeps trying and eventually goes through. I'd be curious to know whether the errors become less frequent if you increase the backoff amount (double or triple it, just so you can see a difference in frequency, even if it's small).
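To make the theory above concrete, here is a minimal sketch of a retry loop with exponential backoff. It is not the retoolrpc source: `retryWithBackoff`, its option names, and the doubling factor are all assumptions made for illustration. It only shows how a failing call can log an error on every iteration while the loop as a whole still succeeds.

```javascript
// Hypothetical illustration; NOT the retoolrpc implementation.
// All names and default values here are assumptions.
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function retryWithBackoff(task, options = {}) {
  const {
    initialDelayMs = 50, // assumed starting delay
    factor = 2,          // assumed growth factor between retries
    maxDelayMs = 5000,   // assumed upper bound for a single delay
    maxAttempts = 10,    // assumed retry limit
    sleep = defaultSleep,
    onError = () => {},  // called for every failed attempt (e.g. to log it)
  } = options;

  let delay = initialDelayMs;
  for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
    try {
      return await task();
    } catch (err) {
      // The error is reported but swallowed, so the caller never sees it
      // unless the retry limit is reached.
      onError(err, attempt);
      if (attempt === maxAttempts) throw err;
      await sleep(delay);
      delay = Math.min(delay * factor, maxDelayMs);
    }
  }
}
```

With a loop shaped like this, every transient 503 produces a log line through `onError`, even though the call that matters eventually goes through.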


You're spot on. I think your theory about loopWithBackoff is exactly what's happening here. I bumped CONNECTION_ERROR_INITIAL_TIMEOUT_MS from 50 to 150 just to test the waters, and I'm definitely seeing a noticeable reduction in how often the error shows up. It's still happening, but not nearly as frequently, which lines up with the idea that it's retrying and eventually succeeding behind the scenes.
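A rough way to see why a larger initial delay produces fewer log lines: the sketch below counts how many failed attempts fit into a single outage window. The doubling factor and the 1000 ms outage length are assumptions chosen for illustration, not values taken from retoolrpc; only the 50 and 150 ms starting delays come from the test described above.

```javascript
// Hypothetical model: counts failed attempts during one outage,
// assuming the delay is multiplied by `factor` after each failure.
function countFailedAttempts(initialDelayMs, outageMs, factor = 2) {
  let elapsed = 0;
  let delay = initialDelayMs;
  let failures = 0;
  while (elapsed < outageMs) {
    failures += 1;    // an attempt made during the outage fails
    elapsed += delay; // wait before the next attempt
    delay *= factor;  // back off further each time
  }
  return failures;
}

console.log(countFailedAttempts(50, 1000));  // 5
console.log(countFailedAttempts(150, 1000)); // 3
```

Under these assumptions, tripling the initial delay cuts the logged failures per outage from 5 to 3, which matches the direction of the observed change, though not necessarily its size.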

Looks like you can ignore the error then, but let's see if we can think of a way to get the RetoolRPC library to ignore 503 errors. I'll have to look into it a bit, but I think ideally we would get a reference to the underlying web request library and add an interceptor to ignore 503s.
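As a starting point for the interceptor idea, here is a minimal sketch of a wrapper around a fetch-like function that quietly retries 503 responses before handing the result back. `withQuiet503` and its options are hypothetical names, and whether retoolrpc exposes a way to plug in a custom request function is an open question; this only illustrates the pattern.

```javascript
// Hypothetical helper; not part of retoolrpc. Wraps any function that
// returns a promise of an object with a numeric `status` (like fetch).
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

function withQuiet503(fetchLike, options = {}) {
  const { maxRetries = 3, delayMs = 100, sleep = defaultSleep } = options;

  return async function quietFetch(...args) {
    let response = await fetchLike(...args);
    // Retry silently while the server answers 503, up to maxRetries times.
    for (let i = 0; i < maxRetries && response.status === 503; i += 1) {
      await sleep(delayMs);
      response = await fetchLike(...args);
    }
    // If every retry also returned 503, the caller still sees the 503.
    return response;
  };
}
```

Usage would look like `const quietFetch = withQuiet503(fetch);` followed by calling `quietFetch(url, init)` wherever the plain function was called. The wrapper hides only transient 503s; a persistent outage still surfaces to the caller after the retries run out.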