gRPC - Connection Resets (ECONNRESET)

For about a week now, users of our apps have been reporting sporadic cases of "Error: 14 UNAVAILABLE: read ECONNRESET" on queries going to a range of our gRPC services.

These queries never hit our infrastructure / services, and the user is often able to retry with success.
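Since the failing queries never reach the backend and a manual retry usually works, one client-side stopgap is to retry only on status 14 (UNAVAILABLE) with a short exponential backoff. The following is a minimal sketch, not Retool's or any gRPC library's built-in mechanism: the names `isTransientGrpcError`, `backoffDelayMs`, and `withRetry` are hypothetical, and it assumes the error object exposes a numeric `code` field, as the "Error: 14 UNAVAILABLE" message suggests.

```typescript
// gRPC status code 14 is UNAVAILABLE (numeric value defined by the gRPC status code list).
const GRPC_UNAVAILABLE = 14;

// Assumption: the error surfaced by the gRPC client carries a numeric `code` field.
interface GrpcLikeError {
  code?: number;
  message?: string;
}

// True only for errors that look like a transient UNAVAILABLE failure.
function isTransientGrpcError(err: unknown): boolean {
  const e = err as GrpcLikeError | null;
  return typeof e === "object" && e !== null && e.code === GRPC_UNAVAILABLE;
}

// Exponential backoff: baseMs, 2*baseMs, 4*baseMs, ... capped at capMs.
function backoffDelayMs(attempt: number, baseMs = 100, capMs = 2000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Runs `call`, retrying up to `maxAttempts` times in total, and only on UNAVAILABLE.
// Any other error, or the final failed attempt, is rethrown to the caller.
async function withRetry<T>(
  call: () => Promise<T>,
  maxAttempts = 3,
  baseMs = 100,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await call();
    } catch (err) {
      lastError = err;
      if (!isTransientGrpcError(err) || attempt === maxAttempts - 1) {
        throw err;
      }
      const delay = backoffDelayMs(attempt, baseMs);
      await new Promise<void>((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastError; // only reached if maxAttempts < 1
}
```

Usage would look like `await withRetry(() => someQuery())`, where `someQuery` stands in for whatever function issues the gRPC request. Automatic retries are only safe for calls that are idempotent or, as here, known not to have reached the server.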

We suspect something is going wrong in Retool's proxy layer, between the client and Retool itself.

  1. Is anyone else experiencing this for gRPC resources?
  2. Could someone from the Retool team check whether error rates for gRPC connections are elevated overall?

We've been seeing this as well, along with "socket hang up" issues. Early indications suggest it has something to do with Python blocks, but no solid explanation as of yet. Recommendation has been to migrate blocks to JS, though in our case that isn't fully possible.

See Socket hang up issues as well.

Ours is in our apps using gRPC resources to connect to our internal services.

We've only had the issues for about a week though.

Anyone from the Retool team having a look?

Hey @Martin_Christiansen! Thanks for reaching out. It definitely looks like there's been a significant spike in these errors since the middle of June. I'll talk to the owning team and see if we can identify the root cause. :thinking:

As soon as there is news to share, I'll follow up here with those updates.

Hi Darren!

Thanks for getting someone on this :slight_smile:

Any news to share so far?

The timing seems to indicate that one of our recent updates to the gRPC connection pooler is likely responsible, but it's not a widespread issue. I'm guessing that the volume of requests you're making is saturating the resource connector. :thinking:

I've temporarily excluded your org from this new rule while we take a closer look and maybe tweak the pooler's settings. On your end, it's probably worth evaluating whether the volume of requests you're making can be reduced! Let me know if you notice a reduction in these errors going forward. :+1:
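If the total number of requests can't be reduced, one option for easing pressure on a connection pool is to cap how many requests are in flight at once, so bursts get queued instead of all hitting the connector together. This is a rough sketch under that assumption; `ConcurrencyLimiter` is a hypothetical helper, not a Retool feature, and whether it helps depends on how the pooler actually behaves.

```typescript
// Caps the number of concurrently running async tasks.
// Tasks beyond the limit wait in a FIFO queue until a slot frees up.
class ConcurrencyLimiter {
  private readonly limit: number;
  private active = 0;
  private readonly waiting: Array<() => void> = [];

  constructor(limit: number) {
    if (!Number.isInteger(limit) || limit < 1) {
      throw new RangeError("limit must be a positive integer");
    }
    this.limit = limit;
  }

  // Number of tasks currently holding a slot.
  get inFlight(): number {
    return this.active;
  }

  // Number of tasks waiting for a slot.
  get queued(): number {
    return this.waiting.length;
  }

  async run<T>(task: () => Promise<T>): Promise<T> {
    if (this.active >= this.limit) {
      // All slots taken: wait until a finishing task hands its slot over.
      await new Promise<void>((resolve) => this.waiting.push(resolve));
    } else {
      this.active++;
    }
    try {
      return await task();
    } finally {
      const next = this.waiting.shift();
      if (next) {
        next(); // pass the slot directly to the next waiter; `active` stays the same
      } else {
        this.active--;
      }
    }
  }
}
```

With a shared instance such as `const limiter = new ConcurrencyLimiter(4)`, each query would be wrapped as `limiter.run(() => someQuery())`, where `someQuery` and the limit of 4 are placeholders to be tuned against real traffic.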

@Darren thanks for looking into it!

This does indeed seem to have resolved the issue; I did not get any connection resets when manually testing for a minute.

We don't send any redundant requests, I'm afraid. All of our services are microservices connected with gRPC. I could imagine there just aren't many orgs with enough applications and request volume to provoke the issue.

Again, thanks for looking into this and for resolving it so far :-)!