Hi @Shawn_Optipath,
I was just about to say, our engineers suggested trying to use a PostgreSQL resource connection when the issue pops up, but it sounds like you already tested that and it doesn't take a performance hit, which is great to hear.
Switching over to PostgreSQL resource connections is likely the best solution in the short term given the seasonal time constraints. We can definitely look into the throttling issue as well as a longer-term solution if needed.
Our engineer was able to see from your logs that the connectivity issues occur exactly when your workflows are running.
They went on to add that the app is "making crazy amount of connections during this time and workflow queries are also failing in that time". From the image I saw, it looks like there are 60+ connections occurring at once.
Given that both of these are occurring in conjunction, the solution in my mind is to see if each can be alleviated in its own way.
Figuring out why so many connections are being made and reducing the number of connections would be top priority.
I know you mentioned the workflows are triggered by webhooks from job boards, but if the workflows can be spread out or run periodically, that could also open up bandwidth.
Maybe some type of third-party scheduler (Temporal?) could be used as a middleman between the incoming data and the workflows, acting as a sort of load balancer. Food for thought.
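
To make the middleman idea a bit more concrete, here is a rough sketch in plain Python (not Temporal, and not anything specific to your setup). It assumes incoming webhook payloads get dropped into a queue, and a fixed number of workers pull from it, so only a handful of workflows touch the database at once. The names `run_workflow`, `drain`, and `MAX_CONCURRENT` are made up for illustration, and the `asyncio.sleep` call just stands in for whatever actually triggers the workflow.

```python
import asyncio

MAX_CONCURRENT = 5  # hypothetical cap; tune to what the database can handle


async def run_workflow(payload, stats):
    # Placeholder for the real workflow trigger (assumption, not a real API).
    stats["active"] += 1
    stats["peak"] = max(stats["peak"], stats["active"])
    await asyncio.sleep(0.01)  # stands in for the actual work
    stats["active"] -= 1
    stats["done"] += 1


async def worker(queue, stats):
    # Each worker handles one payload at a time, then takes the next one.
    while True:
        payload = await queue.get()
        try:
            await run_workflow(payload, stats)
        finally:
            queue.task_done()


async def drain(payloads, max_concurrent=MAX_CONCURRENT):
    # Buffer every incoming payload, then process with a bounded worker pool.
    queue = asyncio.Queue()
    stats = {"active": 0, "peak": 0, "done": 0}
    for payload in payloads:
        queue.put_nowait(payload)
    workers = [
        asyncio.create_task(worker(queue, stats)) for _ in range(max_concurrent)
    ]
    await queue.join()  # wait until every payload has been processed
    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return stats


if __name__ == "__main__":
    # Simulate a burst of 60 webhooks arriving at once.
    print(asyncio.run(drain(list(range(60)))))
```

With a burst of 60 payloads and a cap of 5, no more than 5 workflows would be running (and holding connections) at any moment, while the rest wait in the queue. A hosted scheduler would give you retries and persistence on top of this, but the basic shape is the same.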