Since about mid-October we've started noticing a few issues with our resource connection to our database.
Some background on our setup: private rds database with limited data exposed via a pgbouncer server running in front of the database. Retool cloud connects through the pgbouncer server. Pgbouncer is configured to accept only tls connections and uses self-signed certificates for tls and authentication. This set up has been working without issue for a couple of years.
Since about mid-October, I'm seeing an increased number of warnings ("WARNING TLS handshake error: handshake failed: No error information") in the pgbouncer logs and some errors due to the database user configured for retool trying to open more connections to the database than it is allowed.
I'm also no longer able to test the connection. When I try to test the connection with the 'Test connection' option in the resource ui, I get this error now: "Unable to connect. Error: self signed certificate in certificate chain". Thinking that this might also be related to the tls errors in the pgbouncer log, I tried re-implementing tls for pgbouncer using certificates from a public ca (let's encrypt), but this resulted in a different error for the resource ("No local issuer certificate"). So it seems like retool cloud possibly doesn't have ca-certificates available for lets encrypt.
The app that connects to this resource is still generally able to connect to the database. However, it does sometimes raise errors about retrieving metadata, which seems to be due to the connection errors mentioned above.
I guess I have a couple of questions:
Has there been some change to retool that would cause it to use more database connections? Should I increase the number of connections that the retool user can use (it's allowed 8 now)? How many connections does retool normally use?
What's up with 'Test connection'? It used to work no problem. Are the tls errors that it raises relevant to the usage of the resource, ie will the app that uses the resource encounter similar issues in normal operation? Do they have anything to do with the tls warnings I see in pgbouncer?Do I need to change the certificates that I use in any way to resolve this (and the tls warnings in pgbouncer)?
Super strange. I wouldn't expect more database connections to be required, though there isn't a set number. If one fails, we may create another connection.
If 'Test connection' doesn't work, then the resource in the app is also expected to not work.
Would you mind sharing a screenshot of your resource page (feel free to redact any secrets)?
Sure. I've been off for the holidays, so sorry for the delay.
This is the error displayed for 'Test Connection'. However, under 'Environments', 'production' indicates connected and the app that uses this resource is still able to connect and retrieve data.
No worries at all, @kellen! For context, in July we changed the default setting for rejecting unauthorized SSL connections from false to true. We grandfathered in resources created before this change to still be set to false in order to not unexpectedly break connections, though certain resource connections may not have gotten grandfathered.
The error you're running into generally indicates that you need to supply a certificate. Does it work if you set it up with the connection string but turn off the TLS validation?
If you don't want to skip TLS validation, you will need to supply SSL certificates with your request or you can try changing the verification mode to "Verify CA Certificate" in the resource setting page.
Would any of these options work for you? Let me know!
That's kind of part of the issue. We have it set for full verification using self-signed client and server certificates, which wasn't an issue before. It does "work" if we turn off certificate verification, but that's not an option security-wise. The new functionality seems to reject self-signed certificates outright, but at the same time, doesn't really seem to support working with public CAs either. When this first started popping up, I tried re-implementing tls for our pgbouncer server using certificates created by let's encrypt, which also resulted in errors from 'Test Connection' ("No local issuer certificate"). This makes me think that whatever environment 'Test Connection' runs in doesn't have access to commonly available public ca certificates. I've seen this suggested in other retool support issues as well (eg MySQL+SSL: "Error: unable to get local issuer certificate").
So I guess, ultimately, my question comes down to: is there currently a supported way to use tls for connections to postgres servers using retool cloud or not? And what would that be? Are there restrictions on which public cas are supported? Because, right now, neither option appears to actually work.
I do believe it's possible! I may need to enable a flag for you and then have you try something from there. If this doesn't work, we can always disable the flag! Does this sound alright with you?
Absolutely, just let me know what I need to do.
Cool—I just enabled the flag! Could you see if you now have a new
SSL Host field? You'll need to enter your host in this field as well
Yeah, I have it.
Still getting an error in 'Test Connection', but this one's clearly on my end, I'll need to adjust the subject alt names of the certificates since it looks like the base domain name isn't listed. Is it possible that this has been my actual problem the whole time? If so, that's pretty embarrassing.
I'll update the certificates and get back to you.
Certificates updated, and 'Test Connection' working without issue. Also clears up the tls warnings on the pgbouncer side.
Is it possible that subject alternate names was my issue the whole time? Is so, should we test with the flag with its original setting?
Awesome! Glad to hear all is working. Let's definitely test without the flag, just turned it off
Looking good on my end, no issues. Looking like that was the root cause.
Thanks for your help looking into this. It might be helpful to have more informative error messages for certificate issues, but they might be implemented already in a newer version. Once we enabled that flag, it gave me an error message that was more informative and made it really easy to tell exactly where the problem was.
Awesome, glad to hear the error message was finally helpful Going forward, I believe the changes in the flag will go live for everyone, so hopefully users see the improved error messages very soon!