Since about mid-October we've started noticing a few issues with our resource connection to our database.
Some background on our setup: private rds database with limited data exposed via a pgbouncer server running in front of the database. Retool cloud connects through the pgbouncer server. Pgbouncer is configured to accept only tls connections and uses self-signed certificates for tls and authentication. This set up has been working without issue for a couple of years.
Since about mid-October, I'm seeing an increased number of warnings ("WARNING TLS handshake error: handshake failed: No error information") in the pgbouncer logs and some errors due to the database user configured for retool trying to open more connections to the database than it is allowed.
I'm also no longer able to test the connection. When I try to test the connection with the 'Test connection' option in the resource ui, I get this error now: "Unable to connect. Error: self signed certificate in certificate chain". Thinking that this might also be related to the tls errors in the pgbouncer log, I tried re-implementing tls for pgbouncer using certificates from a public ca (let's encrypt), but this resulted in a different error for the resource ("No local issuer certificate"). So it seems like retool cloud possibly doesn't have ca-certificates available for lets encrypt.
The app that connects to this resource is still generally able to connect to the database. However, it does sometimes raise errors about retrieving metadata, which seems to be due to the connection errors mentioned above.
I guess I have a couple of questions:
Has there been some change to retool that would cause it to use more database connections? Should I increase the number of connections that the retool user can use (it's allowed 8 now)? How many connections does retool normally use?
What's up with 'Test connection'? It used to work no problem. Are the tls errors that it raises relevant to the usage of the resource, ie will the app that uses the resource encounter similar issues in normal operation? Do they have anything to do with the tls warnings I see in pgbouncer?Do I need to change the certificates that I use in any way to resolve this (and the tls warnings in pgbouncer)?
No worries at all, @kellen! For context, in July we changed the default setting for rejecting unauthorized SSL connections from false to true. We grandfathered in resources created before this change to still be set to false in order to not unexpectedly break connections, though certain resource connections may not have gotten grandfathered.
The error you're running into generally indicates that you need to supply a certificate. Does it work if you set it up with the connection string but turn off the TLS validation?
If you don't want to skip TLS validation, you will need to supply SSL certificates with your request or you can try changing the verification mode to "Verify CA Certificate" in the resource setting page.
That's kind of part of the issue. We have it set for full verification using self-signed client and server certificates, which wasn't an issue before. It does "work" if we turn off certificate verification, but that's not an option security-wise. The new functionality seems to reject self-signed certificates outright, but at the same time, doesn't really seem to support working with public CAs either. When this first started popping up, I tried re-implementing tls for our pgbouncer server using certificates created by let's encrypt, which also resulted in errors from 'Test Connection' ("No local issuer certificate"). This makes me think that whatever environment 'Test Connection' runs in doesn't have access to commonly available public ca certificates. I've seen this suggested in other retool support issues as well (eg MySQL+SSL: "Error: unable to get local issuer certificate").
So I guess, ultimately, my question comes down to: is there currently a supported way to use tls for connections to postgres servers using retool cloud or not? And what would that be? Are there restrictions on which public cas are supported? Because, right now, neither option appears to actually work.
Still getting an error in 'Test Connection', but this one's clearly on my end, I'll need to adjust the subject alt names of the certificates since it looks like the base domain name isn't listed. Is it possible that this has been my actual problem the whole time? If so, that's pretty embarrassing.
Looking good on my end, no issues. Looking like that was the root cause.
Thanks for your help looking into this. It might be helpful to have more informative error messages for certificate issues, but they might be implemented already in a newer version. Once we enabled that flag, it gave me an error message that was more informative and made it really easy to tell exactly where the problem was.