Getting On-Premise Deployment to Work with SSL

I am deploying on an EC2 instance in an AWS VPC. Everything works fine if I create an EC2 instance with a public DNS address and no SSL, but I have been unable to put the instance behind a load balancer with an SSL Route 53 endpoint in front: no combination of settings in docker.env and docker-compose.yml seems to work. I am terminating SSL at the load balancer, which forwards HTTP traffic to the EC2 instance on port 3000. Nothing I put in for DOMAINS, BASE_DOMAIN, and COOKIE_INSECURE seems to work. I can log into Retool and navigate around the screens, but if I try (for example) to execute the REST query in the provided Country Search sample, the result is a "Run Failed" status with a message reading "error: Unknown error".
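For reference, the relevant docker.env entries I've been experimenting with look something like this (hostnames and values are illustrative, not my real config):

```
# docker.env (illustrative values)
DOMAINS=retool.example.com -> http://api:3000
BASE_DOMAIN=https://retool.example.com
# I've tried both true and false here
COOKIE_INSECURE=true
```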

Hey there Stephen,

Are you getting any logs in the Retool container that might provide some additional details when these queries fail to run?

I'm not sure if I'm looking in the right place, but what I think are the api and jobs logs seem to show only memory-usage messages; nothing that looks like an error.

Hmm, how are you checking those logs? It sounds like you're seeing what I would expect, but I'm surprised that no errors show up when the queries fail.

Hey Stephen, I have an instance set up with SSL, behind a load balancer, running via docker-compose on an EC2 instance.

Here's an overview of my setup:

On the instance

  • docker.env:
...
DOMAINS=mysubdomain.mydomain.com -> http://api:3000
...
  • docker-compose.yml:
...

https-portal:
    
    ...
    environment:
      STAGE: 'production'

...
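For context, here's a fuller sketch of that service block. Only STAGE is from my actual file; the image name, port mappings, and links are assumed from the standard Retool on-prem docker-compose.yml, so check yours:

```yaml
https-portal:
  image: tryretool/https-portal:latest   # assumed; verify against your compose file
  ports:
    - '80:80'
    - '443:443'
  links:
    - api
  env_file: ./docker.env
  environment:
    STAGE: 'production'
```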

EC2

  • Application load balancer listening on :443 and routing to mytargetgroup
  • mytargetgroup routing traffic to :3000 of the EC2 instance's elastic IP

Route53

  • Hosted zone for mydomain.com
  • A Record mapping mysubdomain.mydomain.com to $myloadbalancerurl.us-east-1.elb.amazonaws.com
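If it helps to compare, that record can be expressed with the AWS CLI roughly like this (the hosted zone ID, alias hosted-zone ID, and DNS names are placeholders):

```shell
aws route53 change-resource-record-sets \
  --hosted-zone-id Z123EXAMPLE \
  --change-batch '{
    "Changes": [{
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "mysubdomain.mydomain.com",
        "Type": "A",
        "AliasTarget": {
          "HostedZoneId": "Z35EXAMPLEALB",
          "DNSName": "myloadbalancer.us-east-1.elb.amazonaws.com",
          "EvaluateTargetHealth": false
        }
      }
    }]
  }'
```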

Hopefully that's helpful, let me know if you have any questions.


I run "docker container ls" and then "docker logs -f xxxxxx" for the different containers that are running. And actually, this morning I do see an error, although it is not directly connected to the problems showing up in the web app. The log for the https-portal container shows these messages (which seem to appear at the time the containers launch):

2021/08/25 01:21:01 [emerg] 178#178: bind() to 0.0.0.0:80 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)

I am taking a look at why port 80 seems to be in use, and will post back when I find out what's going on.
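For anyone following along, these are the checks I'm running (standard Linux and Docker tooling, nothing Retool-specific):

```shell
# What process is listening on port 80 on the host?
sudo ss -ltnp 'sport = :80'
# Is another container already publishing port 80?
docker ps --filter "publish=80"
```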

Thanks so much; good to know that it is possible! This sounds very much like what I am doing. What do you set COOKIE_INSECURE to in your configuration?

Stephen.

Port 80 already being in use is definitely something I'd look into.

I have COOKIE_INSECURE=false

OK, I have the configuration as Kent describes. Looking at the log for https-portal as it starts up, it shows:
Verifying staging.mycompany.com...
Traceback (most recent call last):
  File "/bin/acme_tiny", line 197, in <module>
    main(sys.argv[1:])
  File "/bin/acme_tiny", line 193, in main
    signed_crt = get_crt(args.account_key, args.csr, args.acme_dir, log=LOGGER, CA=args.ca, disable_check=args.disable_check, directory_url=args.directory_url, contact=args.contact)
  File "/bin/acme_tiny", line 149, in get_crt
    raise ValueError("Challenge did not pass for {0}: {1}".format(domain, authorization))
ValueError: Challenge did not pass for staging.mycompany.com: {u'status': u'invalid', u'challenges': [{u'status': u'invalid', u'validationRecord': [{u'url': u'http://staging.mycompany.com/.well-known/acme-challenge/uzVYs8V9eRxsEM2Pwy-byRcNpbfomumHouHfl0rKxJ4', u'hostname': u'staging.mycompany.com', u'addressUsed': u'35.182.69.66', u'port': u'80', u'addressesResolved': [u'35.182.69.66', u'3.97.128.163']}], u'url': u'https://acme-v02.api.letsencrypt.org/acme/chall-v3/25028809470/JfXSyA', u'token': u'uzVYs8V9eRxsEM2Pwy-byRcNpbfomumHouHfl0rKxJ4', u'error': {u'status': 400, u'type': u'urn:ietf:params:acme:error:connection', u'detail': u'Fetching http://staging.mycompany.com/.well-known/acme-challenge/uzVYs8V9eRxsEM2Pwy-byRcNpbfomumHouHfl0rKxJ4: Timeout during connect (likely firewall problem)'}, u'validated': u'2021-08-25T15:43:46Z', u'type': u'http-01'}], u'identifier': {u'type': u'dns', u'value': u'staging.mycompany.com'}, u'expires': u'2021-09-01T15:43:45Z'}

Failed to sign staging.mycompany.com, is DNS set up properly?

Failed to obtain certs for staging.mycompany.com

(not actually "mycompany.com")

Any thoughts? I think the DNS is set up correctly; it is the same as for several other services running in the same AWS account.

Stephen.


Hmm, I have seen that error come up before when the deployment can't access the internet. If you're deploying Retool in a VPC that cannot access the public internet, Let's Encrypt won't be able to perform the challenge necessary to provision a certificate. In that case, you'll need to add your certificates manually.

The EC2 instance does have internet access; I ran "curl https://www.google.com" without a problem. It does seem to have something to do with the Let's Encrypt process not being able to validate our cert, but I'm not sure what the exact problem might be. Should I go down the path of manually adding the certificates even though the server has internet access?

As a test it would be good to try, if you have certs you can use, but I would like to figure out why Let's Encrypt is having trouble signing.

When we say "access to the internet," do we mean that the EC2 instance can reach out to the internet, or that there can also be inbound access FROM the internet? I've been doing some reading about Let's Encrypt and certbot, and it seems that for the process to work, the Let's Encrypt service needs to make an HTTP (not HTTPS) request to the server. In my case (and, I think, in Kent's configuration too) I don't see how this could work: Route 53 only sends 443 traffic to the load balancer, and the load balancer only forwards HTTP traffic to port 3000 on the EC2 instance, so nothing could ever be sent to port 80 on the EC2 instance. How would that Let's Encrypt check ever succeed in this configuration? Does the EC2 instance need a public IP (maybe just temporarily) to get through the required setup?
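As a quick sanity check of that theory, I can probe the challenge path from a machine outside the VPC (the token here is made up; the only question is whether port 80 is reachable at all):

```shell
# A timeout here matches the "Timeout during connect" error in the log above;
# a 404 would at least prove port 80 is reachable from the internet.
curl -v --max-time 10 http://staging.mycompany.com/.well-known/acme-challenge/test-token
```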

Hey @stephenmarsh - forgive me, I wasn't as clear as I could have been in my first post. The instance I was referencing is used to test a lot of different things, and that resulted in me including some information that isn't actually relevant to what you're trying to do.

You don't need the https-portal container at all. When you add a listener to your Application Load Balancer, there is a "Default SSL certificate" setting. You can choose the "From ACM (recommended)" option, and select a certificate or "Request a new ACM certificate". [screenshot attached]

Your setup should otherwise be correct. You should be able to comment out (or delete) the https-portal service in docker-compose.yml and restart the containers with sudo docker-compose up -d --remove-orphans.

This way, we're terminating SSL at the load balancer, and passing traffic to :3000 on your instance, where it will hit Retool.

Well, now I have to apologize to you guys. You were right: the configuration is fine. Since you were so confident that it should be working, I started to sniff around at some other things in our setup, and lo and behold, we have AWS WAF enabled on the load balancer with a bunch of default AWS rules. Any request going back to the Retool API with a URI in the body was getting blocked, so, for example, any time I tried to run a REST query the WAF blocked it.
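In case anyone else hits this, you can check whether a web ACL is attached to the load balancer with the AWS CLI (the load balancer ARN below is a placeholder):

```shell
# Which WAFv2 web ACL (if any) is associated with the ALB?
aws wafv2 get-web-acl-for-resource \
  --resource-arn arn:aws:elasticloadbalancing:us-east-1:123456789012:loadbalancer/app/my-alb/abc123
```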

Thanks for your help on this, and sorry again for not being aware of this earlier.

Stephen.
