nsjail and the AWS metadata service

I have a self-hosted development instance of Retool running via docker-compose on EC2.

For this PoC, the Temporal cluster also runs on the same EC2 instance.
In other words, this is the docker-compose file I'm using, without modification except for the code-executor changes I describe next.

If I create a workflow with a block that uses the boto3 SDK, I get the error message "Unable to locate credentials".

I found this similar report, which explained that code sandboxed by nsjail in a privileged container cannot reach the metadata service at 169.254.169.254, which boto3 uses to authenticate automatically.

Ideally, when I get to a production deployment of self-hosted Retool, I'll want to use an IAM role that adheres to the principle of least privilege. So, if possible, I'd like to find a way for workflows to authenticate with boto3 while keeping the code sandboxed.

However, for the sake of trying to move forward on the development instance, I've attempted to remove nsjail from the equation.

I've updated the environment variables and set privileged to false, per the documentation's instructions.

code-executor:
    build:
      context: ./
      dockerfile: CodeExecutor.Dockerfile
    command: bash -c "./start.sh"
    env_file: ./docker.env
    environment:
      - CONTAINER_UNPRIVILEGED_MODE=TRUE  # <-- added this
      - DISABLE_IPTABLES_SECURITY_CONFIGURATION=TRUE  # <-- added this
      - DEPLOYMENT_TEMPLATE_TYPE=docker-compose
      - NODE_OPTIONS=--max_old_space_size=1024
    networks:
      - code-executor-network
    # code-executor uses nsjail to sandbox code execution. nsjail requires
    # privileged container access.
    # If your deployment does not support privileged access, you can set this
    # to false to not use nsjail. Without nsjail, all code is run without
    # sandboxing within your deployment.
    privileged: false  # <-- was true
    restart: on-failure

Strangely, with this docker-compose configuration, I get an error message saying the container is running in privileged mode. :thinking:

On a whim, I switched the values from TRUE to true, which changed the reported error message.
I'm surprised the case of the value is load-bearing.

CONTAINER_UNPRIVILEGED_MODE=TRUE 
DISABLE_IPTABLES_SECURITY_CONFIGURATION=TRUE 

to

CONTAINER_UNPRIVILEGED_MODE=true <----- lowercase
DISABLE_IPTABLES_SECURITY_CONFIGURATION=true <----- lowercase
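We don't know how the code-executor actually parses these variables, but one plausible explanation for the case sensitivity is an exact string comparison. The sketch below is purely hypothetical (strict_flag and lenient_flag are illustrative names, not Retool code) and contrasts a case-sensitive check with a case-insensitive one:

```python
from typing import Optional


def strict_flag(value: Optional[str]) -> bool:
    # Hypothetical: only the exact lowercase string "true" enables the flag,
    # so "TRUE" silently counts as disabled.
    return value == "true"


def lenient_flag(value: Optional[str]) -> bool:
    # Hypothetical alternative: normalize whitespace and case before comparing.
    return value is not None and value.strip().lower() == "true"


if __name__ == "__main__":
    for raw in ("TRUE", "true", "True"):
        print(f"{raw!r}: strict={strict_flag(raw)}, lenient={lenient_flag(raw)}")
```

If the real parser behaves like strict_flag, lowercase values are the safe choice.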

I'm hoping I can get help with the following:

  1. What does Retool mean when it says the user running the container is not the expected user (retool_user in the retool_user group)? I realize this approach forces the unsandboxed solution, which is not suitable for production.
  2. How do I authenticate boto3, either automatically through the metadata service or manually by providing environment variables in the config?
  3. If I go the environment variable route, could someone from Retool enable those beta feature flags for my account?

Thanks in advance for the help.

Hello @bithippie,

Thank you for the well-written post. Let me check with our DevOps team on the best way to authenticate boto3.

My guess is that it would involve going the env var route. I can ask them about the feature flags we would need to enable for your account to get that set up.

For question 1, I am not sure what would be causing the "user running container is not the expected user" message.

My guess is that it expects the account owner or a user with admin privileges. Why that doesn't match your uid is another question I can run by the DevOps team, to see whether they have encountered it before or whether it is a quirk of unprivileged mode.

Will report back soon with more details and hopefully next steps :+1:

Thanks @Jack_T.

To answer your question, I resolved the "user running container is not the expected user" error by updating the user property on the code-executor service in docker-compose:

code-executor:
    build:
      context: ./
      dockerfile: CodeExecutor.Dockerfile
    user: "1001"  # <-- I pulled this value from the error message.
    # ...snip...
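To confirm which user and group a container process actually runs as, a small check like the following can help. It is a hypothetical sketch using only the Python standard library; the retool_user defaults mirror the error message, and it is an assumption that Retool's own check compares names this way (it may compare numeric ids such as 1001 instead). Running the id command in a shell inside the container reports the same information.

```python
import grp
import os
import pwd


def describe_process_user() -> dict:
    """Return the uid/gid this process runs as, plus their names if known."""
    uid, gid = os.getuid(), os.getgid()
    try:
        user_name = pwd.getpwuid(uid).pw_name
    except KeyError:
        # A bare numeric `user:` in docker-compose may have no /etc/passwd entry.
        user_name = None
    try:
        group_name = grp.getgrgid(gid).gr_name
    except KeyError:
        group_name = None
    return {"uid": uid, "gid": gid, "user": user_name, "group": group_name}


def matches_expected(info: dict, expected_user: str = "retool_user",
                     expected_group: str = "retool_user") -> bool:
    # Hypothetical check modeled on the error message's wording.
    return info["user"] == expected_user and info["group"] == expected_group


if __name__ == "__main__":
    info = describe_process_user()
    print(info, "matches expected:", matches_expected(info))
```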

Regarding "I can ask them about the feature flags we would need to enable for your account to get that set up":

Thanks. I'd love to get this resolved ASAP.

Hello @bithippie,

Could you give me some more details regarding the metadata service that provides the auth credentials at 169.254.169.254?

From what I have gathered from talking to others, it seems that removing nsjail will cause more problems than it will solve.

The best option is to leave the default settings in place and hard-code the authentication credentials into the Setup Script in the Python configuration under Settings.

Excuse my elementary example, but this should be the best way to get the library set up and working, and it won't require any feature flags.
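As a variant on hard-coding, a setup script could read the key and secret from a mapping and pass them to boto3 explicitly. This is a minimal sketch: boto3_credential_kwargs is a hypothetical helper, and whether Retool configuration variables can be read this way from a setup script is an assumption. The keyword names are the ones boto3.client() accepts.

```python
from typing import Dict, Mapping

# Standard AWS environment variable names -> keyword arguments accepted by
# boto3.client() and boto3.session.Session().
_ENV_TO_KWARG = {
    "AWS_ACCESS_KEY_ID": "aws_access_key_id",
    "AWS_SECRET_ACCESS_KEY": "aws_secret_access_key",
    "AWS_SESSION_TOKEN": "aws_session_token",  # optional, temporary credentials only
}


def boto3_credential_kwargs(source: Mapping[str, str]) -> Dict[str, str]:
    """Build explicit boto3 credential kwargs from a mapping of secret values.

    `source` stands in for wherever the secrets live (os.environ, or Retool
    configuration variables if a setup script can read them; that is assumed).
    """
    kwargs = {
        kwarg: source[name]
        for name, kwarg in _ENV_TO_KWARG.items()
        if source.get(name)
    }
    # A key id and secret are both required; the session token is optional.
    missing = {"aws_access_key_id", "aws_secret_access_key"} - set(kwargs)
    if missing:
        raise KeyError(f"missing required credentials: {sorted(missing)}")
    return kwargs


# Usage in a setup script (needs boto3; not executed here):
#   import os
#   import boto3
#   events = boto3.client("events", **boto3_credential_kwargs(os.environ))
```

If the values really are in os.environ under those standard names, boto3 picks them up on its own; the explicit form matters when they come from some other source.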

Our workflows team is working on supporting custom libs in unprivileged mode, so that will be supported in the future :sweat_smile:

@Jack_T - I don't currently have a settings cog. I thought that's what I was asking for when I requested the feature flags be enabled for my account. Am I supposed to do something or is that managed on Retool's side?

According to the boto3 documentation, boto3 checks a series of credential sources in order, ultimately falling back to AWS's metadata service if no credentials are explicitly provided:

...snip...
10. Instance metadata service on an Amazon EC2 instance that has an IAM role configured.
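The fallback behaviour can be modeled in a few lines. This is a simplified, hypothetical model and not boto3's code: it keeps only three of the providers (explicit parameters, environment variables, instance metadata) and leaves out the ones elided above. The "Unable to locate credentials" message matches the error boto3 raises (botocore's NoCredentialsError) when every provider comes up empty, which is what happens when the sandbox blocks the metadata service.

```python
from typing import Callable, Mapping, Optional, Tuple

Credentials = Tuple[str, str]  # (access key id, secret); session token omitted


def resolve_credentials(
    explicit: Optional[Credentials],
    env: Mapping[str, str],
    fetch_from_imds: Callable[[], Optional[Credentials]],
) -> Tuple[str, Credentials]:
    # 1. Credentials passed directly to the client or session win.
    if explicit is not None:
        return "explicit", explicit
    # 2. Environment variables.
    key, secret = env.get("AWS_ACCESS_KEY_ID"), env.get("AWS_SECRET_ACCESS_KEY")
    if key and secret:
        return "environment", (key, secret)
    # ... other providers omitted in this model ...
    # Last resort: the EC2 instance metadata service (role credentials).
    creds = fetch_from_imds()
    if creds is not None:
        return "imds", creds
    # boto3 raises botocore.exceptions.NoCredentialsError at this point.
    raise LookupError("Unable to locate credentials")
```

Inside nsjail, fetch_from_imds effectively always returns nothing, so with no explicit or environment credentials the lookup fails.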

How the AWS metadata service works is described here.

From within an EC2 instance, http://169.254.169.254/latest/meta-data/ returns the instance metadata, which includes the EC2 execution role.
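One way to check whether the metadata service is reachable from a given place (the EC2 host, the code-executor container, or a sandboxed block) is a small standard-library probe. This is a sketch using the IMDSv2 flow (PUT /latest/api/token, then GET with the token header); imds_role_name is a hypothetical helper, and it assumes IMDS has an IAM role attached to report.

```python
import urllib.error
import urllib.request
from typing import Optional

IMDS_BASE = "http://169.254.169.254"

# IMDS is link-local, so never route it through an HTTP proxy.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def imds_role_name(base_url: str = IMDS_BASE, timeout: float = 1.0) -> Optional[str]:
    """Return the IAM role name exposed by IMDS, or None if it is unreachable."""
    try:
        # IMDSv2: obtain a short-lived session token first.
        token_req = urllib.request.Request(
            f"{base_url}/latest/api/token",
            method="PUT",
            headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
        )
        with _OPENER.open(token_req, timeout=timeout) as resp:
            token = resp.read().decode()
        # Then list the role(s) whose credentials IMDS can hand out.
        role_req = urllib.request.Request(
            f"{base_url}/latest/meta-data/iam/security-credentials/",
            headers={"X-aws-ec2-metadata-token": token},
        )
        with _OPENER.open(role_req, timeout=timeout) as resp:
            # One role name per line; take the first.
            return resp.read().decode().splitlines()[0].strip() or None
    except (urllib.error.URLError, OSError, IndexError):
        # Refused or timed out (e.g. blocked by the sandbox), or no role attached.
        return None
```

A role name on the host but None inside a workflow block would be consistent with the sandbox blocking 169.254.169.254.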

Why I'm going down this path: I have a Retool workflow that makes AWS SDK calls to AWS services within the VPC, e.g. EventBridge.Client.put_events(). My EC2 execution role has the events:PutEvents permission. However, because of nsjail, the metadata service is unreachable from the workflow code. Once I take the code out of the sandbox, the metadata service becomes reachable, boto3 assumes the identity of the EC2 execution role, and my workflow can successfully put an event onto EventBridge.
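For reference, the put_events call takes a list of entries in which Detail is a JSON-encoded string rather than a dict. The sketch below only builds that payload with the standard library; build_put_events_entry, the source name, and the event fields are hypothetical, and the boto3 call itself is shown as a comment because it needs credentials.

```python
import json
from typing import Any, Dict, Optional


def build_put_events_entry(source: str, detail_type: str, detail: Dict[str, Any],
                           event_bus_name: Optional[str] = None) -> Dict[str, str]:
    entry = {
        "Source": source,
        "DetailType": detail_type,
        # EventBridge expects Detail as a JSON string, not a dict.
        "Detail": json.dumps(detail),
    }
    if event_bus_name:
        # Omit to use the account's default event bus.
        entry["EventBusName"] = event_bus_name
    return entry


# Usage (needs boto3 and credentials; not executed here):
#   import boto3
#   events = boto3.client("events")
#   resp = events.put_events(Entries=[
#       build_put_events_entry("retool.workflow", "OrderCreated", {"orderId": 42})])
#   # Per-entry failures are reported in resp["FailedEntryCount"], not raised.
```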

My ideal scenario is for the workflow to have its own execution role, and for the code-executor service to run sandboxed (as you and others have said). I would then move the events:PutEvents permission to that new role.
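A least-privilege permissions policy for such a workflow role could allow only events:PutEvents on a single event bus. This sketch builds the policy document as a Python dict; the region, account id, and bus name are placeholders. How the workflow would obtain credentials for this role (for example through an STS AssumeRole call) is not covered in this thread and is left as an assumption.

```python
import json


def put_events_policy(region: str, account_id: str, bus_name: str = "default") -> dict:
    # Scope the permission to one event bus rather than "*".
    bus_arn = f"arn:aws:events:{region}:{account_id}:event-bus/{bus_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "events:PutEvents",
                "Resource": bus_arn,
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder values; substitute your own region and account id.
    print(json.dumps(put_events_policy("us-east-1", "123456789012"), indent=2))
```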

I think what's preventing me from doing all of this is that I don't have the Settings cog. That leads me to believe I can't provide a key and secret in a setup script, so I would have to hardcode those values into each workflow, which is not secure.

I'm also curious about this line in your screenshot: "You may reference configuration variables to use sensitive values in your set scripts". Will I be able to keep those values private? In other words, are configuration variables locked down by Retool permissions?

There's a lot to this and I hope I'm being clear.

The primary takeaway from this lengthy message is that I don't see "Settings", nor do I see a way to create the Python configuration script you've shared in your screenshot.


Thank you so much for the well-worded explanation!

I was not very familiar with running boto3 on an EC2 instance but that answered a lot of my questions :sweat_smile:

If you are not seeing the settings cog :gear:, this is likely something we can fix with the right feature flag. Let me look into which one will help us out. Which version of Retool are you running?

I'm running 3.75.10.

Hello @bithippie!

Sorry for the delayed response, hope you had a good weekend.

It looks like the feature flag names are workflowsConfigVars and workflowsConfigVarsSanitizeReadPaths.

Check to see if those are in your docker-compose file. It looks like these features were released to users without needing a flag in version 3.52.0 and higher, so I am pretty surprised they aren't automatically showing up for you :face_with_monocle:

I was also poking around the 'Beta' window inside of 'Settings' to see if I could find the toggle to enable config vars. Yours might look different from mine but let me know if you see it there.

I am going to spin up a Retool instance on the same version as you and see if there are other steps needed to let you add config vars to a setup script (which will be encoded and safe for use with boto3).

@Jack_T thank you for the follow up!

I'm having trouble following the instructions provided. Could you clarify?

I see your comment about workflowsConfigVars and workflowsConfigVarsSanitizeReadPaths.

When I look at the Beta panel I don't see anything related to Workflow Config vars.

With regards to docker-compose, can you provide an example of what it would look like to enable these settings? There are no obvious signs of workflow config vars in either the docker-compose file or the docker.env file.

Thanks!

Hi @bithippie,

Yes, those two variables, workflowsConfigVars and workflowsConfigVarsSanitizeReadPaths, will be in your deployment infrastructure's config file. These will enable the functionality needed.

Thank you for checking the Beta panel. That was more of a shot in the dark, as it has some feature flags but not all of them :sweat_smile: (the ones it does have are easier to toggle from the GUI).

Let me see if I can provide an example of where these variables live and how to enable these settings. I was hoping it would be as easy as Command + F to pull them up in the docker-compose or docker.env file, but I haven't flipped these flags myself yet, so I will do some hunting to figure out where they live on a docker-compose deployment.

I am still surprised that these two are not enabled by default given the version of Retool that you are on, even with your deployment being self hosted :thinking:

Have you restarted your Docker containers recently, by chance? :sweat_smile: If by some miracle these turn on after a hard restart of your infrastructure, that might be an acceptable fix, since the variables were not easy to find.

Thank you again for your patience!

Hi @bithippie,

It looks like those feature flags were removed once they became the default for all users on versions after 3.52, so even if we flipped them directly in your app's DB, it wouldn't change anything.

While researching this feature, I found that the issue might be due to another feature flag, for custom Python libraries. That one should be enabled by default, and it should be working fine if you were able to import boto3 in a workflow block.

I was asking a co-worker about what could be causing this, and it might be from turning off nsjail, since nsjail is a required prerequisite for using custom libraries. I am going to check your license key to see whether either the custom libs or the config var features were turned off for some reason, and re-activate them if so.

I should just need your code-executor logs, which should include a note about whether it can use nsjail (it should say true, not false).