I'm currently trying to set up Temporal Cloud with Retool (which requires mTLS) using the Helm chart, following Deploy Retool on Kubernetes with Helm | Retool Docs.
The Temporal Cloud docs (Manage certificates | Temporal Platform Documentation) place no specific requirement on how the certificate is generated, since you can bring your own cert management infrastructure, which is what we're doing.
To facilitate provisioning and lifecycle of the certificate, we are using https://cert-manager.io/ to generate the end-entity certificate and transforming the resulting values into what Retool expects (base64 encoded) so it can be used in the env var WORKFLOW_TEMPORAL_TLS_KEY.
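For reference, the cert-manager Certificate looks roughly like this (the names, namespace, and issuer are placeholders for our internal setup; the privateKey block is where I've been toggling the algorithm/encoding mentioned below):

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: temporal-client-cert     # placeholder name
  namespace: retool              # placeholder namespace
spec:
  # cert-manager writes tls.crt / tls.key / ca.crt into this secret
  secretName: temporal-certs
  commonName: retool-worker      # placeholder CN
  usages:
    - client auth
  privateKey:
    algorithm: RSA               # also tried ECDSA
    encoding: PKCS8              # also tried PKCS1
    size: 2048
  issuerRef:
    name: internal-ca-issuer     # placeholder; our own CA infrastructure
    kind: Issuer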
When I plug in the secret, I get the following error, which crashes the workflow-worker pod:
TransportError: tonic::transport::Error(Transport, General("failed to parse private key as RSA, ECDSA, or EdDSA"))
at NativeConnection.connect (/retool_backend/node_modules/.pnpm/@temporalio+worker@1.11.6_@swc+helpers@0.5.3_metro@0.80.9_encoding@0.1.13_metro-minify-terser@0.80.9_/node_modules/@temporalio/worker/lib/connection.js:58:23)
at async gF (/retool_backend/bundle/main.js:11499:21822)
at async INo (/retool_backend/bundle/main.js:11567:11716)
at async Object.oIb (/retool_backend/bundle/main.js:11567:15085)
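To sanity-check what the worker actually receives, I've been decoding the env var in a throwaway debug pod and running it through openssl (the pod, image, and secret key name below are just my choices, nothing Retool-specific):

apiVersion: v1
kind: Pod
metadata:
  name: key-debug                # throwaway debug pod
spec:
  restartPolicy: Never
  containers:
    - name: check
      image: alpine/openssl      # any image with openssl and base64 works
      command: ["sh", "-c"]
      args:
        - |
          # decode the same secret value the worker gets, then try to parse it
          echo "$WORKFLOW_TEMPORAL_TLS_KEY" | base64 -d > /tmp/tls.key
          openssl pkey -in /tmp/tls.key -noout && echo "key parses OK"
      env:
        - name: WORKFLOW_TEMPORAL_TLS_KEY
          valueFrom:
            secretKeyRef:
              name: temporal-certs
              key: temporal-tls-key   # whichever key holds the encoded private key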
Other things I've tried:
- Changing the private key encoding (PKCS1/PKCS8)
- Switching private key algorithms (RSA, ECDSA)
- Plugging the base64 value directly in the env var in the deployment
- Using the secret directly (this gives a ParseKeyError instead)
Unfortunately, outside of the environment variables themselves, there isn't much documentation on how this is supposed to work.
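My current understanding of the expected wiring (happy to be corrected) is that the env var itself must contain the base64-encoded PEM, and since Kubernetes hands the decoded secret value to the pod, the Secret has to store base64(PEM) rather than the raw PEM -- something like:

apiVersion: v1
kind: Secret
metadata:
  name: temporal-certs
type: Opaque
stringData:
  # stringData values are stored verbatim, so each value below must already be
  # the base64 encoding of the PEM file (e.g. the output of `base64 -w0 tls.key`).
  # If the same base64 string were placed under data: instead, Kubernetes would
  # decode it and the env var would receive the raw PEM.
  temporal-tls-key: <base64 of tls.key>
  temporal-tls-cert: <base64 of tls.crt>
  temporal-root-ca: <base64 of ca.crt>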
Helm Values for reference:
# nameOverride:
# fullnameOverride:
config:
# licenseKey: "EXPIRED-LICENSE-KEY-TRIAL"
# licenseKeySecretName is the name of the secret where the Retool license key is stored (can be used instead of licenseKey)
licenseKeySecretName: retool
# licenseKeySecretKey is the key in the k8s secret, default: license-key
licenseKeySecretKey: licenseKey
useInsecureCookies: false
# Timeout for queries, in ms.
dbConnectorTimeout: 600000
auth:
google:
clientId: clientId
# clientSecret:
# clientSecretSecretName is the name of the secret where the google client secret is stored (can be used instead of clientSecret)
clientSecretSecretName: retool
# clientSecretSecretKey is the key in the k8s secret, default: google-client-secret
clientSecretSecretKey: clientSecret
domain:
encryptionKey:
# encryptionKeySecretName is the name of the secret where the encryption key is stored (can be used instead of encryptionKey)
encryptionKeySecretName: retool
# encryptionKeySecretKey is the key in the k8s secret, default: encryption-key
encryptionKeySecretKey: encryptionKey
jwtSecret:
# jwtSecretSecretName is the name of the secret where the jwt secret is stored (can be used instead of jwtSecret)
jwtSecretSecretName: retool
# jwtSecretSecretKey is the key in the k8s secret, default: jwt-secret
jwtSecretSecretKey: jwtSecret
# IMPORTANT: Incompatible with postgresql subchart
# Please disable the subchart in order to use a managed or external postgres instance.
postgresql:
host: xxxxxx
port: "5432"
db: "retool"
user: retool
ssl_enabled: 'true'
# password:
# ssl_enabled: whether to enable SSL for connecting to the Retool Postgres database, equivalent to the POSTGRES_SSL_ENABLED environment variable. Place here instead of in the env block.
# passwordSecretName is the name of the secret where the pg password is stored (can be used instead of password)
passwordSecretName: retool
# passwordSecretKey is the key in the k8s secret, default: postgresql-password
passwordSecretKey: dbPassword
image:
repository: "tryretool/backend"
# You need to pick a specific tag here, this chart will not make a decision for you
tag: "3.253.6-stable"
pullPolicy: "IfNotPresent"
commandline:
args: []
env: {}
# Optionally specify additional environment variables to be populated from Kubernetes secrets.
# Useful for passing in SCIM_AUTH_TOKEN or other secret environment variables from Kubernetes secrets.
environmentSecrets:
- name: WORKFLOW_TEMPORAL_TLS_CRT
secretKeyRef:
name: temporal-certs
key: temporal-tls-cert
- name: WORKFLOW_TEMPORAL_SERVER_ROOT_CA_CRT
secretKeyRef:
name: temporal-certs
key: temporal-root-ca
# - name: SCIM_AUTH_TOKEN
# secretKeyRef:
# name: retool-scim-auth-token
# key: auth-token
# - name: GITHUB_APP_PRIVATE_KEY
# secretKeyRef:
# name: retool-github-app-private-key
# key: private-key
# Optionally specify environment variables. Useful for variables that are not key-value, as env: {} above requires.
# Can also include environment secrets here instead of in environmentSecrets
environmentVariables:
- name: BASE_DOMAIN
value: xxxxxx
# - name: SCIM_AUTH_TOKEN
# valueFrom:
# secretKeyRef:
# name: retool-scim-auth-token
# key: auth-token
# - name: GITHUB_APP_PRIVATE_KEY
# valueFrom:
# secretKeyRef:
# name: retool-github-app-private-key
# key: private-key
# - name: POD_HOST_IP
# valueFrom:
# fieldRef:
# fieldPath: status.hostIP
# Enables support for the legacy external secrets (enabled) and the modern External Secrets Operator (externalSecretsOperator.enabled).
# These are mutually exclusive as both enable reading in environment variables via External Secrets.
externalSecrets:
# Support for legacy external secrets, note this is deprecated in favour of External Secrets Operator: https://github.com/godaddy/kubernetes-external-secrets
# This mode only allows a single secret name to be provided.
enabled: false
name: retool-config
# Array of secrets to be used as env variables. (Optional)
secrets:
[]
# - name: retool-config
# - name: retool-db
# Support for External Secrets Operator: https://github.com/external-secrets/external-secrets
externalSecretsOperator:
enabled: false
# External Secrets Operator Backend Types: https://github.com/external-secrets/external-secrets#supported-backends
# Default set to AWS Secrets Manager.
backendType: secretsManager
# Array of name/path key/value pairs to use for the External Secrets Objects.
secretRef:
[]
# - name: retool-config
# path: global-retool-config
# - name: retool-db
# path: global-retool-db-config
# When true, uses kubernetes-client CRDs and not external-secrets CRDs
# Defaults to true
useLegacyCR: true
files: {}
deployment:
labels: {}
annotations: {}
service:
type: ClusterIP
externalPort: 3000
internalPort: 3000
# externalIPs:
# - 192.168.0.1
#
## LoadBalancer IP if service.type is LoadBalancer
# loadBalancerIP: 10.2.2.2
annotations: {}
labels:
app: "retool"
## Limit load balancer source ips to list of CIDRs (where available)
# loadBalancerSourceRanges: []
selector: {}
# portName: service-port
ingress:
enabled: false
# For k8s 1.18+
# ingressClassName:
labels: {}
annotations: {}
# kubernetes.io/ingress.class: nginx
# kubernetes.io/tls-acme: "true"
hosts:
# - host: retool.example.com
# paths:
# - path: /
tls:
# - secretName: retool.example.com
# hosts:
# - retool.example.com
# servicePort: service-port
pathType: ImplementationSpecific
# For supporting other ingress controllers that require customizing .backend.service.name and .backend.service.port.name,
# like AWS ALB, extraPaths allows that customization. It takes precedence in the list of paths for the host,
# in order to allow a rule like an ssl-redirect from port 80-->443 to be first (otherwise there wouldn't be a redirect).
# extraPaths:
# - path: /*
# backend:
# service:
# name: ssl-redirect
# port:
# name: use-annotation
# pathType: ImplementationSpecific
postgresql:
# We highly recommend you do NOT use this subchart as is to run Postgres in a container
# for your production instance of Retool; it is a default. Please use a managed Postgres,
# or self-host more permanently. Use enabled: false and set in config above to do so.
enabled: false
ssl_enabled: false
auth:
database: hammerhead_production
username: postgres
password: retool
# IMPORTANT: When setting username to `postgres` a random password will be generated unless the postgresPassword field is uncommented
# postgresPassword: retool
service:
port: 5432
# Use the official docker image rather than bitnami/docker
# since Retool depends on the uuid-ossp extension
image:
repository: "postgres"
tag: "13"
postgresqlDataDir: "/data/pgdata"
primary:
persistence:
enabled: true
mountPath: "/data/"
serviceAccount:
# Specifies whether a service account should be created
create: true
# The name of the service account to use.
# If not set and create is true, a name is generated using the fullname template
# If set and create is false, the service account must already exist
name: retool
annotations: {}
livenessProbe:
enabled: true
path: /api/checkHealth
initialDelaySeconds: 30
timeoutSeconds: 10
periodSeconds: 10
failureThreshold: 3
readinessProbe:
enabled: true
path: /api/checkHealth
initialDelaySeconds: 30
timeoutSeconds: 10
periodSeconds: 10
successThreshold: 5
# To avoid increasing livenessProbe initialDelaySeconds, the good practice is to use a startupProbe.
startupProbe:
enabled: false
path: /api/checkHealth
initialDelaySeconds: 60
timeoutSeconds: 10
periodSeconds: 10
successThreshold: 5
extraContainers: []
extraVolumeMounts: []
extraVolumes: []
# These resource specifications apply to the main backend and workflows backend pods.
resources:
# If you have more than 1 replica, the minimum recommended resources configuration is as follows:
# - cpu: 2048m
# - memory: 4096Mi
# If you only have 1 replica, please double the above numbers.
limits:
cpu: 4096m
memory: 8192Mi
requests:
cpu: 2048m
memory: 4096Mi
priorityClassName: ""
# Affinity for pod assignment
# Ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchExpressions:
- key: "app.kubernetes.io/name"
operator: In
values:
- retool
topologyKey: "kubernetes.io/hostname"
# Tolerations for pod assignment
# Ref: https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/
tolerations:
- key: amd64
operator: "Equal"
value: 'true'
effect: NoSchedule
# Node labels for pod assignment
# Ref: https://kubernetes.io/docs/user-guide/node-selection/
nodeSelector:
kubernetes.io/arch: amd64
# Common annotations for all pods.
podAnnotations: {}
revisionHistoryLimit: 3
# Optional pod disruption budget, for ensuring higher availability of the
# Retool application. Specify either minAvailable or maxUnavailable, as
# either an integer pod count (1) or a string percentage ("50%").
# Ref: https://kubernetes.io/docs/tasks/run-application/configure-pdb/
#
# Example:
# podDisruptionBudget:
# maxUnavailable: 1
# Common labels for all pods (backend and job runner) for pod assignment
podLabels: {}
# Increasing replica count will deploy a separate pod for backend and jobs
# Example 1: with 1 replicas, you will end up with 1 combined backend and jobs pod (unless jobRunner.enabled is true, see below)
# Example 2: with 2 replicas, you will end up with 2 backends + 1 jobs pod
# Example 3: with 3 replicas, you will end up with 3 backends + 1 jobs pod
replicaCount: 2
jobRunner:
# explicitly enable this pod if exactly 1 api backend container and
# 1 jobs runner container are desired. Otherwise a replicaCount of 2
# will already launch a job runner pod
# enabled: true
# Annotations for job runner pods
annotations: {}
# Labels for job runner pods
labels: {}
backend:
# Annotations for backend pods
annotations: {}
# Labels for backend pods
labels: {}
ui:
# Annotations for ui pods
annotations: {}
# Labels for ui pods
labels: {}
workflows:
# enabled by default from Chart version 6.0.2 and Retool image 3.6.11 onwards
# explicitly set other fields as needed
# enabled: true
# ADVANCED: The temporal cluster can be scaled separately in the subchart (charts/retool-temporal-services-helm/values.yaml)
# If your needs require scaling temporal, reach out to us for guidance -- it is likely the bottleneck is DB or worker replicaCount
worker:
# A replicaCount of 1 will launch 6 pods -- 1 workflow backend, 1 workflow worker, and 4 pods that make up the executor temporal cluster
# Scaling this number will increase the number of workflow workers, e.g. a replicaCount of 4
# will launch 9 pods -- 1 workflow backend, 4 workflow workers, and 4 for temporal cluster
replicaCount: 1
backend:
# A replicaCount of 1 will launch 6 pods -- 1 workflow backend, 1 workflow worker, and 4 pods that make up the executor temporal cluster
# Scaling this number will increase the number of workflow backends, e.g. a replicaCount of 4
# will launch 9 pods -- 4 workflow backends, 1 workflow worker, and 4 for temporal cluster
replicaCount: 1
# Timeout for queries, in ms. This will set the timeout for workflows-related pods only
# If this value is not set but config.dbConnectorTimeout is, we will set workflows pod timeouts
# to .Values.config.dbConnectorTimeout
# dbConnectorTimeout: 120000
# Annotations for workflows worker and workflow backend pods
annotations: {}
# Labels for workflows worker and workflow backend pods
labels: {}
# IMPORTANT: Incompatible with retool-temporal-services-helm subchart
# This allows configuring a Retool Workflows deployment that uses your own Temporal cluster
# instead of deploying a new one. Set enabled to true and add the config variables.
temporal:
# set enabled to true if using a pre-existing temporal cluster
enabled: true
# discoverable service name for the temporal frontend service
host: xxxx.yyyy.tmprl.cloud
# discoverable service port for the temporal frontend service
port: 7233
# temporal namespace to use for Temporal Workflows related to Retool
namespace: namespace
# whether to use TLS/SSL when connecting to Temporal
sslEnabled: true
# base64 encoded string of TLS/SSL client certificate to use for mTLS
# sslCert:
# base64 encoded string of TLS/SSL client certificate secret key to use for mTLS
# sslKey:
# recommended alternative to sslKey. Name and key of k8s secret containing base64 encoded sslKey
sslKeySecretName: temporal-certs
# sslKeySecretKey
# Config for workflows worker pods. Node heap size limits can be overridden here
# otelCollector can be set to an OpenTelemetry Collector in your k8s cluster. This will configure Temporal metrics collection which
# provides observability into Workflows worker performance, particularly useful in high QPS use-cases
# environmentVariables will only be set on the workflows worker
# only change the CONCURRENT_*_LIMIT values if you have higher-load use cases and deploy
# code_executor. Otherwise, the worker may OOM if the Workflows blocks use too much memory.
config: {}
# config: {
# nodeOptions: --max_old_space_size=1024
# otelCollector: {
# enabled: true
# endpoint: http://$(HOST_IP):4317
# }
# environmentVariables: []
# - name: WORKFLOW_TEMPORAL_CONCURRENT_TASKS_LIMIT
# value: "100"
# - name: WORKFLOW_TEMPORAL_CONCURRENT_ACTIVITIES_LIMIT
# value: "100"
# }
# Resources for the workflow worker only - these are sane inputs that bias towards stability
# Can adjust but may see OOM errors if memory too low for heavy workflow load
# To make adjustments to workflows backend, use top level resources key.
resources:
limits:
cpu: 2000m
memory: 8192Mi
requests:
cpu: 1000m
memory: 2048Mi
dbconnector:
java:
# Disable this to disable Retool's Java dbconnector.
enabled: true
multiplayer:
# Enable this to use Retool's experimental multiplayer editing feature.
# This feature is not ready for production use; please check with the Retool team before enablement.
enabled: false
replicaCount: 1
# Resources for multiplayer pods
resources:
requests:
cpu: "200m"
memory: "256Mi"
limits:
memory: "2048Mi"
# Set environment variables for multiplayer pods, e.g. defining which origin to use
# environmentVariables:
# - name: MAIN_DOMAIN
# value: retool.foo.com
# Annotations for multiplayer pods
annotations: {}
# Labels for multiplayer pods
labels: {}
codeExecutor:
# enabled by default from Chart version 6.0.15 and Retool image 3.20.15 onwards
# explicitly set other fields as needed
# enabled: true
replicaCount: 1
# Annotations for code executor pods
annotations: {}
# Labels for code executor pods
labels: {}
# Config for code executor. Node heap size limits can be overridden here
config: {}
# config: {
# nodeOptions: --max_old_space_size=1024
# }
volumes: {}
volumeMounts: {}
# Config affinity and anti-affinity rules for the code executor pods
affinity: {}
# Resources for the code executor. Most common issues will be seen with CPU usage as this will
# most likely be CPU bound. Adjust the CPU if latency increases under load.
resources:
limits:
cpu: 2000m
memory: 2048Mi
requests:
cpu: 1000m
memory: 1024Mi
# code executor uses nsjail to sandbox code execution. nsjail requires privileged container access.
# If your deployment does not support privileged access, you can set `privileged` to false to not
# use nsjail. Without nsjail, all code is run without sandboxing within your deployment.
securityContext:
privileged: true
image:
repository: tryretool/code-executor-service
# defaults to image.tag if >= 3.20.15, otherwise defaults to 1.1.0; explicitly set to override.
# tag:
pullPolicy: IfNotPresent
retool-temporal-services-helm:
# Enable to spin up a new Temporal Cluster alongside Retool
enabled: false
server:
# Defines image to be used for temporal server
image:
repository: tryretool/one-offs
tag: retool-temporal-1.1.5
pullPolicy: IfNotPresent
# this configures grpc_health_probe (https://github.com/grpc-ecosystem/grpc-health-probe)
# for healthchecks instead of native k8s.
# Set this to true if deploying in a k8s cluster on version <1.24
useLegacyHealthProbe: false
config:
# the below values specify the database for temporal internals and workflow state
# both can point to the same db, and even the same as retool main above, although throughput
# will be limited. We strongly suggest using two total DBs: one for retool-main and one
# for default and visibility below
persistence:
default:
sql:
# host:
# port:
# the dbname used for temporal
# database: temporal
# user:
# password:
# existingSecret is the name of the secret where password is stored
# existingSecret:
# secretKey is the key in the k8s secret
# secretKey:
# options for SSL connections to database
# tls:
# enabled: true
# caFile:
# certFile:
# keyFile:
# enableHostVerification: false
# serverName:
visibility:
sql:
# host:
# port:
# the dbname used for temporal visibility
# database: temporal_visibility
# user:
# password:
# existingSecret is the name of the secret where password is stored
# existingSecret:
# secretKey is the key in the k8s secret
# secretKey:
# options for SSL connections to database
# tls:
# enabled: true
# caFile:
# certFile:
# keyFile:
# enableHostVerification: false
# serverName:
# use-cases with very high throughput demands (>10k workflow blocks/sec) can modify
# below value to be higher, such as 512 or 1024
numHistoryShards: 128
# define resources for each temporal service -- these are sane starting points that allow
# for scaling to ~3 workflow workers without hitting bottlenecks
resources:
limits:
cpu: 500m
memory: 1024Mi
requests:
cpu: 100m
memory: 128Mi
# example of setting service-specific resources, here we increase memory limit for history server
history:
resources:
limits:
cpu: 500m
memory: 2Gi
requests:
cpu: 100m
memory: 128Mi
# Set environment variables for temporal pods, e.g. skipping DB creation step at startup
# environmentVariables:
# - name: SKIP_DB_CREATE
# value: "true"
# Setting this value to true will spin up the temporal web UI, admintools,
# along with prometheus and grafana clusters for debugging performance
# TODO: define this in _helpers.tpl
# visibilityDebugMode: false
# DEBUGGING: below pods can be used for debugging and watching metrics
# launches prometheus pods and listeners on temporal services and worker
prometheus:
enabled: false
# launches a grafana pod with helpful workflows executor metrics and graphs
grafana:
enabled: false
# launches the temporal web UI, which allows you to see which workflows are currently running
web:
enabled: false
# launches the temporal admintools pod, which allows you to manage your cluster, e.g. terminate workflows
admintools:
enabled: false
persistentVolumeClaim:
# set to true to use pvc
enabled: false
# set to the name of the existing pvc
existingClaim: ""
annotations: {}
accessModes:
- ReadWriteOnce
size: "15Gi"
## If defined, storageClassName: <storageClass>
## If set to "-", storageClassName: "", which disables dynamic provisioning
## If undefined (the default) or set to null, no storageClassName spec is
## set, choosing the default provisioner. (gp2 on AWS, standard on
## GKE, AWS & OpenStack)
##
storageClass: ""
mountPath: /retool_backend/pv-data
# default security context
securityContext:
enabled: false
allowPrivilegeEscalation: false
runAsUser: 1000
fsGroup: 2000
extraConfigMapMounts: []
initContainers: {}
extraManifests: []
# extraManifests:
# - apiVersion: cloud.google.com/v1beta1
# kind: BackendConfig
# metadata:
# name: "{{ .Release.Name }}-testing"
# spec:
# securityPolicy:
# name: "my-gcp-cloud-armor-policy"
# Support for AWS Security groups for pods
# Ref: https://docs.aws.amazon.com/eks/latest/userguide/security-groups-for-pods.html
securityGroupPolicy:
enabled: false
groupIds: []
hostAliases: {}
# Support hostAliases on the backend deployment
# hostAliases:
# - ip: 1.1.1.1
# hostnames:
# - test.com
# - anothertest.com
telemetry:
# When enabled, will collect metrics and logs from the other pods in the
# chart. These will be forwarded to Retool for proactive monitoring and bug
# identification when `telemetry.sendToRetool.enabled = true`, and can also
# optionally be sent to a user-managed destination via
# `telemetry.customVectorConfig`.
enabled: false
sendToRetool:
# Only relevant when `telemetry.enabled = true`. When enabled, telemetry
# from pods in this chart will be forwarded to Retool for proactive
# monitoring and bug identification.
enabled: true
# Should not be changed except for chart testing.
address: "https://telemetry.retool.com:443"
image:
repository: "tryretool/telemetry"
# Default to same as top level `image.tag`.
tag: null
pullPolicy: "IfNotPresent"
resources:
requests:
cpu: 128m
memory: 128Mi
limits:
cpu: 256m
memory: 256Mi
# When present, any vector config here gets added to the telemetry pod (via a
# created configmap) and added to the internal [vector](https://vector.dev/)
# instance which runs inside. This can enable adding additional user-specified
# sinks for collected telemetry data.
#
# See [vector sinks
# documentation](https://vector.dev/docs/reference/configuration/sinks/) for
# more details.
customVectorConfig: {}
# When present, any grafana-agent config here gets added to the telemetry pod
# (via a created configmap) and added to the internal
# [grafana-agent](https://grafana.com/docs/agent/latest/) instance which runs
# inside. This can enable adding additional user-specified sources of extra
# telemetry data.
#
# The internal grafana-agent runs in Flow mode, so the config here must use
# river syntax. See [grafana-agent Flow mode
# documentation](https://grafana.com/docs/agent/latest/flow/) for more
# details.
customGrafanaAgentConfig:
serviceAccount:
create: true
name:
automountToken: true
rbac:
create: true
extraEnv: []
extraAnnotations: {}
extraPodSpec: {}
extraVolumes: []
extraVolumeMounts: []
extraPorts: []
extraContainerSpec: {}