Deployment of Temporal cluster on Fargate

Hello, I'm trying to deploy Retool with Workflows on AWS ECS Fargate, based on cloudformation/retool-workflows.fargate.yaml from the master branch of the tryretool/retool-onpremise GitHub repository.

When I deploy the Temporal cluster services (even just the first one, frontend), they are not stable and restart immediately. I get the following errors in the logs:

{
    "level": "error",
    "ts": "2023-06-29T13:30:09.564Z",
    "msg": "start failed",
    "component": "fx",
    "error": "OnStart hook added by go.temporal.io/server/common/resource.MembershipMonitorProvider failed: context deadline exceeded\n\ngo.temporal.io/server/common/namespace.RegistryLifetimeHooks.func1() took 10.055703ms from:\n\tgo.temporal.io/server/common/namespace.RegistryLifetimeHooks (/home/builder/temporal/common/namespace/fx.go:46)\ngo.temporal.io/server/common/cluster.MetadataLifetimeHooks.func1() took 4.093646ms from:\n\tgo.temporal.io/server/common/cluster.MetadataLifetimeHooks (/home/builder/temporal/common/cluster/fx.go:42)\ngo.temporal.io/server/common/metrics.RuntimeMetricsReporterLifetimeHooks.func1() took 150.324µs from:\n\tgo.temporal.io/server/common/metrics.RuntimeMetricsReporterLifetimeHooks (/home/builder/temporal/common/metrics/fx.go:44)\n",
    "logging-call-at": "fx.go:1030",
    "stacktrace": "go.temporal.io/server/common/log.(*zapLogger).Error\n\t/home/builder/temporal/common/log/zap_logger.go:143\ngo.temporal.io/server/temporal.(*fxLogAdapter).LogEvent\n\t/home/builder/temporal/temporal/fx.go:1030\ngo.uber.org/fx.(*App).Start.func1\n\t/go/pkg/mod/go.uber.org/fx@v1.17.1/app.go:666\ngo.uber.org/fx.(*App).Start\n\t/go/pkg/mod/go.uber.org/fx@v1.17.1/app.go:674\ngo.temporal.io/server/temporal.(*ServerImpl).Start.func1\n\t/home/builder/temporal/temporal/server_impl.go:116"
}

{
    "level": "error",
    "ts": "2023-06-29T13:30:36.749Z",
    "msg": "unable to bootstrap ringpop. retrying",
    "service": "frontend",
    "error": "join duration of 42.153049018s exceeded max 30s",
    "logging-call-at": "ringpop.go:109",
    "stacktrace": "go.temporal.io/server/common/log.(*zapLogger).Error\n\t/home/builder/temporal/common/log/zap_logger.go:143\ngo.temporal.io/server/common/membership.(*RingPop).bootstrap.func1\n\t/home/builder/temporal/common/membership/ringpop.go:109\ngo.temporal.io/server/common/backoff.ThrottleRetry.func1\n\t/home/builder/temporal/common/backoff/retry.go:170\ngo.temporal.io/server/common/backoff.ThrottleRetryContext\n\t/home/builder/temporal/common/backoff/retry.go:194\ngo.temporal.io/server/common/backoff.ThrottleRetry\n\t/home/builder/temporal/common/backoff/retry.go:171\ngo.temporal.io/server/common/membership.(*RingPop).bootstrap\n\t/home/builder/temporal/common/membership/ringpop.go:114\ngo.temporal.io/server/common/membership.(*RingPop).Start\n\t/home/builder/temporal/common/membership/ringpop.go:84\ngo.temporal.io/server/common/membership.(*ringpopMonitor).Start\n\t/home/builder/temporal/common/membership/rpMonitor.go:135\ngo.temporal.io/server/common/resource.MembershipMonitorProvider.func1\n\t/home/builder/temporal/common/resource/fx.go:268\ngo.uber.org/fx/internal/lifecycle.(*Lifecycle).runStartHook\n\t/go/pkg/mod/go.uber.org/fx@v1.17.1/internal/lifecycle/lifecycle.go:120\ngo.uber.org/fx/internal/lifecycle.(*Lifecycle).Start\n\t/go/pkg/mod/go.uber.org/fx@v1.17.1/internal/lifecycle/lifecycle.go:85\ngo.uber.org/fx.(*App).start\n\t/go/pkg/mod/go.uber.org/fx@v1.17.1/app.go:683\ngo.uber.org/fx.withTimeout.func1\n\t/go/pkg/mod/go.uber.org/fx@v1.17.1/app.go:773"
}

Could someone help me fix this?

Hey @mskrip! Are you still blocked here?

I dug around internally to see if this has come up before, and this advice helped another user:

A requirement here is that each replica has a unique broadcastAddress (e.g., a temporal-history service with two replicas needs a different broadcastAddress for each replica). This is why the pod IP is preferred; otherwise you would need to specify a separate Kubernetes service address per replica.
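That advice is written for Kubernetes, so here is a rough sketch of what the equivalent might look like on Fargate, where each task gets its own private IP. It assumes the task runs in awsvpc mode and that the ECS task metadata endpoint (version 4) is reachable through the ECS_CONTAINER_METADATA_URI_V4 environment variable. The helper names (first_ipv4, all_unique, fetch_container_metadata) are made up for illustration and are not part of Retool or Temporal.

```python
import json
import os
import urllib.request


def first_ipv4(container_metadata):
    """Return the first IPv4 address listed in an ECS container metadata document.

    Assumes the document has the shape returned by the task metadata
    endpoint v4: {"Networks": [{"IPv4Addresses": ["10.0.2.106"]}], ...}.
    """
    for network in container_metadata.get("Networks", []):
        addresses = network.get("IPv4Addresses", [])
        if addresses:
            return addresses[0]
    raise ValueError("no IPv4 address found in container metadata")


def all_unique(broadcast_addresses):
    """Check the requirement above: no two replicas share a broadcast address."""
    return len(set(broadcast_addresses)) == len(broadcast_addresses)


def fetch_container_metadata():
    """Fetch this container's metadata from inside a running Fargate task.

    Hypothetical helper: it only works inside ECS, where the agent
    injects ECS_CONTAINER_METADATA_URI_V4 into the container environment.
    """
    uri = os.environ["ECS_CONTAINER_METADATA_URI_V4"]
    with urllib.request.urlopen(uri, timeout=2) as response:
        return json.load(response)
```

Inside the task, the value of first_ipv4(fetch_container_metadata()) could then be passed to Temporal as that replica's broadcast address (in Temporal's own server config this setting lives under global.membership.broadcastAddress). How it gets wired into the Retool CloudFormation template depends on your setup, so treat this as a starting point rather than a confirmed fix.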

Do you have a broadcastAddress set in your YAML file?