Self-hosted Retool on Azure Kubernetes crashes when processing moderate-sized Databricks query results, causing 503 errors and pod failures.

Dear all,

I'm writing about a critical issue with our self-hosted Retool deployment (version 3.75.14) on Azure Kubernetes Service.

Issue Description: When we execute Databricks queries that return a modest amount of data (approximately 1,000 rows), our Retool application crashes with a "503 Service temporarily unavailable" error. The crash sometimes takes down the entire Kubernetes pod. Smaller queries (around 10 rows) execute without issues.

Environment Details:

  • Self-hosted Retool version: 3.75.14
  • Kubernetes infrastructure: Azure Kubernetes Service
  • VM size: Standard_D4d_v5 (4 vCPUs, 16 GB RAM)
  • Resource monitoring shows no significant spikes during these operations, only an increase in CPU usage

Additional Context:

  • The exact same queries execute successfully in our cloud environment (which we're in the process of migrating away from)
  • This issue appears to be specifically related to the Databricks query connector

Troubleshooting Performed: We've verified that our infrastructure is properly sized, and resource monitoring shows no unusual spikes or resource constraints that would explain the pod crashes. This suggests an application-level issue with the Databricks connector.

Request:

  1. Are there any known issues with the Databricks connector in version 3.75.14?
  2. Are there any specific configuration adjustments or workarounds we should implement?
  3. Would upgrading to a newer version (such as 3.148 Stable) potentially resolve this issue?

This is a blocking issue for our roll-out efforts, so any assistance you can provide would be greatly appreciated. We're happy to provide additional details or logs if helpful.

Thank you for your support,

Hi there, we recommend upgrading to 3.114. We introduced some changes to our connector in 3.75 that led to issues querying Databricks. Please let me know if that doesn't solve the issue.

Have you had a chance to try upgrading, @jan.biederbeck? Don't hesitate to reach out if you have any questions about that process.