Dear all,
I'm writing regarding a critical issue we're experiencing with our self-hosted Retool deployment (version 3.75.14) on Azure Kubernetes Service.
Issue Description: When executing Databricks queries that return a modest amount of data (approximately 1,000 rows), our Retool application crashes with a "503 Service temporarily unavailable" error. This crash sometimes even brings down the entire Kubernetes pod. Interestingly, smaller queries (around 10 rows) execute without issues.
Environment Details:
- Self-hosted Retool version: 3.75.14
- Kubernetes infrastructure: Azure Kubernetes Service
- VM size: Standard_D4d_v5 (4 vCPUs, 16 GiB RAM)
- Resource monitoring shows no significant spikes apart from increased CPU usage during these operations
Additional Context:
- The exact same queries execute successfully in our cloud environment (which we're in the process of migrating away from)
- This issue appears to be specifically related to the Databricks query connector
Troubleshooting Performed: We've verified that our infrastructure is adequately sized. Apart from the CPU increase noted above, resource monitoring shows no spikes or resource constraints that would explain the pod crashes. This suggests an application-level issue, most likely in the Databricks connector.
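One further check that may help narrow this down is the container's last termination reason, since a container can be killed for exceeding its own memory limit without node-level monitoring showing a spike. Below is a minimal sketch, assuming the pod description is exported with `kubectl get pod <pod-name> -n <namespace> -o json`. The function names are hypothetical helpers of ours; the fields read (`containerStatuses`, `restartCount`, `lastState.terminated.reason`, `exitCode`) are standard Kubernetes pod status fields.

```python
import json
from typing import Any, Dict, List


def summarize_restarts(pod: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return restart count and last termination details for each container."""
    summary = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        # lastState.terminated is only present if the container has exited before.
        terminated = status.get("lastState", {}).get("terminated") or {}
        summary.append({
            "container": status.get("name"),
            "restarts": status.get("restartCount", 0),
            "reason": terminated.get("reason"),
            "exit_code": terminated.get("exitCode"),
        })
    return summary


def likely_oom(summary: List[Dict[str, Any]]) -> bool:
    """Heuristic: reason 'OOMKilled' or exit code 137 (SIGKILL) hints at a memory kill."""
    return any(
        entry["reason"] == "OOMKilled" or entry["exit_code"] == 137
        for entry in summary
    )


def load_pod(path: str) -> Dict[str, Any]:
    """Load a pod description previously saved from kubectl's JSON output."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

For example, after saving the output to `pod.json`, `likely_oom(summarize_restarts(load_pod("pod.json")))` returning `True` would point toward a container memory limit rather than the connector alone. Exit code 137 only indicates a SIGKILL, so it is a hint and not proof of an out-of-memory kill.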
Request:
- Could you please advise on known issues with the Databricks connector in version 3.75.14?
- Are there any specific configuration adjustments or workarounds we should implement?
- Would upgrading to a newer version (such as 3.148 Stable) potentially resolve this issue?
This is a blocking issue for our roll-out efforts, so any assistance you can provide would be greatly appreciated. We're happy to provide additional details or logs if helpful.
Thank you for your support,