Self-hosted Retool on Azure Kubernetes crashes when processing moderate-sized Databricks query results, causing 503 errors and pod failures

jan.biederbeck · March 17, 2025, 11:33pm

Dear all,

I'm writing regarding a critical issue we're experiencing with our self-hosted Retool deployment on Azure Kubernetes (version 3.75.14).

Issue Description: When executing Databricks queries that return a modest amount of data (approximately 1,000 rows), our Retool application crashes with a "503 Service temporarily unavailable" error. This crash sometimes even brings down the entire Kubernetes pod. Interestingly, smaller queries (around 10 rows) execute without issues.

Environment Details:

Self-hosted Retool version: 3.75.14
Kubernetes infrastructure: Azure Kubernetes Service
VM size: Standard_D4d_v5 (4 vCPUs, 16 GB RAM)
Resource monitoring shows no significant spikes apart from a CPU increase during these operations

Additional Context:

The exact same queries execute successfully in our cloud environment (which we're in the process of migrating away from)
This issue appears to be specifically related to the Databricks query connector

Troubleshooting Performed: We've verified that our infrastructure is properly sized, and resource monitoring confirms there are no unusual spikes or resource constraints that would explain the pod crashes. This strongly suggests an application-level issue with the Databricks connector.
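
One way to double-check the resource picture is to read the last termination state Kubernetes records for each container, since a short-lived memory spike can be missed by dashboard sampling but still shows up there as `OOMKilled`. The sketch below is only an illustration: it assumes the pod description has already been exported as JSON (for example with `kubectl get pod <pod-name> -o json`), the helper name `last_terminations` is made up for this post, and the sample data is invented rather than taken from our cluster.

```python
from typing import Any


def last_terminations(pod: dict[str, Any]) -> list[tuple[str, str, int]]:
    """Return (container, reason, exit_code) for every container in the pod
    that has a recorded previous termination."""
    results = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        # lastState.terminated is only present if the container has
        # been restarted at least once.
        terminated = status.get("lastState", {}).get("terminated")
        if terminated:
            results.append(
                (
                    status["name"],
                    terminated.get("reason", "Unknown"),
                    terminated.get("exitCode", -1),
                )
            )
    return results


if __name__ == "__main__":
    # Invented sample data shaped like the output of
    # `kubectl get pod <pod-name> -o json`; container names are hypothetical.
    sample_pod = {
        "status": {
            "containerStatuses": [
                {
                    "name": "api",
                    "lastState": {
                        "terminated": {"reason": "OOMKilled", "exitCode": 137}
                    },
                },
                {"name": "sidecar", "lastState": {}},
            ]
        }
    }
    for name, reason, code in last_terminations(sample_pod):
        print(f"{name}: {reason} (exit code {code})")
```

A reason of `OOMKilled` would point at the container's memory limit rather than the connector alone, while a reason of `Error` with an application exit code would be more consistent with an application-level crash.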

Request:

Could you please advise on known issues with the Databricks connector in version 3.75.14?
Are there any specific configuration adjustments or workarounds we should implement?
Would upgrading to a newer version (such as 3.148 Stable) potentially resolve this issue?

This is a blocking issue for our roll-out efforts, so any assistance you can provide would be greatly appreciated. We're happy to provide additional details or logs if helpful.

Thank you for your support.

timofey · March 21, 2025, 8:27pm

Hi there, we recommend upgrading to 3.114. We introduced some changes to our connector in 3.75 that led to issues querying Databricks. Please let me know if that doesn't solve the issue.

Darren · April 21, 2025, 8:16pm

Have you had a chance to try upgrading, @jan.biederbeck? Don't hesitate to reach out if you have any questions about that process.

Darren · May 22, 2025, 5:06pm

Before I close the topic, can you confirm that upgrading resolved this issue, @jan.biederbeck?
