Hi all, thanks again for the feedback and for being Retool customers. As David said, reliability and stability are critically important for us. We’ve investigated this particular incident in depth and discovered that we didn’t test our logic against a broad enough range of expected outputs. We’ve put guardrails in place to make sure something like this doesn’t happen in the future, by improving our test suite and expanding our coverage in staging.
Separately, we’re also setting reliability and stability goals, and have made them a key result for our Q4 OKRs. Ensuring Retool is always stable is going to be a core investment for us going forward.
As the head of engineering, I must also note that Retool is a fairly complex platform with a lot of surface area. There are a great many permutations of problems that can occur, and it is not easy to automatically test (and then debug) every conceivable use case. That said, we are making immediate improvements to our test coverage, staging environments, and programmatic observability, and expect to get to several more follow-up action items in the near future. Thank you for the feedback!
I think that clear and rapid communication is important here as well, and that’s something we’ll be focusing on too. We’re still a fairly lean team (27 engineers total!) but we want the community to hear from us - both when we’re shipping new features and meeting your growing needs, and when we let you down, so we can resolve it quickly.
As David said, we’re really sorry for the downtime, recognize this is not acceptable, and appreciate you all as customers and users. If you have any questions, comments, or concerns, feel free to follow up here, or reach out directly (I’m at email@example.com; David is at firstname.lastname@example.org).