I'm doing some research into Retool apps that consume inferences from machine learning models. In particular, we're interested in applications where someone on your team is making a decision that's being aided or augmented by an ML model (though we're interested in learning about other applications of ML as well). If you're working on something like this, I'd love to have a quick conversation to learn more and pick your brain. Thanks in advance!
Did you consider using local models? For example, TensorFlow has TFLite, a package for compressing models so they can run locally on a mobile phone. That would avoid serving and communication costs and also give you lower latency from the model.
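To illustrate, here's a minimal sketch of converting a Keras model to TFLite with post-training quantization. The tiny `Dense` model is just a stand-in so the snippet runs self-contained; in practice you'd convert your own trained model.

```python
import tensorflow as tf

# Stand-in model; replace with your trained model in practice.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(1),
])

# Convert to TFLite, enabling default optimizations (post-training
# quantization), which shrinks the model for on-device deployment.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_bytes = converter.convert()

# The resulting bytes can be bundled into the app and executed on-device
# with the TFLite interpreter, so no network round-trip is needed.
interpreter = tf.lite.Interpreter(model_content=tflite_bytes)
interpreter.allocate_tensors()
```

On the device, `tf.lite.Interpreter` (or the mobile TFLite runtime) loads those bytes directly, so inference happens locally with no serving infrastructure.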