Hi, the 'Generate text from image' action in a Retool AI workflow block has 2 limitations:
- The image must be provided as base64 rather than a URL
- Only one image may be provided
Neither of these two constraints apply to Open AI's underlying vision platform, in fact, per their documentation, providing image URLs is preferred.
It would be nice to relax these constraints and allow a more native pass-through to the Open AI platform. This could also include adding support for the recommended resolution flag, which allows the developer to specify low resolution (cheaper) vision processing.