OpenAI Vision - Support for Multiple Images and URLs vs. Base64

fenix · September 27, 2024, 8:23am

Hi, the 'Generate text from image' action in a Retool AI workflow block has 2 limitations:

The image must be provided as base64 rather than a URL
Only one image may be provided

Neither of these two constraints apply to Open AI's underlying vision platform, in fact, per their documentation, providing image URLs is preferred.

It would be nice to relax these constraints and allow a more native pass-through to the Open AI platform. This could also include adding support for the recommended resolution flag, which allows the developer to specify low resolution (cheaper) vision processing.

Jack_T · October 4, 2024, 11:24pm

Hello @fenix!

Thank you for bringing this to our attention. I agree we should improve the AI workflow block for a more native pass-through, as these are not limitations on the Open AI platform.

I also like the idea for a resolution flag for specifying image quality to give further control over vision processing.

I can make a feature request for both of these and keep you updated on any news I hear!

Jack_T · October 22, 2024, 8:09pm

Hello @fenix,

Hope you are having a great day. I just got word from our engineering team that this feature request has been completed!

It should be going out live to cloud instances of Retool and the latest version for self hosted users shortly

fenix · October 23, 2024, 1:39pm

Wow, that's awesome, thank you!

Topic		Replies	Views
GPT 4 Vision not working 💬 App Building openai , ai	14	2895	March 25, 2024
Introducing Retool AI—a suite of features for building AI apps, workflows, and chatbots 💥 Product Updates ai	14	3056	March 21, 2024
Can add multi image in AI Action? 💬 App Building ai	3	56	June 2, 2025
Is there a plan to support an image upload feature in Retool AI chat component, especially for RAG use-cases? 💬 Feature Requests ai	2	78	May 13, 2025
Cannot pass multiple images to GPT-4-Vision 💬 Queries and Resources ai	4	1047	May 20, 2024

OpenAI Vision - Support for Multiple Images and URLs vs. Base64

Related topics