GPT-4 Vision not working

Hey everyone :wave:

I'm currently trying out the "Caption Image" AI action.
I was previously able to just use a URL with "gpt-4-vision", but with the new "gpt-4-vision-preview" I get the error "Image content must be a Base64 encoded string."

I also tried to actually provide a Base64 encoded string, but that doesn't seem to work either. I tested with and without quotation marks, and both with the data:image/jpeg;base64 prefix and without it. Without the prefix I get a "400 Invalid image." error.

I'm using the example URL from this OpenAI guide: https://platform.openai.com/docs/guides/vision
https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg

I used this image in base64 format.
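
Roughly, this is how I produced the string I pasted in (a quick Python sketch; the exact tooling shouldn't matter):

import base64
import requests

# Sample image from the OpenAI vision guide linked above
url = ("https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/"
       "Gfp-wisconsin-madison-the-nature-boardwalk.jpg/"
       "2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg")
b64 = base64.b64encode(requests.get(url).content).decode("utf-8")

# Variant 1: bare base64 string (this is the one that gave "400 Invalid image.")
bare = b64

# Variant 2: with the data URI prefix
prefixed = "data:image/jpeg;base64," + b64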

Am I doing something wrong?

I think I'm in the same, or a very similar, boat. I'm trying to use the image classification tooling after using a RESTQuery block to fetch an image. Then, for the image data for the AI tool, I'm just using {{query.data.base64Data}}.

The inputs/outputs are all as I'd expect, and I'm getting the same response(s) around 400 invalid image. Pasting the image data in directly as the input returns the same error.
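
For what it's worth, one quick way to rule out a bad payload on the app side is to check that the string actually decodes back to valid image bytes outside of Retool (rough Python sketch):

import base64

def looks_like_valid_image(b64_string: str) -> bool:
    # Strip a data URI prefix if one is present
    if b64_string.startswith("data:"):
        b64_string = b64_string.split(",", 1)[1]
    raw = base64.b64decode(b64_string, validate=True)
    # JPEG starts with FF D8 FF, PNG with its fixed 8-byte signature
    return raw[:3] == b"\xff\xd8\xff" or raw[:8] == b"\x89PNG\r\n\x1a\n"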

I came here to report the same issue. Even if I host the image on Firebase or somewhere else and add a URI, I am stuck in the same place.

What are we thinking? Where could the solution be?

Also, it was previously possible to just send an image in the Generate Text call and get a response. But that stopped working too, since the ChatGPT update I guess.

Hey everyone - I had some folks from the team take a look, and it looks like that's the response we're getting back from OpenAI (others are seeing this outside of Retool too). Let us know if there are any other details around this we can help look into on our side!

I don't think this is an OpenAI issue. I took a small base64-encoded string and was successful with the OpenAI vision model via the API and Python code. The exact same base64-encoded string fails in Retool when passed directly.


from openai import OpenAI
from google.colab import userdata  # assumption: this ran in Colab, with the API key stored as a secret
import base64

client = OpenAI(api_key=userdata.get('openai'))

response = client.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What’s in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "data:image/jpeg;base64,iVBORw0KGgoAAAANSUhEUgAAABEAAAASCAYAAAC9+TVUAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAADsMAAA7DAcdvqGQAAAC6SURBVDhPrZSNEoQgCITVe/8nvqxjDZr159CmvhknUFhRqrgLQYmCmi6HoCbIRWSW3CRVm8nSN7YBq1DeWYk6s4L+kvT5iE7EqwZrOAWOQmydiEyqVYN5aiTzWToOtlNzCIvMuoR1G8zx+sXe6m9KCfFlVJWsvicqcNHdyUzIBLRTm4xcVWJNGAkhuRG4Lnj4FbMIhM2XUCTCwTNjDri/AnYpDFgVRdRtMarAIAEkmQAou7giRJtMhPAD9Ax2zNjjGZEAAAAASUVORK5CYII=",
                    },
                },
            ],
        }
    ],
    max_tokens=300,
)

print(response.choices[0])

======

Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='The image appears to consist of a small set of text characters arranged to represent a simplified, minimalist face. The two dots symbolize the eyes and the "v" represents a smiling mouth, creating a friendly, abstract representation of a smiling face.', role='assistant', function_call=None, tool_calls=None))

Hi jmann, the tests I posted to this thread point to Retool, not OpenAI, as the source of the issue. I was able to use the OpenAI API directly with the same image that fails in Retool.

Hey @mpec255 - We definitely still have the request open on our side to dig in further, but haven't gotten to it yet with other development priorities. Happy to add any details to that to help out. We use their Node library to create the clients, so we'll want to diff the behavior with the same version of that library to confirm.

Just to confirm, we're seeing the same behavior on the Node side of things and are looking into where the requests could be going wrong. I'll loop back here once we dig a bit more into it!

Thanks for the additional details around this - I spent some time digging in yesterday and put out a fix that should get the image-based AI queries working with next week's 3.35 deploy! :smiley:


Thank you!
It seems like it works now with base64-encoded images, but unfortunately it still gives me an error when trying to use a URL. This was previously working fine with the "gpt-4-vision" model.

Still the same error message.
"Image content must be a Base64 encoded string."

Oh no! Can you show how you had it configured before? I didn't realize URLs worked in that field (and wouldn't have thought they did, based on the base64 error in this original thread). Just to unblock you before we can take a look, does fetching the base64 for the URL via a quick RESTQuery and passing that into the image actions work for you?


I just plainly input the URL of an image; it was working like this before. I actually discovered this because I couldn't get it to work with Base64 on the old version :sweat_smile:
[image]

Yes, this works. I can work with this for now. Thank you.

Looping back on this: the fix above did solve the 400 invalid image error when valid base64 was being passed in as the image content. URLs run into the same base64 validation error you're seeing both before and after that commit, so something else must be involved if that behavior has since changed.

Checking the code and the field's help text, I'm not sure we expect URLs there. So I'd open a new topic if that's something you're looking for, as it will require a separate change!

Thank you for looking into this. I will just use the Base64 version for now. I will open a new topic when needed.
