Best LLM for image text extraction? Or an OCR model?

I’m using GPT-4o to extract text from product labels (images uploaded to Retool Storage), but I’m wondering which LLM is faster for this task. Does anyone have experience with this?

I read online that an OCR model might be better suited for this task and perhaps faster. But implementing AI is so much easier with Retool.

Any suggestions? Right now it takes more than 6 seconds to process an image.

At the moment I’m using Retool Storage, which might also cause delays. I would use twicpics.com if it helped, but unfortunately it is only faster once the image has been cached after the first load.

By the way, TwicPics is very easy to set up and has a free tier up to 3 GB. I recommend it if you are using Retool Storage: instant CDN, zero effort.

Hey @Steven_W ! We've had great success with AWS Rekognition.

It's fast and accurate for scene text, especially when the text occupies only a very small portion of the input image, which is where the other OCR and scene text models I've tried fell short.

I found CLIP4STR had state-of-the-art accuracy as of a month or two ago, but it struggled with small text in images beyond 2 MP; cropping helped enormously.
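
For anyone who wants to try the cropping idea, here is a rough sketch of one way to split a large image into overlapping tiles that each stay under a pixel budget. This is only an illustration, not the exact cropping used above: the function name `tile_boxes`, the 2,000,000-pixel default, and the 32-pixel overlap are all assumptions made for the example.

```python
import math


def tile_boxes(width, height, max_pixels=2_000_000, overlap=32):
    """Return (left, top, right, bottom) crop boxes that cover the image.

    Each box holds at most max_pixels pixels. Neighbouring boxes overlap by
    `overlap` pixels so text on a tile border is not cut in half.
    All names and defaults here are hypothetical.
    """
    if width * height <= max_pixels:
        return [(0, 0, width, height)]  # small enough, no cropping needed
    if (overlap + 1) ** 2 > max_pixels:
        raise ValueError("overlap is too large for the pixel budget")

    # Grow an n x n grid until the largest possible tile fits the budget.
    n = 2
    while (math.ceil(width / n) + overlap) * (math.ceil(height / n) + overlap) > max_pixels:
        n += 1

    boxes = []
    for row in range(n):
        for col in range(n):
            left = col * width // n
            top = row * height // n
            right = min((col + 1) * width // n + overlap, width)
            bottom = min((row + 1) * height // n + overlap, height)
            boxes.append((left, top, right, bottom))
    # Drop duplicates that can appear for extremely thin images.
    return list(dict.fromkeys(boxes))
```

Each box is in the (left, upper, right, lower) order that Pillow's `Image.crop` expects, so the tiles could be cropped and sent to the model one at a time.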

I’m going to look into it. Thank you.

Did you get AWS Rekognition working in Retool? It’s not available as a resource or LLM option, right? Did you implement it from scratch, or did you use it for another project outside Retool?

No problem, and yeah, it will likely be a call to the Rekognition JS SDK or HTTP API from within Retool, though I haven't tried boto3 in Retool yet. We have a simple Lambda using Python boto3.
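
A minimal sketch of what such a Lambda might look like is below. It is not our actual code: the event key `image_base64`, the helper name `extract_label_text`, and the confidence threshold of 80 are assumptions for illustration. The `detect_text` call and the `TextDetections` response fields are part of the boto3 Rekognition client.

```python
import base64


def extract_label_text(image_bytes, client=None, min_confidence=80.0):
    """Return the text lines Rekognition detects in an image.

    `client` can be injected for testing; by default a boto3 Rekognition
    client is created (needs AWS credentials and a region configured).
    """
    if client is None:
        import boto3  # imported lazily so the helper can be tested offline
        client = boto3.client("rekognition")

    response = client.detect_text(Image={"Bytes": image_bytes})

    # Rekognition returns both LINE and WORD detections; keep whole lines
    # and drop low-confidence results (the threshold is an assumption).
    return [
        detection["DetectedText"]
        for detection in response["TextDetections"]
        if detection["Type"] == "LINE" and detection["Confidence"] >= min_confidence
    ]


def lambda_handler(event, context):
    # Assumes the caller sends the image as base64 under a hypothetical
    # "image_base64" key; adapt this to however Retool passes the file.
    image_bytes = base64.b64decode(event["image_base64"])
    return {"lines": extract_label_text(image_bytes)}
```

From Retool you would then call the Lambda (or an API in front of it) with the uploaded image and read `lines` from the response.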