Best LLM for image text extraction? Or an OCR model?

Steven_W · July 25, 2024, 11:55am

I’m using GPT-4o to extract text from product labels (images uploaded to Retool Storage), but I’m wondering which LLM is faster for this task. Does anyone have experience with this?

I read online that an OCR model might be better suited for this task and perhaps faster. But implementing AI is so much easier with Retool.

Any suggestions? Right now it takes more than 6 seconds to process an image.

At the moment I’m using Retool Storage, which might also cause delays. I would use twicpics.com if it helped, but unfortunately it is only faster after the first load, once the image is cached.

By the way, TwicPics is very easy to set up and has a free tier up to 3 GB. I recommend it if you are using Retool Storage: instant CDN, zero effort.

trz-justin-dev · July 26, 2024, 4:22pm

Hey @Steven_W ! We've had great success with AWS Rekognition.

It's fast and accurate for scene text, especially when the text takes up only a very small portion of the input image, which is where the other OCR and scene text models I've tried fell short.

I found CLIP4STR had state-of-the-art accuracy as of a month or two ago, but it struggled with small text in images beyond 2MP. Cropping helped enormously.
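
To make the cropping idea concrete, here is a minimal sketch of one way to split a large image into crops that each stay under a pixel budget. This is an assumption about how cropping could be done (fixed, slightly overlapping tiles), not necessarily how the crops above were chosen; `tile_boxes`, `max_pixels`, and `overlap` are hypothetical names, and the 2,000,000 default just mirrors the 2MP figure.

```python
import math


def tile_boxes(width, height, max_pixels=2_000_000, overlap=64):
    """Return (left, top, right, bottom) crop boxes covering a width x height image.

    Each box covers at most max_pixels pixels. Neighbouring boxes overlap by
    `overlap` pixels so text on a tile border is less likely to be cut in half.
    """
    side = math.isqrt(max_pixels)  # largest square tile within the budget
    if overlap >= side:
        raise ValueError("overlap must be smaller than the tile side")
    step = side - overlap

    boxes = []
    top = 0
    while True:
        bottom = min(top + side, height)
        left = 0
        while True:
            right = min(left + side, width)
            boxes.append((left, top, right, bottom))
            if right == width:  # reached the right edge of the image
                break
            left += step
        if bottom == height:  # reached the bottom edge of the image
            break
        top += step
    return boxes
```

The boxes use the (left, upper, right, lower) order that Pillow's `Image.crop` accepts, so each one can be cropped and sent to the model separately. If you already know roughly where the label is, a single tight crop around it is likely a better choice than blind tiling.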

Steven_W · July 26, 2024, 6:05pm

I’m going to look into it. Thank you.

Did you get AWS Rekognition working in Retool? It’s not available as a resource or LLM option, right? Did you implement it from scratch, or did you use it for another project outside Retool?

trz-justin-dev · July 26, 2024, 6:21pm

No problem, and yeah, it will likely be a call to the Rekognition JS SDK or HTTP API from within Retool, though I haven't tried boto3 in Retool yet. We have a simple Lambda using Python boto3.
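
For reference, a minimal sketch of what a Lambda like that could look like, not our actual code. It uses the boto3 Rekognition client's `detect_text` call; the event shape (`image_base64`), the helper names, and the confidence cutoff of 80 are assumptions made for illustration.

```python
import base64


def extract_lines(response, min_confidence=80.0):
    """Pull LINE-level text out of a Rekognition DetectText response."""
    return [
        d["DetectedText"]
        for d in response.get("TextDetections", [])
        if d.get("Type") == "LINE" and d.get("Confidence", 0.0) >= min_confidence
    ]


def detect_label_text(image_bytes, client=None):
    """Send raw image bytes to Rekognition and return the detected lines."""
    if client is None:
        # Imported lazily so extract_lines can be exercised without AWS set up.
        import boto3
        client = boto3.client("rekognition")
    response = client.detect_text(Image={"Bytes": image_bytes})
    return extract_lines(response)


def lambda_handler(event, context):
    # Assumed (hypothetical) event shape: {"image_base64": "<base64-encoded image>"}
    image_bytes = base64.b64decode(event["image_base64"])
    return {"lines": detect_label_text(image_bytes)}
```

From Retool you would then call whatever endpoint fronts the Lambda with the image payload. Rekognition can also read the image from S3 via `Image={"S3Object": {...}}` instead of raw bytes, which avoids sending the image in the request body if the files already live in a bucket.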