Retool AI Convert Docs to Text: Unable to Parse Certain Documents

  • Goal: I am building an app that lets users upload documents (PDFs), have them read by AI, and then populate the rest of the app with the extracted values.

  • Issue: Retool's AI action "Convert Document to Text" returns an empty string for some PDF files.

  • Additional Info: The files are real estate contracts, and unfortunately I cannot share examples because of an NDA. However, I did some investigating and believe I know what the issue is: the files that work fine have highlightable (extractable) text throughout, whereas the problem files don't. They are partially or fully scanned, so they contain mixed or no extractable text.

  • Steps Taken: I've tried other text extraction libraries (pypdf), and they have the same problem with these documents: they recognize the number of pages but return no text at all.
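A quick way to confirm the "no text layer" diagnosis is to check each page's extracted text and flag the ones that come back (nearly) empty. Below is a minimal sketch, assuming pypdf is installed; the helper names and the 20-character threshold are hypothetical choices for illustration, not part of any Retool API.

```python
from typing import List


def pages_missing_text(page_texts: List[str], min_chars: int = 20) -> List[int]:
    """Return 0-based indices of pages whose extracted text is (nearly) empty.

    min_chars is an arbitrary, hypothetical threshold: a scanned page usually
    yields "" or only a few stray characters (e.g. a page number).
    """
    return [i for i, text in enumerate(page_texts) if len(text.strip()) < min_chars]


def extract_page_texts(pdf_path: str) -> List[str]:
    """Extract the embedded text layer of every page with pypdf."""
    from pypdf import PdfReader  # lazy import so the helper above has no dependency

    reader = PdfReader(pdf_path)
    # extract_text() reads text objects stored in the PDF; it does not look
    # inside images, so a fully scanned page comes back empty.
    return [page.extract_text() or "" for page in reader.pages]
```

Calling `pages_missing_text(extract_page_texts("contract.pdf"))` on a mixed document would list the scanned pages, which are the ones that need OCR.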

I'm pretty sure OCR is needed for these types of files, but I'd love some input if anyone can help me configure this to work with a text extraction AI without requiring an OCR setup.

Hi @daviswray, welcome to the forum! :wave:
I don't think this is possible with any built-in Retool AI Resource. As you mentioned, we may need an OCR setup. We could create a Workflow in Retool that does that using a Python library like:
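As a rough sketch of what such a Workflow's Python step might do: keep the embedded text where a page has it and run OCR only on the pages that don't. The libraries here, pdf2image (which needs Poppler) and pytesseract (which needs the Tesseract binary), are assumptions, not necessarily the library suggested above, and it is also assumed that the Workflow's Python environment can install them and their system binaries. The function names and the 20-character threshold are hypothetical.

```python
from typing import Callable, List


def ocr_page(pdf_path: str, page_number: int, dpi: int = 300) -> str:
    """OCR a single page (1-based) of a PDF.

    Assumes pdf2image + Poppler and pytesseract + Tesseract are available.
    """
    from pdf2image import convert_from_path
    import pytesseract

    # Render only the requested page to an image, then OCR it.
    images = convert_from_path(
        pdf_path, dpi=dpi, first_page=page_number, last_page=page_number
    )
    return pytesseract.image_to_string(images[0])


def merge_text(
    page_texts: List[str],
    ocr: Callable[[int], str],
    min_chars: int = 20,
) -> str:
    """Use the embedded text where present; call `ocr` only for pages without it."""
    parts = []
    for index, text in enumerate(page_texts):
        if len(text.strip()) >= min_chars:
            parts.append(text)  # page already has a usable text layer
        else:
            parts.append(ocr(index + 1))  # pdf2image page numbers are 1-based
    return "\n\n".join(parts)
```

In a Workflow, `page_texts` could come from pypdf (`page.extract_text()` for each page of `PdfReader(path).pages`) and `ocr` from `lambda n: ocr_page(path, n)`; the merged string is then what gets passed to the AI step for field extraction. Passing `ocr` in as a function keeps the merge logic testable without Tesseract installed.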


Hey @Paulo, thank you!

That's a much better idea than what I was planning. Thanks for connecting me to that resource, I'll give it a try!

You are welcome! Let us know how it goes. :slightly_smiling_face:
