-
Goal: I am building an app that lets users upload documents (PDFs), has AI read them, and then populates the rest of the app with the extracted values.
-
Issue: Retool's AI action "Convert Document to Text" returns an empty string for some PDF files.
-
Additional Info: The files are real estate contracts, and unfortunately I cannot share examples because of an NDA. However, I did some investigating and believe I know what the issue is. The files that work fine contain highlightable (extractable) text throughout, whereas the problem files do not: they are partially or fully scanned, so they contain mixed or no extractable text.
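One quick way to check this diagnosis is to measure, page by page, how much text the PDF's own text layer yields. Below is a minimal sketch assuming pypdf is installed. The 20-character cutoff (`MIN_CHARS_PER_PAGE`), the helper names, and the "text" / "mixed" / "scanned" labels are my own illustrative choices, not anything Retool or pypdf defines.

```python
"""Classify a PDF by how much of it carries an extractable text layer.

Assumption: pypdf is installed (pip install pypdf). The classification logic
is a plain function, so it can be tried without a PDF at hand.
"""
from typing import List, Sequence

# Hypothetical cutoff: pages with fewer non-whitespace characters than this
# are treated as having no usable text layer (e.g. a scanned image).
MIN_CHARS_PER_PAGE = 20


def pages_without_text(page_texts: Sequence[str], min_chars: int = MIN_CHARS_PER_PAGE) -> List[int]:
    """Return 0-based indices of pages whose extracted text is (nearly) empty."""
    return [
        i
        for i, text in enumerate(page_texts)
        if len("".join((text or "").split())) < min_chars
    ]


def classify_document(page_texts: Sequence[str], min_chars: int = MIN_CHARS_PER_PAGE) -> str:
    """Label a document 'text', 'mixed', or 'scanned' based on its text layer."""
    if not page_texts:
        raise ValueError("document has no pages")
    missing = pages_without_text(page_texts, min_chars)
    if not missing:
        return "text"  # every page has selectable text
    if len(missing) == len(page_texts):
        return "scanned"  # no page has usable text
    return "mixed"  # some pages are text, some are images


def extract_page_texts(path: str) -> List[str]:
    """Read per-page text with pypdf (imported lazily so the helpers above work without it)."""
    from pypdf import PdfReader  # assumption: pypdf is installed

    reader = PdfReader(path)
    # Image-only pages have no text layer, so extract_text() yields little or nothing.
    return [page.extract_text() or "" for page in reader.pages]
```

For example, `classify_document(extract_page_texts("contract.pdf"))` (hypothetical file name) should return `"mixed"` for a contract whose signature pages are scans, which flags the uploads that will come back empty or incomplete.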
-
Steps Taken: I've tried other text extraction libraries (pypdf), and they have similar problems with these documents: they detect the correct number of pages but extract no text at all.
I'm fairly sure OCR is needed for these types of files, but I'd love some input if anyone can help me configure this to work with a text extraction AI without requiring an OCR setup.
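If a page has no text layer, libraries like pypdf have nothing to read, so some form of OCR on the page image is needed. One way to keep that OCR setup small is to run it only on the pages that lack text. The sketch below assumes pypdf for the text layer and pdf2image + pytesseract for OCR (which also require the Poppler and Tesseract binaries). None of these are Retool features, and the function names and the 20-character cutoff are hypothetical.

```python
"""Hybrid extraction: keep the PDF's own text where it exists, OCR only the rest.

Assumptions (not part of Retool): pypdf for the text layer, and pdf2image +
pytesseract for OCR, which also need the Poppler and Tesseract binaries.
"""
from typing import Callable, List, Sequence

MIN_CHARS_PER_PAGE = 20  # hypothetical cutoff for "this page has no text layer"


def merge_page_texts(
    page_texts: Sequence[str],
    ocr_page: Callable[[int], str],
    min_chars: int = MIN_CHARS_PER_PAGE,
) -> List[str]:
    """Return one string per page, calling ocr_page(index) only for near-empty pages."""
    merged = []
    for i, text in enumerate(page_texts):
        if len("".join((text or "").split())) >= min_chars:
            merged.append(text)  # native text layer is good enough
        else:
            merged.append(ocr_page(i))  # likely a scanned page: fall back to OCR
    return merged


def tesseract_ocr_page(path: str, index: int, dpi: int = 300) -> str:
    """OCR a single 0-based page; imports are lazy so merge_page_texts works without them."""
    from pdf2image import convert_from_path  # assumption: pdf2image + Poppler installed
    import pytesseract  # assumption: pytesseract + Tesseract installed

    # pdf2image page numbers are 1-based and inclusive, so this renders one page.
    images = convert_from_path(path, dpi=dpi, first_page=index + 1, last_page=index + 1)
    return pytesseract.image_to_string(images[0])


def extract_text_with_ocr_fallback(path: str) -> str:
    """Whole-document text, suitable for handing to an AI extraction step."""
    from pypdf import PdfReader  # assumption: pypdf is installed

    native = [page.extract_text() or "" for page in PdfReader(path).pages]
    pages = merge_page_texts(native, lambda i: tesseract_ocr_page(path, i))
    return "\n\n".join(pages)
```

The resulting string could then be passed to the AI extraction step in place of the empty output from "Convert Document to Text". Pages that already have selectable text are never OCR'd, so their text stays exact and OCR time is spent only on the scanned pages.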