Convert documents to text in Workflows

I have an S3 bucket containing a large number of PDF documents. I want to convert those PDFs to text and extract entities from them. I have already done this in a web app. Does anyone know how to do the same thing using Retool Workflows?

If Retool Workflows doesn't support this, can anyone suggest other options for achieving it?

Workflows lets you run Python code to extract the PDF text (with pdfplumber or one of a few similar libraries), or you can send the PDF to another service via API to extract the text.
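As a rough sketch of the Python route with pdfplumber: this assumes the PDF reaches the code block as a base64-encoded string (an assumption about what the S3 step hands over, so check the actual shape of your data) and that pdfplumber has been added to the workflow's Python environment. The function names `decode_pdf_payload` and `pdf_to_text` are made up for illustration.

```python
import base64
import io


def decode_pdf_payload(b64_data: str) -> bytes:
    """Decode a base64 string (assumed shape of the S3 step's output) into raw PDF bytes."""
    return base64.b64decode(b64_data)


def pdf_to_text(pdf_bytes: bytes) -> str:
    """Extract the text of every page with pdfplumber, joined by blank lines."""
    # Third-party package; it has to be added to the workflow's Python environment.
    import pdfplumber

    with pdfplumber.open(io.BytesIO(pdf_bytes)) as pdf:
        # extract_text() may give None or "" for pages without a text layer (e.g. scans).
        pages = [page.extract_text() or "" for page in pdf.pages]
    return "\n\n".join(pages)
```

You would then call something like `pdf_to_text(decode_pdf_payload(payload))`, where `payload` stands in for whatever your S3 step returns. Scanned PDFs without a text layer will come back empty with this approach and would need OCR instead.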

How did you do it in your web app? Maybe you can re-use some of the code (or the approach).

Hi @jg80, in Retool web apps there is an AI query that extracts text from PDFs and then extracts entities from that text. I want to do the same in Retool Workflows.

Got it. I forgot about that option in the web app; to the best of my knowledge, it is not available in Workflows.

Hi @Rashmika_Lakshan! As you've noticed, the Retool AI block inside of a workflow doesn't have the pre-configured option to extract text from a PDF document. It's something that our team is aware of and actively working on - I'll be sure to share any updates here as they come in.

That said, it's still possible to accomplish this! As @jg80 mentioned, workflows do support executing both JS and Python scripts. There are quite a few libraries that will extract text from a PDF, especially when working with Python - pypdf is a good one that is well-documented. You can add it to your workflow's environment here:

The library's documentation covers usage in more detail. Don't hesitate to follow up here if you have any further questions. And hopefully we can get that feature rolled out for you soon. Happy building!
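To give a sense of what the pypdf version might look like, here is a minimal sketch that keeps the text per page, which can be handy if you want to know which page an entity came from. It assumes you already have the raw PDF bytes in the code block and that pypdf is installed in the workflow's environment; `number_pages` and `extract_pages` are illustrative names, not part of pypdf.

```python
import io


def number_pages(page_texts):
    """Pair each non-empty page text with its 1-based page number."""
    return [
        {"page": i, "text": text.strip()}
        for i, text in enumerate(page_texts, start=1)
        if text and text.strip()
    ]


def extract_pages(pdf_bytes: bytes):
    """Read a PDF from memory with pypdf and return its text page by page."""
    # Third-party package; it has to be added to the workflow's Python environment.
    from pypdf import PdfReader

    reader = PdfReader(io.BytesIO(pdf_bytes))
    return number_pages(page.extract_text() for page in reader.pages)
```

The list returned by `extract_pages` can then be passed to a later block, for example an AI step that pulls entities out of each page's text.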
