Converting a complex form to JSON Schema with ResponseVault

They're actually PDF files containing insurance policies, some of which run closer to 300 pages. It's a little different since I'm not using forms, but I think the PDF page problem (and solution) might be the same for both of us. I've almost got something I can test, so I'll see how that goes, but I'll def hit ya up in a couple days to see what else we could come up with!

Right now, I can open a PDF from a base64 string, then go through every page, pull the text, and render an image of the page. With the page images, if there are only a couple I'll use gpt-4o to perform OCR on each one, but if there are a lot of pages I cheat cause I'm cheap :joy: and use PDFMiner.six to try and get text, layout, metadata, and whatever else I can.
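
Roughly what that pass looks like as a sketch, assuming PyMuPDF (fitz) for the rendering/text side and pdfminer.six as the cheap fallback; the page threshold, prompt, and function names are just placeholders I made up:

```python
import base64
import io

import fitz  # PyMuPDF: page rendering + embedded text layer
from openai import OpenAI
from pdfminer.high_level import extract_text

client = OpenAI()
OCR_PAGE_LIMIT = 10  # made-up threshold: above this, gpt-4o OCR gets too pricey


def ocr_page(png_bytes: bytes) -> str:
    """Send one rendered page to gpt-4o and get the transcribed text back."""
    image_b64 = base64.b64encode(png_bytes).decode()
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Transcribe all text on this page verbatim."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content


def extract_pages(b64_string: str) -> tuple[list[dict], str | None]:
    """Decode the base64 PDF and pull text + a PNG image from every page.
    Returns per-page records plus optional whole-document fallback text."""
    pdf_bytes = base64.b64decode(b64_string)
    doc = fitz.open(stream=pdf_bytes, filetype="pdf")

    pages = [{
        "number": page.number + 1,
        "text": page.get_text(),                           # embedded text layer, if any
        "image": page.get_pixmap(dpi=150).tobytes("png"),  # rasterized page
    } for page in doc]

    if len(pages) <= OCR_PAGE_LIMIT:
        # Few pages: spend the money and OCR each one with gpt-4o
        for p in pages:
            p["ocr_text"] = ocr_page(p["image"])
        return pages, None

    # Many pages: cheap route, let pdfminer.six dig out whatever text it can
    return pages, extract_text(io.BytesIO(pdf_bytes))
```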

I think all that's left is to combine the OCR output with the original text I pulled and clean it up a bit before creating my embeddings, which I can use to filter out everything unrelated, then pass what survives along with the query to a model to form a response.
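
For the embed-and-filter part, a minimal sketch assuming OpenAI's text-embedding-3-small and plain cosine similarity with numpy (k and the function names are mine):

```python
import numpy as np
from openai import OpenAI

client = OpenAI()


def embed(texts: list[str]) -> np.ndarray:
    """Embed a batch of text chunks; returns a (len(texts), dim) matrix."""
    response = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([item.embedding for item in response.data])


def top_chunks(query: str, chunks: list[str], k: int = 5) -> list[str]:
    """Keep only the k chunks most similar to the query; drop everything unrelated."""
    chunk_vecs = embed(chunks)
    query_vec = embed([query])[0]
    # cosine similarity = dot product / (chunk norm * query norm)
    sims = chunk_vecs @ query_vec / (
        np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(query_vec)
    )
    best = np.argsort(sims)[::-1][:k]  # indices of the k highest similarities
    return [chunks[i] for i in best]
```

Then whatever `top_chunks` returns is the only context I'd stuff into the final prompt.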

I haven't quite figured out if I should use each page as a 'chunk' itself, or combine everything from all pages and then chunk it by sections. Sometimes a section covers many pages, so if each page is a chunk, I'm worried the 2nd or 3rd page of a section might not be strongly associated with the 1st page, and a query could end up being answered using only 1 out of 3 relevant pages. But if I chunk based on 'sections', it's possible all 200 pages get accidentally treated as one section (or actually are one section), which lands me in the same problem: a context that's too large.
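
One middle ground I might try (my own idea, nothing fancy): fixed-size chunks with overlap, so no chunk can outgrow the context window and consecutive chunks still share text across page/section boundaries. Sizes here are arbitrary:

```python
def chunk_with_overlap(full_text: str, size: int = 2000, overlap: int = 400) -> list[str]:
    """Fixed-size chunks with overlap: caps every chunk's length no matter how
    long a section runs, while the overlap keeps adjacent chunks loosely stitched."""
    assert overlap < size  # otherwise the loop below never advances
    chunks = []
    start = 0
    while start < len(full_text):
        chunks.append(full_text[start:start + size])
        start += size - overlap
    return chunks
```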

I also haven't considered how to handle holding hundreds of images in memory; I'll probably hit a Retool/workflow limit at some point.
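
One idea for that (a sketch, reusing the PyMuPDF approach from above): render pages lazily with a generator, so only one image is alive at a time instead of hundreds:

```python
import base64

import fitz  # PyMuPDF


def iter_page_images(pdf_bytes: bytes):
    """Yield (page_number, png_bytes) one page at a time, so each image can be
    processed and garbage-collected before the next one is rendered."""
    doc = fitz.open(stream=pdf_bytes, filetype="pdf")
    for page in doc:
        yield page.number + 1, page.get_pixmap(dpi=150).tobytes("png")


# Usage: peak memory stays at roughly one rendered page instead of the whole PDF.
# b64_string is the base64 payload from earlier; handle_page is a hypothetical
# per-page handler (OCR, upload, etc.)
for number, png in iter_page_images(base64.b64decode(b64_string)):
    handle_page(number, png)
```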
