Embed PDFs with images, graphs, and tables in a Vector Database

mascaritas · September 4, 2024, 10:02am

Dear community,

Is there a recommended service for embedding PDFs that captures more than just the text, such as images, graphs, and tables? This would help to create more comprehensive agents for a support chat or similar application.

As I understand it, one can also use an external vector database if Retool Vectors doesn't offer this feature.

Best regards

Paulo · September 7, 2024, 1:31am

Hi @mascaritas, you can definitely add images to the PDF exporter, here is how:

Alternatively, @mstevenson shared a TPA that can help you create more complex PDFs:

mascaritas · September 11, 2024, 9:11pm

Dear Paulo

Thanks a lot for your suggestions, definitely something I will also use for other workflows. However, what I wanted to ask is: how can one embed PDFs, including the images they contain, into a vector database, or better yet, Retool Vectors?

Or does it require another external service such as Pinecone and embedding method X (which I am not familiar with)?

I hope the question makes sense now.

Cheers

Paulo · September 11, 2024, 11:34pm

Great question! If you are using Retool AI Document Actions, our PDF parser currently extracts only raw text, so this may not be possible if you need the AI to reference those images.

On the other hand, the PDFs, including their images, could be stored as base64 data. We could then send that data to an LLM, but I'm not sure it would be able to provide accurate information about the underlying text and images in each file.
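
For anyone who wants to try the base64 idea, here is a minimal sketch in Python using only the standard library. It is not Retool's implementation: the function names and the payload fields (filename, mime_type, data) are hypothetical, chosen just for illustration, and the exact request shape any given LLM API expects will differ. It only shows the encode/decode round trip, not the call to a model.

```python
import base64
from pathlib import Path


def encode_pdf_bytes(data: bytes) -> str:
    """Return the raw PDF bytes as a base64-encoded ASCII string."""
    return base64.b64encode(data).decode("ascii")


def decode_pdf_bytes(encoded: str) -> bytes:
    """Decode a base64 string back into the original PDF bytes."""
    return base64.b64decode(encoded)


def encode_pdf_file(path: str) -> str:
    """Read a PDF from disk and base64-encode its full contents,
    so embedded images travel along with the text."""
    return encode_pdf_bytes(Path(path).read_bytes())


def build_payload(filename: str, data: bytes) -> dict:
    """Bundle the encoded PDF with basic metadata.

    The field names are hypothetical; adapt them to whatever
    storage table or API you actually send the document to.
    """
    return {
        "filename": filename,
        "mime_type": "application/pdf",
        "data": encode_pdf_bytes(data),
    }
```

Keep in mind that base64 makes the stored data roughly a third larger than the original file, so large PDFs may run into size limits wherever the data is stored or sent.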