I have created a workflow that extracts data (line items) from Excel files. The data in an Excel file might not be complete, so I want to enrich it with data from an additional PDF file associated with that Excel file.
The idea is to convert the PDF file into embeddings and store them in the vector DB so the data can be used with an AI action or agent.
However, the embeddings are temporary and strictly tied to the Excel file being processed, so I want that document to be accessible only during the workflow run.
So the PDF file(s) should be converted to embeddings at the start of the workflow, used during that specific run, and then removed once the workflow ends.
How can I achieve this with Retool Vectors? Or should I use another solution?
The PDF file(s) can be quite large, so including the complete text in every LLM call is not an option.
In Retool self-hosted 3.284, there is no built-in feature to read or extract text from a PDF inside a workflow. Because of that, Retool Vectors also can't take a PDF directly; they only work with plain text.
In the newer Retool versions (3.32 and above), Retool added AI document-processing features that can extract text from PDFs automatically. With those versions, this whole flow becomes much easier.
But in your current version (3.284), the PDF needs to be converted to text outside Retool first, and then you can generate temporary embeddings using that text and store them in workflow variables for that one run.
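To make the "temporary embeddings for one run" idea concrete, here is a minimal Python sketch of a per-run, in-memory index. It is not Retool's API: `toy_embed` is a hypothetical stand-in (hashed bag of words) for whatever embedding model you actually call, and the function names, chunk size, and vector size are all assumptions. The index is just a local value, so it exists only for as long as the run keeps it around.

```python
import hashlib
import math
import re


def chunk_text(text, max_words=50):
    """Split text into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def toy_embed(text, dims=256):
    """Hypothetical stand-in for a real embedding model:
    a hashed bag-of-words vector, L2-normalised."""
    vec = [0.0] * dims
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        idx = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dims
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def build_run_index(markdown_text, max_words=50):
    """Build a list of (chunk, embedding) pairs for a single run."""
    return [(chunk, toy_embed(chunk)) for chunk in chunk_text(markdown_text, max_words)]


def top_chunks(index, query, k=3):
    """Return the k chunks most similar to the query.
    Vectors are normalised, so the dot product is the cosine similarity."""
    q = toy_embed(query)
    ranked = sorted(
        index,
        key=lambda item: sum(a * b for a, b in zip(q, item[1])),
        reverse=True,
    )
    return [chunk for chunk, _ in ranked[:k]]


if __name__ == "__main__":
    md = "Payment terms are net thirty days. Delivery is to the main warehouse."
    index = build_run_index(md, max_words=6)
    print(top_chunks(index, "payment terms", k=1))
```

Only the top few chunks would then go into the LLM prompt instead of the full document. How the index is passed between workflow blocks depends on your setup and is not covered here.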
I'm aware of that, so I'm using a self-hosted instance of Markitdown, which converts the PDF to Markdown.
However, I then need to create embeddings from that data and store them in a temporary vector store, so they are only available for that single run. Afterwards they should be removed again, so they won't be used in other workflow runs.
Sounds like you figured out a workaround for the first part: getting the PDF converted into a format that can be added to a Retool Vector.
The second part, having the vector available only for a single run (user flow?), is a bit trickier. Currently, the only way to remove a vector is through the GUI.
I can make a feature request for a Retool API endpoint that allows deleting a Vector from a Retool instance based on some identifying parameter (such as its name), and I will keep this thread updated with any news I hear on this.
One other idea for a workaround would be to use tags on the vector, so that each vector has a tag specifying the Excel file it is connected to. This way, instead of deleting anything, you would specify which vector should be used and referenced based on the vector's tag.
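As a rough illustration of the tagging idea, the Python sketch below scopes a lookup to the documents tagged with the Excel file currently being processed. The record shape (`text`, `tags`) and the function name are hypothetical and only show the filtering logic; this is not how Retool Vectors stores or exposes tags.

```python
def scope_to_file(records, file_tag):
    """Keep only the records tagged with the given Excel file name."""
    return [r for r in records if file_tag in r["tags"]]


if __name__ == "__main__":
    # Hypothetical records: one document per PDF, tagged with its Excel file.
    records = [
        {"text": "Terms for order batch A", "tags": ["orders_a.xlsx"]},
        {"text": "Terms for order batch B", "tags": ["orders_b.xlsx"]},
    ]
    for record in scope_to_file(records, "orders_a.xlsx"):
        print(record["text"])
```

With this approach the old documents stay in the store, so the tag has to be unique per file (or per run) to keep other runs from picking them up.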