Vector Store Does Not Chunk Documents

  1. My goal: Upload documents to the vector store to help AI Chat understand my database schema.
  2. Issue: Uploaded documents are converted to only one chunk and therefore not very useful for AI.
  3. Steps I've taken to troubleshoot: Different document formats, recreating vector store.
  4. Additional info: Cloud

I feel like I'm missing something. Shouldn't uploaded documents be chunked automatically?

Hello Steve, and welcome to Retool Community!

The Retool-managed Vectors DB does indeed chunk the text. This happens automatically, but the result depends on the document's formatting: plain text documents, or clearly structured articles with headings and paragraphs, work best.

Retool AI models use embeddings to determine the meaning of, and the connections between, objects such as blocks (chunks) of text.

In simple terms, the AI model decides which parts of the text belong together (for example, as paragraphs). Each chunk is then embedded into the DB separately, so you can retrieve it later to improve your prompt.

As in this screenshot, I uploaded a simple text document and it was inserted as multiple chunks (I used the "Get Document Chunks" query to return the document's chunks).

Give this a try and let me know!

Hi Infinitybht, and thank you for your response.
I uploaded documents in several different formats (including plain .txt) and also used Get Document Chunks to check whether the data was actually being chunked. Unfortunately, it always returned 1 chunk.

My documents contained either:

  1. Blocks of SQL with a short explanation above each query and line breaks between sections.
  2. SQL table definitions, each listing the table name and its fields.

I eventually pivoted and used the UpsertDoc function to create a new doc for each block, which was a painful endeavor, but my LLM is now working well with the granular chunks.

Any idea why this data would not chunk automatically? Here is a small example of the content:

Table: account_owners
- id: bigint unsigned
- client_id: bigint unsigned
- user_id: bigint unsigned

Table: accounts
- id: bigint unsigned
- cmt_id: bigint unsigned
- name: varchar(255)
- created_at: timestamp
- updated_at: timestamp
- deleted_at: timestamp
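
For anyone taking the same one-document-per-block workaround, here is a minimal sketch of the splitting step in Python. It assumes every block starts with a "Table:" line, as in the example above. The function name split_schema_blocks and the SAMPLE text are made up for illustration, and the upsert itself is left out because it depends on how your Retool resource is set up.

```python
def split_schema_blocks(text: str) -> list[dict]:
    """Split a schema dump into one block per 'Table:' header line."""
    blocks = []
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()  # tolerate stray leading/trailing spaces
        if not line:
            continue  # blank lines only separate blocks
        if line.startswith("Table:"):
            # A new header starts a new block.
            name = line[len("Table:"):].strip()
            current = {"name": name, "lines": [line]}
            blocks.append(current)
        elif current is not None:
            # Field lines belong to the most recent table header.
            current["lines"].append(line)
    return [
        {"name": b["name"], "content": "\n".join(b["lines"])}
        for b in blocks
    ]


# Hypothetical sample mirroring the structure of the schema text above.
SAMPLE = """Table: account_owners
- id: bigint unsigned
- client_id: bigint unsigned
- user_id: bigint unsigned

 Table: accounts
- id: bigint unsigned
- name: varchar(255)
"""

if __name__ == "__main__":
    for block in split_schema_blocks(SAMPLE):
        # Each block could then be sent as its own document.
        print(block["name"], len(block["content"].splitlines()))
```

Each returned item holds a table name and that table's text, so it can be passed to whatever upsert step you already use, one document per table.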