How are text chunks for embeddings generated?

nikita_from_bunch · November 16, 2023, 3:27pm

As documented here, when creating text vector entries in the Retool Vector DB, Retool does the following:

If a document is provided, Retool extracts the text.
Splits the text into chunks.
Calculates the vector for each chunk.
Stores this information in the vector.
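The four steps above can be sketched as a toy Python pipeline. This is not Retool's code. `VectorEntry`, `split_into_chunks`, `embed`, and `ingest` are hypothetical names. Text extraction (step 1) is assumed to be done already. The fixed-width split is only a placeholder, and the hash-based `embed` stands in for a real embedding model.

```python
import hashlib
import math
from dataclasses import dataclass


@dataclass
class VectorEntry:
    # Hypothetical record: one stored chunk plus its vector.
    chunk: str
    vector: list[float]


def split_into_chunks(text: str, max_chars: int = 1000) -> list[str]:
    # Placeholder splitter (assumption): cut the text into fixed-width pieces.
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


def embed(chunk: str, dims: int = 8) -> list[float]:
    # Hypothetical stand-in for an embedding model: hash each word into one
    # of `dims` buckets, count occurrences, then L2-normalise the counts.
    counts = [0.0] * dims
    for word in chunk.lower().split():
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % dims
        counts[bucket] += 1.0
    norm = math.sqrt(sum(c * c for c in counts)) or 1.0
    return [c / norm for c in counts]


def ingest(text: str, store: list[VectorEntry]) -> None:
    # Steps 2-4: split the extracted text, embed each chunk, store the pair.
    for chunk in split_into_chunks(text):
        store.append(VectorEntry(chunk=chunk, vector=embed(chunk)))
```

For example, ingesting a 3,600-character string yields four entries: three 1,000-character chunks and one 600-character chunk.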

To optimise usage of the vector DB, I want to make sure that the vectors are mutually exclusive and that each one covers only a single topic. However, I can't find any information about how the text chunks are generated, for example a maximum number of characters or similar restrictions.

Can anyone provide more information on this? Thanks in advance!

huytool157 · November 29, 2023, 8:39pm

Hey Nikita, engineer working on Retool Vectors here!

After extracting the text from the document, we pass it into a function called chunkText, which splits the text into smaller chunks. It does this using a class called TextSplitter, which breaks the text at specified points such as spaces or punctuation while respecting a maximum chunk size (1000 characters). If a chunk is still too large, it is split further. The result is an array of text chunks, each within the size limit, which makes large texts easier to handle where size constraints apply.
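For anyone who wants to pre-split their own documents into single-topic pieces before uploading, here is a minimal Python sketch of the kind of splitting described above. It is an assumption, not Retool's chunkText or TextSplitter. The function name `split_text` is hypothetical, as is the separator order: paragraph breaks, then line breaks, sentence ends, spaces, and finally a hard cut. Only the 1000-character limit comes from the post. The approach resembles LangChain's RecursiveCharacterTextSplitter.

```python
def split_text(text: str, chunk_size: int = 1000,
               separators: tuple[str, ...] = ("\n\n", "\n", ". ", " ", "")) -> list[str]:
    """Recursively split `text` into chunks of at most `chunk_size` characters."""
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    # Use the first (coarsest) separator that occurs in the text.
    sep = next((s for s in separators if s in text), "")
    if sep == "":
        # No separator left: hard-cut into fixed-width pieces.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    finer = separators[separators.index(sep) + 1:]
    chunks: list[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep packing pieces into the current chunk
            continue
        if current.strip():
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # This piece alone is too large: split it again with finer separators.
            chunks.extend(split_text(piece, chunk_size, finer))
            current = ""
    if current.strip():
        chunks.append(current)
    return chunks
```

With this sketch, a paragraph under the limit stays intact as one chunk, while an oversized paragraph is re-split at sentence boundaries. Separators at chunk boundaries are dropped and chunks do not overlap. Both are choices made for this sketch, not documented Retool behaviour.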

We will be experimenting to find the best splitting method for query performance, so feedback is always appreciated!

brianzjj · December 12, 2023, 7:53pm

Hi @huytool157,

Can we have more granular control in the Retool Vector query? For example, when uploading a document to a vector, it would be nice to be able to choose the splitters, the maximum chunk size, etc. Essentially, this would be an explicit call to the chunkText and TextSplitter you mentioned above.
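To make the request concrete, here is a sketch of what user-controlled chunking options might look like. `ChunkingOptions`, `chunk_size`, `chunk_overlap`, and `chunk_with_overlap` are all hypothetical names and are not part of the Retool Vector query. Overlap is included because it is a common setting in text splitters. It lets consecutive chunks share some context.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkingOptions:
    # Hypothetical settings a user might want exposed; not a Retool API.
    chunk_size: int = 1000
    chunk_overlap: int = 0

    def __post_init__(self) -> None:
        if self.chunk_size <= 0:
            raise ValueError("chunk_size must be positive")
        if not 0 <= self.chunk_overlap < self.chunk_size:
            raise ValueError("chunk_overlap must be in [0, chunk_size)")


def chunk_with_overlap(text: str, opts: ChunkingOptions) -> list[str]:
    """Fixed-width windows; consecutive chunks share `chunk_overlap` characters."""
    step = opts.chunk_size - opts.chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + opts.chunk_size])
        if start + opts.chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

For example, `chunk_with_overlap("abcdefghij", ChunkingOptions(chunk_size=4, chunk_overlap=1))` returns `['abcd', 'defg', 'ghij']`, and each chunk repeats the last character of the previous one.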

jpmin · December 14, 2023, 3:37pm

Being able to create document vectors from workflows/queries would also be needed.

Related topics
Retool Vector - Customize Document Chunking (Feature Requests) · 9 replies · 716 views · October 29, 2024
Vector Store Does not Chunk Documents (Queries and Resources) · 4 replies · 31 views · May 29, 2025
Vectors and Embedding Options (Queries and Resources, ai) · 5 replies · 916 views · May 9, 2025
Access Vector Value from Document Chunks (App Building, ai) · 6 replies · 250 views · September 19, 2024
Programmatically create vector embedding in Retool Vector (Queries and Resources) · 3 replies · 36 views · April 11, 2025