Transform Retool-Hosted Database Table into Retool Vector

Hey y'all!

Currently:

I am building an employee lifecycle management app that covers onboarding, offboarding, and laptop details, displays assigned IdP & Google groups, and includes buttons that trigger and interact with other APIs.

GOAL:

  • To get rid of Google Sheets
  • Enable HR, Finance, and IT teams to use this app
  • AI functionality that allows querying the data accurately in natural language

Dataset Details:

  • Retool Cloud-hosted database
  • Roughly 2,000 rows, 60 columns

What I’ve Tried:

• Transformed entire rows into CSV format and upserted them as a single document in Retool Vectors (rough sketch of that flattening step after this list).

• Used the chat component to query data. It retrieves relevant information but has accuracy and performance issues.
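For reference, the flattening step looked roughly like this. This is a minimal TypeScript sketch; `upsertDocument` is a hypothetical stand-in for however you push text into Retool Vectors (REST query, workflow block, etc.), not a real Retool API, and the CSV rendering is naive (no quoting/escaping).

```typescript
type Row = Record<string, string | number | null>;

// Hypothetical placeholder: wire this to your actual Retool Vectors upsert.
async function upsertDocument(id: string, text: string): Promise<void> {
  console.log(`would upsert "${id}" (${text.length} chars)`);
}

// Flatten every row into one CSV blob (naive: no quoting/escaping).
function tableToCsv(rows: Row[]): string {
  const headers = Object.keys(rows[0]);
  const lines = rows.map((row) =>
    headers.map((h) => String(row[h] ?? "")).join(",")
  );
  return [headers.join(","), ...lines].join("\n");
}

// All ~2,000 rows end up inside one embedded document:
// await upsertDocument("employee-table", tableToCsv(allRows));
```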

Issues Encountered:

• Slow query time (~20 seconds per query).

• Inaccurate responses (e.g., asking “How many people are in the IT subdepartment?” returned 3 instead of 5).

Questions:

  1. What’s the best approach to improve response accuracy and speed?

  2. Should each row be upserted as its own vector document? (2000 rows → 2000 vector documents)

  3. How should I handle updates? Should vector documents be updated when the corresponding row data changes?

Would appreciate any best practices or insights!

BUMP :smiling_face_with_tear:

Hi @Tugi_Baasansuren!

It looks like your post got moved into the 'workflows' category. I just moved it over to app building, so hopefully it can get some more traction from other community builders!

Retool should work great for your use case; apologies for the slow performance with Vectors. Let me reach out to our applied AI team and see if they have any thoughts on how to improve this, since ~2,000 rows and 60 columns should be a manageable amount of data.

In terms of accuracy, it's very odd that it wasn't able to get the 'subdepartment' count correct :thinking: Which model were you using?

Which default embedding model did you choose when you created the vector?

Hopefully I can get some more details on how to improve these.

For question 2, my guess is that how you should group data into vectors depends on the scope of the questions you want answered.

If you want an answer that spans data from multiple rows, you would want all the relevant rows to be in a single vector document for the LLM to analyze. It seems like breaking data into smaller vectors could improve query time, but at the cost of accuracy; let me double-check that and get back to you.
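To make the per-row alternative concrete, here's a minimal sketch assuming a hypothetical `upsert` helper and illustrative column names (`id`, `department`, `subdepartment`); none of this is a real Retool API, it just shows one document per employee with structured fields kept as metadata:

```typescript
type Row = Record<string, string | number | null>;

async function upsertRowDocument(
  row: Row,
  // Hypothetical upsert signature, not a real Retool API.
  upsert: (id: string, text: string, metadata: Record<string, string>) => Promise<void>
): Promise<void> {
  // Render the row as labelled "column: value" lines so the embedding
  // captures column names, not just bare values.
  const text = Object.entries(row)
    .map(([col, val]) => `${col}: ${val ?? ""}`)
    .join("\n");

  await upsert(`employee-${row.id}`, text, {
    department: String(row.department ?? ""),
    subdepartment: String(row.subdepartment ?? ""),
  });
}
```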

As for point 3, I would assume you'd need to update the information so you don't get outdated responses. In terms of pruning out older data: if you have one very large document, I wouldn't want you to have to remove everything and re-add everything for small changes.

Our applied AI team should have much more detail, so I will be back to you shortly with more info!


Thank you for the response.

I tried the ada-002 model and am currently using the "text-embedding-3-large" model.

Looking forward to hearing from the team.
Also wondering about best practices for the "Filter labels" and "Default metadata" inputs for this use case.

For questions 1 and 2, "What’s the best approach to improve response accuracy and speed? Should each row be upserted as its own vector document?"

- "You should always default to inserting/upserting data into existing vectors. You can also use metadata/filters to improve specificity." (Conceptual sketch of what metadata filters can buy you below.)
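As an illustration (field names are illustrative, not a real Retool Vectors schema): an exact question like the subdepartment head-count can be answered by filtering and counting on metadata rather than asking the LLM to count from retrieved text.

```typescript
// Illustrative document shape; not a real Retool Vectors schema.
interface Doc {
  id: string;
  metadata: { department: string; subdepartment: string };
}

// Exact-count questions are better served by a filter than by retrieval:
function countBySubdepartment(docs: Doc[], sub: string): number {
  return docs.filter((d) => d.metadata.subdepartment === sub).length;
}

// countBySubdepartment(allDocs, "IT") returns the true count (5, not 3).
```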

For " How should I handle updates?"

-"Yes, you should use upsert/update document actions".

I think the ~20-second delay on vector searches is a known bottleneck our engineering team is currently working on. I can add your +1 to that ticket as well.

Also, I just asked about those two input fields, "Filter labels" and "Default metadata", since I also do not fully understand what they're meant to be used for :sweat_smile:

I just filed a ticket with the docs team to add more info on those to our Retool docs on Vectors, and hopefully they/the AI team can give me a better explanation to share as well!
