Long delay in AI response when using vectors

I've been using Retool for a few weeks, especially the AI actions. I've noticed that when I use a vector as a source of information, the response from the Chat AI component is quite slow (around 10 seconds).

I use the streaming option so the response is generated as it is being processed, but it still takes a long time before the first character is generated and starts being typed.
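To be concrete about what I mean by the delay: here is a minimal sketch of how I think about "time to first character", using the OpenAI Node SDK directly. This is just an illustration, not how Retool's AI actions are actually invoked, and the model name is only an example:

```ts
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Measure how long a streamed response takes to yield its first visible character.
async function timeToFirstToken(question: string): Promise<number> {
  const start = Date.now();
  const stream = await client.chat.completions.create({
    model: "gpt-4o-mini", // example model, not necessarily what Retool uses
    messages: [{ role: "user", content: question }],
    stream: true,
  });
  for await (const chunk of stream) {
    if (chunk.choices[0]?.delta?.content) {
      return Date.now() - start; // first character arrived
    }
  }
  return Date.now() - start; // stream ended without content
}

console.log(`TTFT: ${await timeToFirstToken("What is a vector store?")} ms`);
```

Without vectors that number is small; with them it is the ~10 seconds I described.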

When I don't use vectors, the response is much faster, around 1-2 seconds.

Is this normal? Has anyone managed to fix it somehow? Are there plans for improvements?


Hello @exxscher,

Apologies for the slowness; we are definitely looking to improve the performance of our AI tools.

I would imagine that there are some processes going on behind the scenes for vectors that take extra time to get the data to the LLM for output generation.
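As a rough mental model (this is not our actual implementation, and every name below is made up for illustration), a vector-backed query typically has to do something like this before any token can stream back:

```ts
// Hypothetical retrieval-augmented-generation steps; none of these names are Retool's real internals.
interface VectorStore {
  embed(text: string): Promise<number[]>;                    // round trip 1: embed the question
  search(vector: number[], topK: number): Promise<string[]>; // round trip 2: similarity search
}

async function answerWithVector(
  store: VectorStore,
  streamLlm: (prompt: string) => AsyncIterable<string>,
  question: string,
): Promise<AsyncIterable<string>> {
  const queryVector = await store.embed(question);   // extra latency before the LLM is even called
  const chunks = await store.search(queryVector, 5); // fetch the most relevant chunks

  // The retrieved chunks also inflate the prompt, so the model spends longer
  // in prefill before emitting its first token.
  const prompt = `Use this context:\n${chunks.join("\n---\n")}\n\nQuestion: ${question}`;
  return streamLlm(prompt);
}
```

If those steps run serially, they would add a fairly fixed amount of time on top of the normal response.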

How much data is in the vectors you are using?

@Jack_T, the amount of data in my vector is minimal, about 2 pages in a PDF file. The slow response occurs regardless of the vector's size; I tested with a much larger one, and in all cases the response takes around 10 to 12 seconds.

I also tested on a new application, and with different AI models.

I have been testing on different days for about 2 weeks now, to see whether it was a temporary problem.

Thank you for the extensive testing!

That is odd; we definitely do not want users to have to wait that long. Are you self-hosted or on the cloud?

What temperature do you have the query set to? Could you share a screenshot of your query setup?

I just ran some tests with a 13 MB PDF vector on the history of France, and it seemed to work within a second or two using GPT-4o-mini and a temperature of 1.

This was my setup. If you have any steps I can follow to reproduce the slow response, that would be very helpful for sharing with our engineering team so we can fix this :saluting_face:

Or if you could DM me a video of the behavior as well that might provide more clues to me and the team!

Hi, sorry for the delay, it was a busy week.

I just recorded a video, where I did the following steps:

  1. I generated a web application.
  2. I used the Chat AI component.
  3. I configured the query with a vector.

Video:

Test:

  1. I asked a question in the Chat AI, and the answer took 13 seconds to start generating in the interface.
  2. I removed the vector from the configuration, and the answer started generating after 2 seconds (a rough way to script this comparison is sketched just below).
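In case it is useful, here is a rough harness for the same A/B comparison outside of Retool, again using the OpenAI Node SDK directly. The `fakeContext` string is just a stand-in for whatever the vector lookup would inject; this does not touch Retool's actual vector path:

```ts
import OpenAI from "openai";

const client = new OpenAI();

// Time until the first streamed character arrives for a given prompt.
async function firstTokenMs(content: string): Promise<number> {
  const start = Date.now();
  const stream = await client.chat.completions.create({
    model: "gpt-4o-mini", // example model
    messages: [{ role: "user", content }],
    stream: true,
  });
  for await (const chunk of stream) {
    if (chunk.choices[0]?.delta?.content) break; // first visible character
  }
  return Date.now() - start;
}

const question = "Summarize the key points of the document.";
const fakeContext = "...a couple of pages of text, standing in for retrieved chunks...";

console.log("without context:", await firstTokenMs(question), "ms");
console.log("with context:   ", await firstTokenMs(`${fakeContext}\n\n${question}`), "ms");
```

Note that this only captures the larger-prompt effect; the extra retrieval round trips would add on top of it.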

Observations:

  • When using larger or smaller vectors, the response is just as slow.
  • When not using vectors, the response is usually quite fast, even with much more complex questions.

Greetings!

Hello @exxscher!

No worries about the delay. Thank you for providing the video and detailed documentation; it helps us a lot.

I filed a ticket for our engineering team to look into vector performance.

I was told that, unfortunately, this time delay is currently expected for vectors :smiling_face_with_tear: We are looking to improve it as soon as possible; it seems to be a bottleneck in how our app interfaces with vectors, regardless of their size.

For now, if your main concern is response time, I would recommend avoiding vectors :melting_face:. Use vectors when you are OK with a slower response and need that response to rely on specific data stored in a vector that a user/LLM would not otherwise have access to.