GPT Query Response Time Optimization

I'm currently working on a GPT implementation in Retool and experiencing an average response time of around 50 seconds, with occasional longer delays. I've tested with sample data using both a personal ChatGPT account and the free Retool ChatGPT plan, with similar results.

My implementation is as follows: I have a Chat component on the front end that is attached to a vector context. The vector contains very simple data.

My questions are:

  1. How can we optimize GPT response time in Retool?
  2. Is the observed average response time (~50 seconds) expected for GPT, and if so, can we explore improvements?

Your prompt assistance is appreciated.

Hey - I have an AI chatbot using the Retool AI Chat component and GPT-4, and I think this is standard, honestly. The OpenAI API response times are about this slow too.

I don't know of a way to solve this (and don't think it's solvable yet).

1 Like

Hi @Islam_AlAfifi, welcome back to the forum! :wave:

Depending on the complexity of the prompt we give to ChatGPT and how many vectors we are using, we can expect the query to run somewhere between 20 and 60 seconds (even more as vectors are added); this is standard.

Our Engineering Team is currently working on enabling streaming, which would make the response populate in the Chat component as it is being generated, instead of only appearing once the whole response is finished. This would improve the user experience by showing the first parts of the response in a fraction of the time. Since streaming will become the default behavior of the "Generate chat response" action, you won't have to change anything on your end.
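Retool will enable this server-side, so nothing changes in your app. Purely to illustrate *why* streaming feels faster, here is a small simulation (all names are hypothetical; a real call would use OpenAI's API with `stream=True`): total generation time is unchanged, but the user sees the first token almost immediately.

```python
import time

def fake_gpt_tokens(text, delay_per_token=0.05):
    """Stand-in for a model emitting tokens one at a time."""
    for token in text.split():
        time.sleep(delay_per_token)
        yield token + " "

def non_streaming(gen):
    # Waits for every token, then shows the full answer at once.
    start = time.time()
    answer = "".join(gen)
    return answer, time.time() - start  # latency = full generation time

def streaming(gen):
    # Shows the first token as soon as it arrives.
    start = time.time()
    first_token = next(gen)
    time_to_first = time.time() - start  # latency = one token's worth
    answer = first_token + "".join(gen)
    return answer, time_to_first

reply = "Streaming shows partial output long before generation finishes"
full, total_wait = non_streaming(fake_gpt_tokens(reply))
part, first_wait = streaming(fake_gpt_tokens(reply))
print(f"non-streaming wait: {total_wait:.2f}s; streaming first token: {first_wait:.2f}s")
```

Both paths produce the same final answer; only the perceived wait differs.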

1 Like

Thank you, Henry, for confirming the slow response time of GPT queries.

Yes, I see. However, I'm hitting ~50 seconds with a very simple vector (two paragraphs of about 10 lines each). I hope you ship the fix soon.

The whole issue here is that Retool only supports vector databases, which are not meant for data analysis; they're good as a knowledge source. Take a look at the Assistants API and threads. It is more optimized and designed for data analysis, and it runs code in a Python sandbox. That gives your data analytics some wings.


Yes, exactly. Thank you so much. I'll give the Assistants API and threads a try.

1 Like

Streaming is something we look forward to! Any tips on how to speed up response times? We use a massive vector DB.

Hi @Islam_AlAfifi, great news!
The "Chat Component" now streams the response from ChatGPT by default with the "Generate chat response" action.

This fix came with today's new Retool release, version 3.38.0. I just tested it and it works like a charm! We no longer need to wait for the whole response to be generated. :slightly_smiling_face:

1 Like