GPT Query Response Time Optimization

I'm currently working on a GPT implementation in Retool and experiencing an average response time of around 50 seconds, with occasional longer delays. I've tested with sample data using both a personal ChatGPT account and the free Retool ChatGPT plan, with similar results.

My implementation is as follows: I have a Chat component on the front end that is attached to a vector context. The vector contains very simple data.

My questions are:

  1. How can we optimize GPT response time in Retool?
  2. Is the observed average response time (~50 seconds) expected for GPT, and if so, can we explore improvements?

Your prompt assistance is appreciated.

Hey - I have an AI chatbot using the Retool AI Chat component and GPT-4, and I think this is standard, honestly. The OpenAI API response times are about this slow too.

I don't know of a way to solve this (and don't think it's solvable yet).

1 Like

Hi @Islam_AlAfifi, welcome back to the forum! :wave:

Depending on the complexity of the prompt we give to ChatGPT and how many vectors we are using, we can expect the query to run somewhere between 20 and 60 seconds (even more as vectors are added); this is standard.

Our Engineering Team is currently working on enabling streaming, which would make the response populate in the Chat component as it is being generated, instead of only appearing once the whole response is finished. This would improve the user experience by showing the first parts of the response in a fraction of the time. Since streaming will become the default behavior of the "Generate chat response" action, you won't have to change anything on your end.
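Retool will enable this server-side, so nothing changes in your app. Purely to illustrate *why* streaming feels faster, here is a small simulation (all names are hypothetical; a real call would use OpenAI's API with `stream=True`): total generation time is unchanged, but the user sees the first token almost immediately.

```python
import time

def fake_gpt_tokens(text, delay_per_token=0.05):
    """Stand-in for a model emitting tokens one at a time."""
    for token in text.split():
        time.sleep(delay_per_token)
        yield token + " "

def non_streaming(gen):
    # Waits for every token, then shows the full answer at once.
    start = time.time()
    answer = "".join(gen)
    return answer, time.time() - start  # latency = full generation time

def streaming(gen):
    # Shows the first token as soon as it arrives.
    start = time.time()
    first_token = next(gen)
    time_to_first = time.time() - start  # latency = one token's worth
    answer = first_token + "".join(gen)
    return answer, time_to_first

reply = "Streaming shows partial output long before generation finishes"
full, total_wait = non_streaming(fake_gpt_tokens(reply))
part, first_wait = streaming(fake_gpt_tokens(reply))
print(f"non-streaming wait: {total_wait:.2f}s; streaming first token: {first_wait:.2f}s")
```

Both paths produce the same final answer; only the perceived wait differs.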

1 Like

Thank you, Henry, for confirming the slow response time of GPT queries.

Yes, I see. However, I'm hitting ~50 seconds with a very simple vector (two paragraphs of about 10 lines each). I hope you ship the fix soon.

The whole issue here is that Retool only supports vector databases, which are not meant for data analysis; they're good as a knowledge source. Take a look at the Assistants API and threads. It is more optimized and designed for data analysis, and it runs code in a Python sandbox. That gives your data analytics some wings.


Yes, exactly. Thank you so much. I'll give the Assistants API and threads a try.

1 Like

Streaming is something we look forward to! Any tips on how to speed up response times? We use a massive vector DB.

Hi @Islam_AlAfifi, great news!
The "Chat Component" now streams the response from ChatGPT by default with the "Generate chat response" action.

This fix came with today's new Retool release, version 3.38.0. I just tested it and it works like a charm! We no longer need to wait for the whole response to be generated. :slightly_smiling_face:

1 Like