Retool Workflows - Batch Loop of Data to Open API

Hi,

I am wanting to send account data from my postgres instance to my openapi to populate segment sub segment data. The dataset I have is pretty large so I want to do this in batches. I cant get the rest api to iterate through the batches? Any idea what I am doing wrong?

Thanks

This is what I have done -

Step 1 -
SELECT api_id, "name" as account_name
FROM account
limit 50

Step 2 -
Loop sql query batch size 10

Step 3 -
Convert loop to Json

/// Step 1: Load the data from query7
const data = APIACCOUNT.data; // Assuming query7.data is already formatted as an array

// Log the loaded data to verify
console.log('Loaded data:', data);

// Step 2: Transform the data to NDJSON format
const ndjson = data.map(item => JSON.stringify(item)).join('\n');

// Log the transformed NDJSON data to verify
console.log('NDJSON data:', ndjson);

// Step 3: Return the transformed data
return ndjson;

Step 4 -

REST API to OPENAPI as follows

{
"model": "gpt-3.5-turbo",
"messages": [
{
"role": "system",
"content": "Classify the following accounts into predefined segments (Betting, Media, Pro) and sub-segments. The predefined segments are:
1. Betting: This includes any company related to sports betting, online casinos, odds providers, sportsbook platforms, etc.
2. Media: This includes companies in broadcasting, sports media, digital media, news agencies, TV stations, and OTT platforms.
3. Pro: This is for any company related to professional sports organizations, such as teams, leagues, federations, and stadiums.
The sub-segments under these categories are:
- Betting: Sportsbook, 3rd Party Betting Partner, Fantasy, Gaming & Affiliates, Professional Bettors & Syndicates.
- Media: Digital Media & Websites, Tech, Brands & Agencies, Broadcaster / OTT, Academia.
- Pro: Team, College / University, League / Federation, Player Agents, Pro Partner."
},
{
"role": "user",
"content": "Please classify the following accounts and return the result as a JSON object with keys 'api_id', 'account_name', 'segment', and 'subsegment' for each account. Here are the accounts: {{code1.data}}"
}
]
}

Hello @RA_SYD!

Would you be willing to share a screenshot of your workflow nodes to help me better understand where the loops are executing? From reading your description it seems like you need to make the REST API call to OPENAPI a function that is called from a loop block running JS Code to trigger the call. The loop can then be run in the batch size you require and also handle responses from each batch in the loop.

I see you're using GPT-3.5 Turbo, but what API version and enpoint are you using?

one thing I noticed is that you're requesting a JSON object from the model but if you don't supply the 'response_format' property the model may not respond with a json object (like a normal chat completion response). GPT-3.5-turbo only allows you to specify all responses must be in a VALID json object but it doesn't not require the object to contain all of properties you specify (api_id, account_name, segment and subsegment in your case). GPT-4.o-mini and later allows you to specify json_schema as the response format. this format does guarantee the JSON object will be structured exactly as you expect every single time (schema adherance).

for 3.5:

{
  "model": "gpt-3.5-turbo",
  "messages": [
    {
      "role": "system",
      "content": "You will be given a structured array of accounts that are each to be classified into predefined segments (Betting, Media, Pro) and sub-segments. The predefined segments are:
      1. Betting: This includes any company related to sports betting, online casinos, odds providers, sportsbook platforms, etc.
      2. Media: This includes companies in broadcasting, sports media, digital media, news agencies, TV stations, and OTT platforms.
      3. Pro: This is for any company related to professional sports organizations, such as teams, leagues, federations, and stadiums.
      The sub-segments under these categories are:
      - Betting: Sportsbook, 3rd Party Betting Partner, Fantasy, Gaming & Affiliates, Professional Bettors & Syndicates.
      - Media: Digital Media & Websites, Tech, Brands & Agencies, Broadcaster / OTT, Academia.
      - Pro: Team, College / University, League / Federation, Player Agents, Pro Partner."
    },
    {
      "role": "user",
      "content": "Please classify the following accounts and return the result as a JSON object with keys 'api_id', 'account_name', 'segment', and 'subsegment' for each account. Here are the accounts: {{code1.data}}"
    }
  ],
  "response_format": { 
    "type": "json_object" 
  }
}


link

1 other thing i noticed, you can trim down your token count by only supplying {{code1.data}} in the content of the user message:

{
"role": "system",
"content": "You will be given a structured array of accounts that are each to be classified into the segments and sub-segments defined inside the <SEGMENTS> tag and <SUBSEGMENTS> tags below:
<SEGMENTS>
- Betting: This includes any company related to sports betting, online casinos, odds providers, sportsbook platforms, etc.
- Media: This includes companies in broadcasting, sports media, digital media, news agencies, TV stations, and OTT platforms.
- Pro: This is for any company related to professional sports organizations, such as teams, leagues, federations, and stadiums.
</SEGMENTS>
<SUBSEGMENTS>
- Betting: Sportsbook, 3rd Party Betting Partner, Fantasy, Gaming & Affiliates, Professional Bettors & Syndicates.
- Media: Digital Media & Websites, Tech, Brands & Agencies, Broadcaster / OTT, Academia.
- Pro: Team, College / University, League / Federation, Player Agents, Pro Partner."
</SUBSEGMENTS>
},
{
"role": "user",
"content": "{{code1.data}}"
}

Hi Both @pyrrho @bobthebear, really appreciated the fast response and noted on the points, which I will add into the structure.

What I am struggling with is getting the batches to continuously send to the api to then produce a final outcome with them all together. The size of the account data to be sent could be up to 5000. I have shared screengrabs below, would be great to know where I am going wrong? See original workflow below? Any ideas? Thanks!

Hey there, @RA_SYD!

I think you should make code1 a loop block and turn query1 into a function. In code1, you'd be able to use the APIACCOUNT as the source array and then build the scripting logic for error handling using a try catch. The loop block will let you send the data in batches of your preferred size and allows you to delay between iterations.

This post keeps becoming relevant, lately: Loop block tells me to refer to {{ value }}, then says it's undefined - #2 by pyrrho

still cant figure out what I am doing wrong, how do we get it to iterate through the batches? is this supported in workflows.

Thanks again!

ah, so that error you see on the right with the 400 status code means you either tried too send too much at once or you tried to send too much too fast.

the error mentions a TPM (Tokens Per Minute) of 10,000 and it says you're actually using the gpt-4 model. looking at the chart below from [here](https://platform.openai.com/docs/guides/rate-limits/usage-tiers?context=tier-one it would look you're currently at Tier 1.

RPM is Requests Per Minute. This is how many times you can send a REST request to their API in total over a minute.

RPD is Requests Per Day. same as above, only it defines the max requests you can send their API over a singpe 24 hr period.

TPM is Tokens Per Minute where "Tokens" can just be thought of as the total word count of everything you send openai in the body of th1e request, which includes message history, examples, system instructions, and user input. This isnt exactly right however, and the Token Count is actually more like a Syllable Count (which is always larger than the word count)

Batch Queue Limit is the different one. when you create a batch request OpenAI creates a queue and starts orocessing each batch job. While all of these are processing, the Token Count of every request queued in the batch is counted against your Batch Queue Limit. Once OpenAI is done processing the batch, its Token Count is no longer counted against your Queue Limit.

IMPORTANT: What can be easily missed is that if you exceed any 1 of the first 3 rate limits, your request will fail and continue to do so until that rate resets

Solutions: well, if you havent passed any of the other rate limits yet then TPM is rate per min.... give it a few before trying again? lol, ok seriously though does your use case require the GPT-4 model? If not, Id suggest trying GPT-4o-mini or you could try maybe text-embedding-3-large but im not sure itd work as well, it just depends on what your doing.

TLDR;
Honestly though, I wish I remembered this before typing all that out, but Retool provides a text classification example that might work better? itd def respond faster and its cheaper

next lets take a look at query2, the loop block. this loop, all it does is take each row from APIACCOUNT, create an appropriate json obj, push it onto a list, then return the list.

you said you were having a problem w

What I am struggling with is getting the batches to continuously send to the api

however, after query2 finishes the far right block sends a REST request to OpenAI and then the workflow ends or theres more blocks, but what you've got here it'll never contiuously call the api. it wont even do it twice right now.

finally, because im backwatds, on the APIACCOUNT block can you click on the JSON tab and double check the data types actually being returned? then also check Inputs on the query2 block for the same thing. the bottom of query2 shows a 3D array of JSON objects being returned where it looks like the 1st two array depths might always be length 1. if this is the case, youve instructed the model to use the given structured array of objects, not an array of an array of arrays (3D is way less confusinf lol) so it may not know to check nested structures/arrays which would end query2 w it retur ing an empty array.

oh ya i forgot, the APIACCOUNT block might ne returning this weird SQL array thingy.... you may need to use formatArrayAsObject(my_array)

darn 1 more thing, since query 2 is only doing synchronous operations you dont need an iteration delay (not the 2ms would really ever be noticed unless ur in 10k iteration+ range. aslo each iteration doesnt need to know or care about data from any previous iteration, they could be ran in any order and nothing would change, so you could actually set this loop to Parallel mode and process all iterations at once-ish instead of one after another.

Hey @RA_SYD! Let me know if you still need assistance with this, as I'm happy to take a look. I think the solution hinges upon @pyrrho's suggestion to actually initiate your API request to OpenAI from within the body of your loop. :+1: