How to efficiently make large amounts of API calls in JS/Retool?

I have a CSV file with 300K+ entries that I need to upload to Firebase Firestore. I am currently doing this with a recursive function that makes the calls one after another.

Here is the code I have now, which works just fine.

const rows = gameStatsFileButton.parsedValue[0]; // This gets the parsed rows of the uploaded CSV file.

// This is a delay function I made
async function delay(ms) {
  console.log("delay called for" + ms);
  return new Promise(resolve => setTimeout(resolve, ms))
}

// This is the function that triggers the query that uploads one row of data.

async function uploadCall(thisData, index) {
  console.log("Uploaded", index);
  await UploadGameStatsToFirebase.trigger({
    additionalScope:{
      data:thisData,
    },  
    onSuccess: function(data) {}
  });  
}

// This is the main Loop function. 
// It is called recursively, to loop through the contents of CSV file and call 
// the upload method to upload that row.

async function runQuery(i) {
  if(i >= rows.length) {
    console.log("Finished running all queries");
    return;
  }
  var data = rows[i];
  console.log("Running query for row" + I);
  await delay(2000); // calling delay method to add a delay of 2 seconds.
  uploadCall(data, i);
  runQuery(i+1);
}

runQuery(0);

Since the CSV has so many rows, it will obviously take some time to upload the data. But with my current implementation it takes ridiculously long, and after a while it crashes the browser because it overloads the thread (from my understanding, at least).

Also, from my understanding, I cannot run this process strictly one call at a time, since each API call takes 2s; 300K rows would then take 600K seconds to complete, which is not practical at all.

With the methods I have, I can only insert one row at a time into Firestore. What is the efficient and/or industry-standard way of making large numbers of async API calls?

Hey @dh00mk3tu!

Recursive function calls can put a lot of strain on the stack, especially with 300K+ calls. Can you try running your query in a for loop instead?

for(let i = 0; i < rows.length; i++){
  const data = rows[i];
  console.log("Running query for row" + i);
  await delay(2000);
  uploadCall(data, i);
}
console.log("Finished running all queries");

Also, out of curiosity, is it a requirement of your API that you add a particular delay? Have you already considered something like await uploadCall(data, i) or even batching your queries as alternatives that might help cut down run time?

Hi Kabir, thanks for the quick reply.

I'll put it in a for loop, test it, and let you know.

Also, for the second question:

  1. No, the API has no particular requirement for the delay. I am inserting documents into Cloud Firestore from the CSV file using this method. The reason I added the delay was to let the system sleep for a while so the triggers already on the stack could be processed before more triggers for that function were added. Even so, this approach puts a lot of strain on the stack and eventually crashes the tab, and that's on my machine, which has decent specs. The other people who will use this will be running it on an Apple laptop, so it's not efficient enough.

  2. Can you shed some light on how I can batch my queries / elaborate on it a bit more?

  3. I will try awaiting the uploadCall method instead, but can you tell me what difference it makes whether I await the UploadGameStatsToFirebase.trigger() method or await uploadCall?

Got it! I'm not entirely sure this is the issue you're running into, but if you define the function recursively, setting a delay won't actually reset the stack.

Since recursive calls are first in, last out, call 0 needs to wait for call 1 to complete before it leaves the stack, and so on, so at some point you'll have on the order of 300K calls on the stack. Using a for loop gets around this, since call 0 can complete independently of call 1.

As long as the logic packages nicely into that format (which it seems to here), a loop is likely the way to go.

You can also do something like this to run the queries in parallel.
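A rough, untested sketch of what that could look like (reusing the same rows and UploadGameStatsToFirebase query from above):

async function runAllInParallel() {
  // Kick off every trigger at once and collect the resulting promises
  const promises = rows.map((item) => {
    return UploadGameStatsToFirebase.trigger({
      additionalScope: {
        data: item
      }
    });
  });
  // Resolves once every trigger has finished (or rejects on the first failure)
  return Promise.all(promises);
}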

Batching your queries would depend on how you're choosing to trigger them. Using a for loop, you could do something like this (note that there is no delay):

const batchSize = 100;
const batchLimit = Math.ceil(rows.length / batchSize);

for(let batchNumber = 0; batchNumber < batchLimit; batchNumber++){
  const batchStart = batchNumber * batchSize;
  const batchEnd = Math.min(batchStart + batchSize, rows.length); // clamp so the last batch doesn't run past the end

  console.log("Running batch " + batchNumber);
  for(let i = batchStart; i < batchEnd; i++){
    const data = rows[i];
    console.log("Running query for row" + i);
    uploadCall(data, i);
  }
}

console.log("Finished running all batches");

Using await uploadCall only matters if you want the current request to finish before triggering the next one. At the moment, uploadCall waits for UploadGameStatsToFirebase, but runQuery doesn't wait for uploadCall, so the await is effectively ignored.
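For example (just a sketch reusing your existing rows and uploadCall), awaiting uploadCall inside a plain for loop would force each upload to finish before the next one starts:

for(let i = 0; i < rows.length; i++){
  const data = rows[i];
  // Because of the await here, row i + 1 is not triggered until row i has finished uploading
  await uploadCall(data, i);
}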

Let me know if any of that is helpful! There are a couple of options here; I'm curious to know which one you'll take and will try to answer any questions that come up related to it (e.g. other forms of batching) :slightly_smiling_face:

Hey Kabir. Sorry for the late reply.
I've been working on exactly this, reading the articles you shared and going over them again and again.

There are still a couple of problems that I am facing.

  1. The browser still crashes even if I batch or attempt to run the queries in parallel.

Here is the current code structure:

const rows = gameStatsFileButton.parsedValue[0];
const batchSize = 10;
const batchLimit = rows.length / batchSize;


function uploadCall(thisData, index) {
   UploadGameStatsToFirebase.trigger({
    additionalScope:{
      data:thisData,
    },  
    onSuccess: function(data) {
      // delay(5);
      // runQuery(i+1);
      console.log("Uploaded: ", index);
      // uploadedCount.setValue(index);
    }
  });  
}
async function main() {
  
    for(let batchNumber = 0; batchNumber < batchLimit; batchNumber++){
      const batchStart = batchNumber * batchSize;
      const batchEnd = batchStart + batchSize;

      console.log("Running batch " + batchNumber);
      for(let i = batchStart; i < batchEnd; i++){
        const data = rows[i];
        console.log("Running query for row: " + i);
        uploadCall(data, i);
      }
    } 
  
}

main();

From the console logs, what I can understand now is that triggering a query essentially happens in three phases.
When I call the uploadCall method in a batch, the query is first scheduled (perhaps that is the correct word).
Next it is triggered, like in the screenshot below.

[screenshot of console logs]

First all the rows are scheduled, then they are triggered. After this, the query is executed.

[screenshot of console logs]

Now, with my current code implementation, it is obviously scheduling the requests to trigger the API in batches, which it is able to do without lagging or slowing the system.
Here's a screenshot of a run on a CSV with 100K rows.

This is nice, but then the method is triggered, and at this step the browser/tab essentially crashes.
No data is actually uploaded/inserted because the crash happens before the actual execution.

This is what I have understood so far.
I have tried awaiting uploadCall and I have tried awaiting UploadGameStatsToFirebase.trigger.

If I await them both, it simply executes the upload queries one after another, which brings me back to where I started with the recursive function, where each subsequent query ran only after the previous one succeeded.

I also tried running the queries in parallel, learning from the link you shared.
This was the code:

function main() {
  const promises = rows.map((item) => {
    return UploadGameStatsToFirebase.trigger({
      additionalScope: {
        data: item
      }
    });
  });
  return Promise.all(promises);
}

return main();

I am incredibly grateful for your help; I have understood a lot of things with much more clarity, but I am still doing something wrong or not understanding it properly.

Any help in getting this done would be great!

Ah @dh00mk3tu, I'm sorry :sweat: while the code I posted did group the requests into batches, it didn't space the batches out at all, so you'd still be making all the calls at once. My mistake.

With the for loop pattern, you'd need to move the delay to sit between batches instead of removing it completely. With this you can control how much time you'd like between batches:

async function main() {
  for(let batchNumber = 0; batchNumber < batchLimit; batchNumber++){
    const batchStart = batchNumber * batchSize;
    const batchEnd = Math.min(batchStart + batchSize, rows.length); // clamp so the last batch doesn't run past the end

    console.log("Running batch " + batchNumber);
    for(let i = batchStart; i < batchEnd; i++){
      const data = rows[i];
      console.log("Running query for row: " + i);
      uploadCall(data, i);
    }
    await delay(500);
  } 

  console.log("Finished running all batches");
}

When batching with promises, wait for all the promises in the batch to finish before starting the next batch:

async function main() {
  for(let batchNumber = 0; batchNumber < batchLimit; batchNumber++){
    const batchStart = batchNumber * batchSize;
    const batchEnd = batchStart + batchSize;
    const batch = rows.slice(batchStart, batchEnd);

    console.log("Running batch " + batchNumber);
    const promises = batch.map((item) => {
       return UploadGameStatsToFirebase.trigger({
         additionalScope: {
           data: item
         }
       });
    });
    await Promise.all(promises);
  } 
  console.log("Finished running all batches");
}

Have you tested running your queries with a smaller number of rows? I want to double-check that each batch can run successfully as well before trying to run them all together.
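For example (just a sketch; the 200 is an arbitrary placeholder), you could temporarily cap the input while testing:

// Temporarily limit the upload to the first 200 rows while checking that a batch succeeds
const rows = gameStatsFileButton.parsedValue[0].slice(0, 200);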

Edit: The lodash _.chunk function might also be worth exploring here. I imagine something like this may work as well!

async function main() {
  const batches = _.chunk(rows, batchSize);
  for(let i in batches){
    console.log("Running batch " + i);
    const promises = batches[i].map((item) => {
       return UploadGameStatsToFirebase.trigger({
         additionalScope: {
           data: item
         }
       });
    });
    await Promise.all(promises);
  } 
  console.log("Finished running all batches");
}

Hey Kabir,
Both of them seem to do the job.
With some tweaks one can make them better, but then again there are a couple of takeaways.
For anyone who might come across this problem:

  1. With the nested for loop method, we delay the process for a certain time between batches. When dealing with a very large number of calls (300K+ in my case), it essentially crashed my browser after a while.
    After a certain point it just kept triggering the method without executing it, and a large number of pending requests slowly built up on the stack, which in turn made the page unresponsive.

  2. The promise method is slow. Very slow. You await the whole batch to finish, but each upload is synchronous and takes longer than usual for some reason; dealing with a lot of requests would take ages. If someone comes across a solution, drop it here for anyone who might read this in the future.
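One direction that might be worth experimenting with (untested sketch; the concurrency value is a placeholder, and it assumes the same rows and UploadGameStatsToFirebase query as above) is keeping a fixed number of requests in flight at all times instead of waiting for whole batches to finish:

async function uploadWithConcurrencyLimit(concurrency) {
  let nextIndex = 0;

  // Each worker keeps pulling the next unprocessed row until none are left
  async function worker() {
    while (nextIndex < rows.length) {
      const i = nextIndex++;
      await UploadGameStatsToFirebase.trigger({
        additionalScope: {
          data: rows[i]
        }
      });
      console.log("Uploaded row " + i);
    }
  }

  // Start a small pool of workers; a new upload begins as soon as any worker is free
  const workers = [];
  for (let w = 0; w < concurrency; w++) {
    workers.push(worker());
  }
  await Promise.all(workers);
  console.log("Finished all uploads");
}

uploadWithConcurrencyLimit(10);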