How do I make Retool use one query to bulk upsert instead of thousands?

  • Goal: When I run a query that uses the GUI mode to Bulk Upsert a few hundred to 1000s of items, I expect ONE single query going to Postgres.

  • How it works today: When I execute a Bulk Upsert of an array of JS objects using GUI mode, I get at least one query hitting Postgres per object. The Retool query times out after 120s. I am refreshing a query in Postgres during the operation:

SELECT now() - query_start, query, *
FROM pg_stat_activity 
where datname = 'retool' and state = 'active'
ORDER BY query_start desc;

And I can see the query from retool toggling between INSERT and UPDATE, and each one lasts a fraction of a second. I would expect one long query (Using INSERT ON CONFLICT or MERGE) instead.

  • Steps:

  • Get a big array of JS objects (I'm using a CSV file upload and parsedValue[0]).

  • Run a bulk upsert query with primary key

  • Run the query

  • Witness a Timeout

  • During the query run pg_stat_activity and see multiple separate queries instead of one

  • Details:
    Array of data looks like:
[
  {
    "email": "something@example.com"
},
 {
    "email": "another@example.com"
 }
]

Getting Timed Out After 120,000ms

2 Likes

Hi @Jake_Moffatt,

Welcome to Retool. I believe the issue is with the upsert, vs insert. Upsert has a WHERE clause that is unique to each object, so it has to iterate through them. Insert doesn't have this limitation (no need for a WHERE clause).

Postgres can take a insert like this:

INSERT INTO director (id, name)
VALUES
    (2, 'robert'),
    (5, 'sheila'),
    (6, 'flora')
ON CONFLICT (id) DO UPDATE
SET name = EXCLUDED.name; 

Which allows for upserts on a unique key (in the example above, ID) without sending multiple queries to the database.

It's pretty clear the Retool algorithm is:

for object in objects
  success = doInsert(object.id, object.values)
  if ! success
    doUpdate(object.id, object.values)
  end 
end

Using up to N*2 separate queries, when this really should just be a single efficient call with only 1 query.

2 Likes

Hello @Jake_Moffatt!

Thank you so much for the great feedback and due diligence on looking under the hood at how the Retool Upsert works :man_technologist:

I will pass this along to our data/infra teams, while I might not be able to confirm or deny how the DB queries are working there's a chance you might have helped us make Upsert much more efficient and performant!

I love learning new things about Postgres queries so I will thank you for that as well!

2 Likes

Perhaps related, why don't we get an object of arrays with new IDs/updated IDs. Maybe if the code were along the lines of:

for object in objects
  let upsertInfo = {}
  success = doInsert(object.id, object.values)
  if ! success
    doUpdate(object.id, object.values)
    upsertInfo
  end
  return upsertInfo
end
1 Like

How big are the tables (row count?) you're doing here? Retool does chunking sometimes to make the data manageable, but as others mentioned here, there might be something else going on under the hood.

You can also do the chunking yourself via a javascript query that grabs things like rows 1 - 15000 and then does the upsert on that, and then it will loop again to do the next chunk. This will help your query to not time out.

Doing large amounts of Upsert rows can time out queries a lot, especially if the query isn't optimized.

I personally use a staging tables to do massive Upserts. This is because doing an Upsert via SQL code and a staging table is SO MUCH FASTER and allows for large row upserts 50K - 100K rows from what I've done.

Insert Rows into a Staging Table. In this case, I'm doing it from a csv upload to a postgre SQL table.

Upsert those staging rows into your master table:

BEGIN;

INSERT INTO master_table(
id,
date,
talent_id,
project,
task,
time_tracked

)
SELECT
id,
date,
talent_id,
project,
task,
time_tracked

FROM staging_table

ON CONFLICT (id) DO UPDATE SET
date = EXCLUDED.date,
talent_id = EXCLUDED.talent_id,
project = EXCLUDED.project,
task = EXCLUDED.task,
time_tracked = EXCLUDED.time_tracked
;

DELETE FROM staging_table;

COMMIT;

Like this solution? Work with me! (https://tropicflare.com)

2 Likes