"duplicate key value violates unique constraint" on bulk upsert via primary key function

Hi! I am on the newer side of Retool, so apologies if I am making a silly mistake here :slight_smile: I am attempting to bulk upsert data from a code block into a Retool DB (I am chunking the data into 500-row batches because I am upserting ~6,000 rows on average). This works perfectly fine on the first go, when my Retool DB is completely empty, but when I attempt to re-run it to test updating, I get the following error:

duplicate key value violates unique constraint "waystar_payer_list_pkey"

I see that the error shows the query:

insert into "waystar_payer_list" ("accepts_dual_clearing_houses", "payer_id", "payer_name", "related_payer_name", "requires_enrollment_claims_monitoring", "requires_enrollment_elig", "requires_enrollment_estimation", "requires_enrollment_inst_claims", "requires_enrollment_prof_claims", "requires_enrollment_remits", "secondary_claim_format", "supports_claims", "supports_claims_attachments", "supports_claims_monitoring", "supports_elig", "supports_estimation", "supports_remits", "unique_id", "updated_at") values ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19)

but I expected something along the lines of this:

INSERT INTO waystar_payer_list (
unique_id, payer_id, payer_name, related_payer_name, requires_enrollment_prof_claims, requires_enrollment_inst_claims,
secondary_claim_format, requires_enrollment_remits, requires_enrollment_elig, requires_enrollment_claims_monitoring,
accepts_dual_clearing_houses, supports_claims_attachments, requires_enrollment_estimation, supports_claims,
supports_remits, supports_elig, supports_claims_monitoring, supports_estimation, updated_at
)
VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19)
ON CONFLICT (unique_id)
DO UPDATE SET
payer_id = EXCLUDED.payer_id,
payer_name = EXCLUDED.payer_name,
related_payer_name = EXCLUDED.related_payer_name,
requires_enrollment_prof_claims = EXCLUDED.requires_enrollment_prof_claims,
requires_enrollment_inst_claims = EXCLUDED.requires_enrollment_inst_claims,
secondary_claim_format = EXCLUDED.secondary_claim_format,
requires_enrollment_remits = EXCLUDED.requires_enrollment_remits,
requires_enrollment_elig = EXCLUDED.requires_enrollment_elig,
requires_enrollment_claims_monitoring = EXCLUDED.requires_enrollment_claims_monitoring,
accepts_dual_clearing_houses = EXCLUDED.accepts_dual_clearing_houses,
supports_claims_attachments = EXCLUDED.supports_claims_attachments,
requires_enrollment_estimation = EXCLUDED.requires_enrollment_estimation,
supports_claims = EXCLUDED.supports_claims,
supports_remits = EXCLUDED.supports_remits,
supports_elig = EXCLUDED.supports_elig,
supports_claims_monitoring = EXCLUDED.supports_claims_monitoring,
supports_estimation = EXCLUDED.supports_estimation,
updated_at = EXCLUDED.updated_at;

Am I using this upsert functionality incorrectly? Happy to provide more context if needed!

1 Like

For more context, here is my function along with the code being executed (simplified sketch):
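
// Simplified sketch of the transformation and chunking (illustrative only)
const waystar_payer_list_array = _.map(waystar_payer_list, e => [{
    ...e,
    'unique_id': e.payer_name.replace(/\s+/g, '_') + '_' + e.payer_id,
}]);

// Split into 500-row batches
function chunkArray(rows, size) {
    const chunks = [];
    for (let i = 0; i < rows.length; i += size) {
        chunks.push(rows.slice(i, i + size));
    }
    return chunks;
}

const chunks = chunkArray(waystar_payer_list_array, 500);
// each chunk is then passed to the bulk upsert (via primary key) query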

Hey there @Anish_Srinivasan,

The error specifically says:

duplicate key value violates unique constraint "waystar_payer_list_pkey"

This error indicates that the unique_id you're generating is conflicting with an existing key in the database. A few potential reasons for this error include:

Improper generation of unique_id: the unique_id is generated by replacing spaces or special characters in the payer_name and concatenating the result with the payer_id. Any chance the resulting value is not unique across the entire dataset? (See the example after this list.)

Potential duplicates: if the source data (waystar_payer_list) contains records with the same payer_name and payer_id combination, then after the transformation multiple records would try to insert with the same unique_id, causing the duplication.

This would explain why the first run works well, but not the second.
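
For example (hypothetical payer names), two distinct source rows can collapse to the same key after that replacement:

// Hypothetical illustration: distinct payer names, same resulting unique_id
const a = { payer_name: 'Acme Health', payer_id: '123' };
const b = { payer_name: 'Acme  Health', payer_id: '123' }; // note the double space

const idA = a.payer_name.replace(/\s+/g, '_') + '_' + a.payer_id; // 'Acme_Health_123'
const idB = b.payer_name.replace(/\s+/g, '_') + '_' + b.payer_id; // 'Acme_Health_123'

console.log(idA === idB); // true -> two rows competing for one primary key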

Hey Miguel! Thank you for the quick response here - I can confirm that the regex replace isn't causing any collisions and the unique_id values being passed in are indeed unique!

Also, from my understanding, the upsert should update any rows whose PKs already exist and insert any net-new ones, correct? I don't understand why there would be any PK duplication.

I can confirm that the regex replace isn't causing any collisions and the unique_id values being passed in are indeed unique!

Ok cool.

the upsert should update any rows whose PKs already exist and insert any net-new ones, correct? I don't understand why there would be any PK duplication

Yes, this was my first thought as well. One idea, though not applicable given your confirmation above, is that one of your chunks might be producing non-unique IDs for some records before the upsert happens. That could lead to two records generating the same unique_id while both try to be inserted as new rows instead of one triggering an update. But we can exclude that.
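
For what it's worth, if that were happening within a single chunk, Postgres would reject the statement even with an ON CONFLICT clause in place, just with a different error:

-- Hypothetical illustration (columns trimmed for brevity): the same key twice in one INSERT
INSERT INTO waystar_payer_list (unique_id, payer_name)
VALUES ('Acme_Health_123', 'Acme Health'),
       ('Acme_Health_123', 'Acme  Health')
ON CONFLICT (unique_id) DO UPDATE SET payer_name = EXCLUDED.payer_name;
-- ERROR: ON CONFLICT DO UPDATE command cannot affect row a second time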

Another thought: could the issue be related to the timing of how each chunk is processed? If the chunks aren't processed strictly one after another, the database may not reflect the latest state when the next chunk arrives. I ran this quickly through ChatGPT and the suggestion came out like this:

// Note: dbConnection is assumed to be a configured Knex-style client;
// .insert().onConflict().merge() is Knex's upsert API.

// Function to process chunks sequentially
async function processChunksSequentially(chunks) {
    for (const chunk of chunks) {
        try {
            // Generate a unique_id for each record in the chunk
            const chunkWithIds = chunk.map(e => ({
                ...e,
                'unique_id': e.payer_name.replace(/\s+/g, '_') + '_' + e.payer_id,
            }));

            // Perform the upsert operation for the current chunk
            await dbConnection('waystar_payer_list')
                .insert(chunkWithIds)
                .onConflict('unique_id')  // when unique_id already exists, update the record
                .merge();  // merge the existing record with the new values

            console.log('Chunk processed successfully');
        } catch (error) {
            console.log('Error processing chunk:', error);
            throw error;  // rethrow so the failure surfaces instead of being swallowed
        }
    }

    console.log('All chunks processed successfully');
}

// Assuming waystar_payer_list_array is your original data array

// Split the data into 500-row chunks (chunkArray as sketched earlier in the thread)
const chunks = chunkArray(waystar_payer_list_array, 500);

// Process the chunks sequentially
await processChunksSequentially(chunks);
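
Note the await inside the for...of loop - each chunk's upsert completes before the next one starts, so the database always reflects the previous chunk's writes when a new chunk is processed.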

Hope this helps identify the root cause!

I'm happy to try this, but from my understanding, if the chunking worked fine during the initial data insertion but is now causing issues when upserting (i.e., updating existing records), it's likely not the chunking itself that's the problem. I am breaking my dataframe up into 500-row chunks, each with unique data. I feel like the issue lies with how the upsert logic is handling updates to existing rows!

Hi Miguel! Do you have any other ideas as to why I can't get this process to work? I would try your method, except I don't have access to our Retool DB credentials, so I can't set up a dbConnection :frowning:

Hey @Anish_Srinivasan,

Sorry, no other ideas come to mind.

Have you tried inserting all of your data in one go, for testing purposes, to see if you receive the same error?

I can try this, but I think the main issue here is that my error logs clearly show the query being run as:

insert into \"waystar_payer_list\" (\"accepts_dual_clearing_houses\", \"payer_id\", \"payer_name\", \"related_payer_name\", \"requires_enrollment_claims_monitoring\", \"requires_enrollment_elig\", \"requires_enrollment_estimation\", \"requires_enrollment_inst_claims\", \"requires_enrollment_prof_claims\", \"requires_enrollment_remits\", \"secondary_claim_format\", \"supports_claims\", \"supports_claims_attachments\", \"supports_claims_monitoring\", \"supports_elig\", \"supports_estimation\", \"supports_remits\", \"unique_id\", \"updated_at\") values ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19)

when in reality the query should be something more like this:

INSERT INTO waystar_payer_list (
unique_id, payer_id, payer_name, related_payer_name, requires_enrollment_prof_claims, requires_enrollment_inst_claims,
secondary_claim_format, requires_enrollment_remits, requires_enrollment_elig, requires_enrollment_claims_monitoring,
accepts_dual_clearing_houses, supports_claims_attachments, requires_enrollment_estimation, supports_claims,
supports_remits, supports_elig, supports_claims_monitoring, supports_estimation, updated_at
)
VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19)
ON CONFLICT (unique_id)
DO UPDATE SET
payer_id = EXCLUDED.payer_id,
payer_name = EXCLUDED.payer_name,
related_payer_name = EXCLUDED.related_payer_name,
requires_enrollment_prof_claims = EXCLUDED.requires_enrollment_prof_claims,
requires_enrollment_inst_claims = EXCLUDED.requires_enrollment_inst_claims,
secondary_claim_format = EXCLUDED.secondary_claim_format,
requires_enrollment_remits = EXCLUDED.requires_enrollment_remits,
requires_enrollment_elig = EXCLUDED.requires_enrollment_elig,
requires_enrollment_claims_monitoring = EXCLUDED.requires_enrollment_claims_monitoring,
accepts_dual_clearing_houses = EXCLUDED.accepts_dual_clearing_houses,
supports_claims_attachments = EXCLUDED.supports_claims_attachments,
requires_enrollment_estimation = EXCLUDED.requires_enrollment_estimation,
supports_claims = EXCLUDED.supports_claims,
supports_remits = EXCLUDED.supports_remits,
supports_elig = EXCLUDED.supports_elig,
supports_claims_monitoring = EXCLUDED.supports_claims_monitoring,
supports_estimation = EXCLUDED.supports_estimation,
updated_at = EXCLUDED.updated_at;

This seems like a bug, no?

Update: I tried running without the chunking and I still get the same error:

@MiguelOrtiz - after looking through this, I believe this is a bug :slight_smile: Sorry to push on this, but is there any chance it can be added for investigation? I have implemented several such upserts via Retool, but never through functions in workflows. The workflows feature is incredibly powerful, but this bulk upsert bug is a pain point! Thanks

1 Like

Hey @Anish_Srinivasan,

You may be right on this. I'll let the Retool team chime in on this one. Tagging @Paulo and @Jack_T for visibility.

1 Like

Hey @Anish_Srinivasan! Thanks for reaching out - and welcome to Retool. :slight_smile:

I'm digging into this now and am unable to immediately replicate your issue, albeit with a much simpler use case. This might just be a syntax issue - should _.map(waystar_payer_list, e => [{ ... }]) be _.map(waystar_payer_list, e => ({ ... })) instead? The former returns an array of single-element arrays, whereas the latter returns an array of objects and seems like the more logical option.
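
Here's a minimal illustration of the difference, using made-up rows:

const rows = [{ payer_id: 1 }, { payer_id: 2 }];

// Arrow body wrapped in [{ ... }]: each element maps to a one-item array
_.map(rows, e => [{ ...e, unique_id: 'x_' + e.payer_id }]);
// => [[{ payer_id: 1, unique_id: 'x_1' }], [{ payer_id: 2, unique_id: 'x_2' }]]

// Arrow body wrapped in ({ ... }): each element maps to a plain row object
_.map(rows, e => ({ ...e, unique_id: 'x_' + e.payer_id }));
// => [{ payer_id: 1, unique_id: 'x_1' }, { payer_id: 2, unique_id: 'x_2' }]

With the first form the top-level items are arrays rather than row objects, which would line up with the plain INSERT (no ON CONFLICT clause) showing up in your error logs.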

2 Likes

Amazing!! This was absolutely the problem - thank you so much for catching this tiny nuance @Darren :smiling_face:

2 Likes