"duplicate key value violates unique constraint" on bulk upsert via primary key function

Hi! I am on the newer side of Retool, so apologies if I am making a silly mistake here :slight_smile: I am attempting to bulk upsert data from a code block into a Retool DB (I am chunking the data into 500-row batches because I am upserting ~6,000 rows on average). This works perfectly fine on the first go, when my Retool DB is completely empty, but when I attempt to re-run it to test updating, I get the following error:

duplicate key value violates unique constraint "waystar_payer_list_pkey"

I see that the error shows the query:

insert into "waystar_payer_list" ("accepts_dual_clearing_houses", "payer_id", "payer_name", "related_payer_name", "requires_enrollment_claims_monitoring", "requires_enrollment_elig", "requires_enrollment_estimation", "requires_enrollment_inst_claims", "requires_enrollment_prof_claims", "requires_enrollment_remits", "secondary_claim_format", "supports_claims", "supports_claims_attachments", "supports_claims_monitoring", "supports_elig", "supports_estimation", "supports_remits", "unique_id", "updated_at") values ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19)

but I expected something along the lines of this:

INSERT INTO waystar_payer_list (
unique_id, payer_id, payer_name, related_payer_name, requires_enrollment_prof_claims, requires_enrollment_inst_claims,
secondary_claim_format, requires_enrollment_remits, requires_enrollment_elig, requires_enrollment_claims_monitoring,
accepts_dual_clearing_houses, supports_claims_attachments, requires_enrollment_estimation, supports_claims,
supports_remits, supports_elig, supports_claims_monitoring, supports_estimation, updated_at
)
VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19)
ON CONFLICT (unique_id)
DO UPDATE SET
payer_id = EXCLUDED.payer_id,
payer_name = EXCLUDED.payer_name,
related_payer_name = EXCLUDED.related_payer_name,
requires_enrollment_prof_claims = EXCLUDED.requires_enrollment_prof_claims,
requires_enrollment_inst_claims = EXCLUDED.requires_enrollment_inst_claims,
secondary_claim_format = EXCLUDED.secondary_claim_format,
requires_enrollment_remits = EXCLUDED.requires_enrollment_remits,
requires_enrollment_elig = EXCLUDED.requires_enrollment_elig,
requires_enrollment_claims_monitoring = EXCLUDED.requires_enrollment_claims_monitoring,
accepts_dual_clearing_houses = EXCLUDED.accepts_dual_clearing_houses,
supports_claims_attachments = EXCLUDED.supports_claims_attachments,
requires_enrollment_estimation = EXCLUDED.requires_enrollment_estimation,
supports_claims = EXCLUDED.supports_claims,
supports_remits = EXCLUDED.supports_remits,
supports_elig = EXCLUDED.supports_elig,
supports_claims_monitoring = EXCLUDED.supports_claims_monitoring,
supports_estimation = EXCLUDED.supports_estimation,
updated_at = EXCLUDED.updated_at;

Am I using this upsert functionality incorrectly? Happy to provide more context if needed!

1 Like

For more context, here is my function along with the code being executed (simplified sketch):
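
// Simplified sketch of the transformation and chunking (illustrative only)
const waystar_payer_list_array = _.map(waystar_payer_list, e => [{
    ...e,
    'unique_id': e.payer_name.replace(/\s+/g, '_') + '_' + e.payer_id,
}]);

// Split into 500-row batches
function chunkArray(rows, size) {
    const chunks = [];
    for (let i = 0; i < rows.length; i += size) {
        chunks.push(rows.slice(i, i + size));
    }
    return chunks;
}

const chunks = chunkArray(waystar_payer_list_array, 500);
// each chunk is then passed to the bulk upsert (via primary key) query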

Hey there @Anish_Srinivasan,

The error specifically says:

duplicate key value violates unique constraint "waystar_payer_list_pkey"

This error indicates that the unique_id you're generating is conflicting with an existing key in the database. A few potential reasons for this error include:

Improper generation of unique_id: the unique_id is generated by replacing spaces or special characters in the payer_name and concatenating the result with the payer_id. Any chance the resulting value is not unique across the entire dataset? (See the example after this list.)

Potential duplicates: if the source data (waystar_payer_list) contains records with the same payer_name and payer_id combination, then after the transformation multiple records would try to insert with the same unique_id, causing the duplication.

This would explain why the first run works well, but not the second.
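
For example (hypothetical payer names), two distinct source rows can collapse to the same key after that replacement:

// Hypothetical illustration: distinct payer names, same resulting unique_id
const a = { payer_name: 'Acme Health', payer_id: '123' };
const b = { payer_name: 'Acme  Health', payer_id: '123' }; // note the double space

const idA = a.payer_name.replace(/\s+/g, '_') + '_' + a.payer_id; // 'Acme_Health_123'
const idB = b.payer_name.replace(/\s+/g, '_') + '_' + b.payer_id; // 'Acme_Health_123'

console.log(idA === idB); // true -> two rows competing for one primary key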

Hey Miguel! Thank you for the quick response here - I can confirm that the regex replace isn't causing any collisions and the unique_id values being passed in are indeed unique!

Also, from my understanding, the upsert should update any rows whose PKs already exist and insert any net-new ones, correct? I don't understand why there would be any PK duplication.

I can confirm that the regex replace isn't causing any collisions and the unique_id values being passed in are indeed unique!

Ok cool.

the upsert should update any rows whose PKs already exist and insert any net-new ones, correct? I don't understand why there would be any PK duplication

Yes, this was my first thought as well. One idea, though not applicable given your confirmation above, is that one of your chunks might be producing non-unique IDs for some records before the upsert happens. That could lead to two records generating the same unique_id while both try to be inserted as new rows instead of one triggering an update. But we can exclude that.
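
For what it's worth, if that were happening within a single chunk, Postgres would reject the statement even with an ON CONFLICT clause in place, just with a different error:

-- Hypothetical illustration (columns trimmed for brevity): the same key twice in one INSERT
INSERT INTO waystar_payer_list (unique_id, payer_name)
VALUES ('Acme_Health_123', 'Acme Health'),
       ('Acme_Health_123', 'Acme  Health')
ON CONFLICT (unique_id) DO UPDATE SET payer_name = EXCLUDED.payer_name;
-- ERROR: ON CONFLICT DO UPDATE command cannot affect row a second time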

Another thought: could the issue be related to the timing of how each chunk is processed? If the chunks aren't processed strictly one after another, the database may not reflect the latest state when the next chunk arrives. I ran this quickly through ChatGPT and the suggestion came out like this:

// Note: dbConnection is assumed to be a configured Knex-style client;
// .insert().onConflict().merge() is Knex's upsert API.

// Function to process chunks sequentially
async function processChunksSequentially(chunks) {
    for (const chunk of chunks) {
        try {
            // Generate a unique_id for each record in the chunk
            const chunkWithIds = chunk.map(e => ({
                ...e,
                'unique_id': e.payer_name.replace(/\s+/g, '_') + '_' + e.payer_id,
            }));

            // Perform the upsert operation for the current chunk
            await dbConnection('waystar_payer_list')
                .insert(chunkWithIds)
                .onConflict('unique_id')  // when unique_id already exists, update the record
                .merge();  // merge the existing record with the new values

            console.log('Chunk processed successfully');
        } catch (error) {
            console.log('Error processing chunk:', error);
            throw error;  // rethrow so the failure surfaces instead of being swallowed
        }
    }

    console.log('All chunks processed successfully');
}

// Assuming waystar_payer_list_array is your original data array

// Split the data into 500-row chunks (chunkArray as sketched earlier in the thread)
const chunks = chunkArray(waystar_payer_list_array, 500);

// Process the chunks sequentially
await processChunksSequentially(chunks);
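
Note the await inside the for...of loop - each chunk's upsert completes before the next one starts, so the database always reflects the previous chunk's writes when a new chunk is processed.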

Hope this helps identify the root cause!

I'm happy to try this, but from my understanding, if the chunking worked fine during the initial data insertion but is now causing issues when upserting (i.e., updating existing records), it's likely not the chunking itself that's the problem. I am breaking my dataframe up into 500-row chunks, each with unique data. I feel like the issue lies with how the upsert logic is handling updates to existing rows!

Hi Miguel! Do you have any other ideas as to why I can't get this process to work? I would try your method, except I don't have access to our Retool DB credentials, so I can't set up a dbConnection :frowning:

Hey @Anish_Srinivasan,

Sorry, no other ideas come to mind.

Have you tried inserting all of your data in one go, for testing purposes, to see if you receive the same error?

I can try this, but I think the main issue here is that my error logs clearly show the query being run as:

insert into \"waystar_payer_list\" (\"accepts_dual_clearing_houses\", \"payer_id\", \"payer_name\", \"related_payer_name\", \"requires_enrollment_claims_monitoring\", \"requires_enrollment_elig\", \"requires_enrollment_estimation\", \"requires_enrollment_inst_claims\", \"requires_enrollment_prof_claims\", \"requires_enrollment_remits\", \"secondary_claim_format\", \"supports_claims\", \"supports_claims_attachments\", \"supports_claims_monitoring\", \"supports_elig\", \"supports_estimation\", \"supports_remits\", \"unique_id\", \"updated_at\") values ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19)

when in reality the query should be something more like this:

INSERT INTO waystar_payer_list (
unique_id, payer_id, payer_name, related_payer_name, requires_enrollment_prof_claims, requires_enrollment_inst_claims,
secondary_claim_format, requires_enrollment_remits, requires_enrollment_elig, requires_enrollment_claims_monitoring,
accepts_dual_clearing_houses, supports_claims_attachments, requires_enrollment_estimation, supports_claims,
supports_remits, supports_elig, supports_claims_monitoring, supports_estimation, updated_at
)
VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19)
ON CONFLICT (unique_id)
DO UPDATE SET
payer_id = EXCLUDED.payer_id,
payer_name = EXCLUDED.payer_name,
related_payer_name = EXCLUDED.related_payer_name,
requires_enrollment_prof_claims = EXCLUDED.requires_enrollment_prof_claims,
requires_enrollment_inst_claims = EXCLUDED.requires_enrollment_inst_claims,
secondary_claim_format = EXCLUDED.secondary_claim_format,
requires_enrollment_remits = EXCLUDED.requires_enrollment_remits,
requires_enrollment_elig = EXCLUDED.requires_enrollment_elig,
requires_enrollment_claims_monitoring = EXCLUDED.requires_enrollment_claims_monitoring,
accepts_dual_clearing_houses = EXCLUDED.accepts_dual_clearing_houses,
supports_claims_attachments = EXCLUDED.supports_claims_attachments,
requires_enrollment_estimation = EXCLUDED.requires_enrollment_estimation,
supports_claims = EXCLUDED.supports_claims,
supports_remits = EXCLUDED.supports_remits,
supports_elig = EXCLUDED.supports_elig,
supports_claims_monitoring = EXCLUDED.supports_claims_monitoring,
supports_estimation = EXCLUDED.supports_estimation,
updated_at = EXCLUDED.updated_at;

This seems like a bug, no?

Update: I tried running without the chunking and I still get the same error:

@MiguelOrtiz - after looking through this, I believe this is a bug :slight_smile: Sorry to push on this, but is there any chance it can be added for investigation? I have implemented several such upserts via Retool, but never through functions in workflows. The workflows feature is incredibly powerful, but this bulk upsert bug is a pain point! Thanks

1 Like

Hey @Anish_Srinivasan,

You may be right on this. I'll let the Retool team chime in on this one. Tagging @Paulo and @Jack_T for visibility.

1 Like

Hey @Anish_Srinivasan! Thanks for reaching out - and welcome to Retool. :slight_smile:

I'm digging into this now and am unable to immediately replicate your issue, albeit with a much simpler use case. This might just be a syntax issue - should _.map(waystar_payer_list, e => [{ ... }]) be _.map(waystar_payer_list, e => ({ ... })) instead? The former returns an array of single-element arrays, whereas the latter returns an array of objects and seems like the more logical option.
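
Here's a minimal illustration of the difference, using made-up rows:

const rows = [{ payer_id: 1 }, { payer_id: 2 }];

// Arrow body wrapped in [{ ... }]: each element maps to a one-item array
_.map(rows, e => [{ ...e, unique_id: 'x_' + e.payer_id }]);
// => [[{ payer_id: 1, unique_id: 'x_1' }], [{ payer_id: 2, unique_id: 'x_2' }]]

// Arrow body wrapped in ({ ... }): each element maps to a plain row object
_.map(rows, e => ({ ...e, unique_id: 'x_' + e.payer_id }));
// => [{ payer_id: 1, unique_id: 'x_1' }, { payer_id: 2, unique_id: 'x_2' }]

With the first form the top-level items are arrays rather than row objects, which would line up with the plain INSERT (no ON CONFLICT clause) showing up in your error logs.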

2 Likes

Amazing!! This was absolutely the problem - thank you so much for catching this tiny nuance @Darren :smiling_face:

2 Likes