Hi, I have a table emailDuplicatesTable in Retool that shows lead duplicates in the database.
The source of this table is BigQuery.
I'm trying to create a GUI query that writes specific values to the source BigQuery table when the user clicks the "Delete duplicates" button.
So I need the columns manually_deleted, reason_for_deletion, and deleted_by to be updated with the following values:
Yes, duplicates are shown in the table component.
As mentioned, I only want to update records for the rows whose value in column "rn" is not equal to 1.
I've tried adding my code to a JS query and then using its data as an array of records to update, but I still get a timeout error.
Do I need to update my code somehow?
I think the Retool team will be able to help you here, as I cannot determine the issue from the information posted. I can only assume there is a problem with the amount of data you are updating, or a concurrency issue, but I cannot say why. @Tess @Kabirdas may be able to assist here.
If you look closer at your JS query, you will notice that you update all rows, including those where rn = 1, to themselves. Also, mapping twice in JS is inefficient (each map is a table scan) and unnecessary.
A GUI query by primary key is meant (mostly) for updating rows a user has changed. A few rows here and there can be updated using the PK as the predicate to target the updates (usually the PK field would match a clustered-by field in BQ).
The BigQuery logs would have told you the exact SQL Retool generates: here, one UPDATE statement per row.
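For illustration, the generated statements look roughly like this (a sketch with hypothetical id values, not the literal SQL Retool emits), one DML job per changed row:

UPDATE marts.leads_for_retool
SET manually_deleted = CURRENT_TIMESTAMP(),
    reason_for_deletion = 'email_duplicate',
    deleted_by = 'Jane'
WHERE id = 101;
-- ...repeated for id = 102, 103, and so on, one statement per row

Each of these runs as its own update job, which is why a large selection quickly hits both the timeout and BigQuery's DML concurrency limits.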
The last error message is probably from BigQuery, as you have hit the concurrency quota for parallel DML updates.
Long story short, the generated SQL makes BigQuery process the updates one row at a time, each as a separate DML job. That is not only inefficient but also costly if you look at how BQ is priced (each UPDATE is effectively a table copy).
Use standard SQL instead to push the heavy lifting to the DBMS, which is far more efficient, cost-effective, and straightforward:
UPDATE marts.leads_for_retool
SET manually_deleted = CURRENT_TIMESTAMP(),
    reason_for_deletion = 'email_duplicate',
    deleted_by = {{ current_user.firstName }}
WHERE rn != 1
Lastly, instead of updating duplicates and then filtering them out (I assume), you can return only one record per lead based on your criteria, e.g. only the earliest created_on:
SELECT * FROM dataset.table
QUALIFY ROW_NUMBER() OVER (PARTITION BY main_email ORDER BY created_on) = 1
(partition by whatever identifies a lead uniquely)
Thanks @yiga2! Using a standard SQL query really sounds like the most suitable solution here. However, how do I run the following query on BigQuery so that it affects the marts.leads_for_retool table?
Not sure I understand. In Retool, you just switch the bulk update to SQL mode and use the SQL I gave you. Then add a success handler to re-trigger the table's source query.
Retool will re-read the BQ table with the updates applied and refresh the Table component.
Sorry, I'm not an expert on Retool. Yes, I updated my SQL query and it works now. But when I run it over all my data it still returns an error:
"Query timed out after 120000ms"
My SQL query that I use:
UPDATE `horizo....marts.leads_for_retool`
SET deleted_by = {{ current_user.firstName }}
WHERE id IN (
  SELECT a.id
  FROM `horizo.....marts.leads_for_retool` a
  LEFT JOIN (
    SELECT idd AS id, rn
    FROM UNNEST({{emailDuplicates.data.id}}) AS idd WITH OFFSET idx
    JOIN UNNEST({{emailDuplicates.data.rn}}) AS rn WITH OFFSET ridx
      ON idx = ridx
  ) b
  ON a.id = b.id
  WHERE b.id IS NOT NULL AND b.rn <> 1
)
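If the rows to delete really must come from the table component, a simpler sketch avoids the double UNNEST join entirely by filtering the ids inside the template expression before they reach BigQuery (this assumes emailDuplicates.data exposes parallel column arrays id and rn, as your existing query already implies):

UPDATE marts.leads_for_retool
SET manually_deleted = CURRENT_TIMESTAMP(),
    reason_for_deletion = 'email_duplicate',
    deleted_by = {{ current_user.firstName }}
WHERE id IN UNNEST({{ emailDuplicates.data.id.filter((_, i) => emailDuplicates.data.rn[i] !== 1) }})

This way BigQuery receives a single flat array of ids and one predicate, instead of scanning the table again to join it against two unnested arrays.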
You are over-engineering this: Retool is not involved in the update (except to supply current_user), as per the SQL I provided. That SQL could just as well be run in the BQ console, since there is no need to read the table data back into Retool first.
Sorry I cannot help any further.