How to Best Handle Complex Queries

I have some rather heavy queries. I have been using CTEs, but reloading data is slow.
If I wish to update data in one of the DB tables from the table component, if would then need to fetch a lot of data to reload the changes.

I've since broken the large queries into smaller ones and used mergeArray in a transformer to build the data for the table to display. The results seem a little buggy where sometimes the table will load with half the data missing from the view.

I'm looking for guidance on how best to handle such a situation. Would a js query be a better option than a transformer to run the mergeArray from?

Or is there a better way to handle this type of problem. Seems like I am meeting this issue a lot throughout our apps. I've also just started to look at shared queries to try and modulate things more, but that seems a little too complex to set up for now.

Are there places where you’re using a CTE where a window function might accomplish the same things with less work?

Any chance that you can narrow/paginate the data in the table and thus the scope of the query that needs to run on update?

Hard to give a lot of feedback here without specifics but generally doing as much of the work as you can in the query is going to be faster than using JS to transform. Let the database do the work instead of the browser.

Thanks for the reply! The query in question is here.

WITH OnOrder AS (
    SELECT pol.variant_id, SUM(pol.quantity_ordered) as on_order_sum
    FROM admin.purchase_order_line as pol 
    LEFT JOIN admin.purchase_orders as po on po.id = pol.purchase_order_id
    WHERE po.status_id = 2
    GROUP BY pol.variant_id
),
LatestInterval AS (
    SELECT 
        sli.variant_id,
        current_date - MAX(sli.timestamp) AS time_interval
    FROM 
        admin.stocktake_line_items as sli
    WHERE sli.submitted = TRUE
    GROUP BY 
        sli.variant_id
),
Received AS (
    SELECT pol.variant_id, CAST(SUM(pol.quantity_ordered) AS INTEGER) as received_sum
    FROM admin.purchase_order_line as pol 
    LEFT JOIN admin.purchase_orders as po on po.id = pol.purchase_order_id
    LEFT JOIN LatestInterval as li on li.variant_id = pol.variant_id
    WHERE (po.status_id = 3 OR po.status_id = 4)
      AND pol.updated_at >= (current_date - li.time_interval)
    GROUP BY pol.variant_id
),
Sold AS (
    SELECT ol.variant_id, CAST(COALESCE(SUM(ol.quantity), 0) AS INTEGER) as sold_sum
    FROM shopify.order_line as ol
    LEFT JOIN shopify.order as o ON ol.order_id = o.id
    LEFT JOIN LatestInterval as li on li.variant_id = ol.variant_id
    WHERE o.created_at >= (current_date - li.time_interval)
    AND o.customer_id != 6295632740446
    GROUP BY ol.variant_id
),
LatestCount AS (
    SELECT DISTINCT ON (sli.variant_id) 
        sli.variant_id, 
        CAST(COALESCE(sli.counted_qty, 0) AS INTEGER) as latest_count,
        sli.timestamp
    FROM 
        admin.stocktake_line_items as sli
    WHERE sli.submitted = TRUE
    ORDER BY 
        sli.variant_id, sli.timestamp DESC
), 
SavedCount AS (
    SELECT sli.variant_id, sli.counted_qty, sli.id as sc_id 
    FROM admin.stocktake_line_items as sli 
    WHERE sli.stocktake_id = {{ localStorage.values.stocktakeID }}
    AND sli.submitted = false
),
BinLocationAggregate AS (
    SELECT 
        variant_id, id as bin_id, 
        ARRAY_AGG(location) AS locations,
        ARRAY_AGG(counter) AS counters
    FROM 
        admin.bin_location
    GROUP BY 
        variant_id, bin_id
)
SELECT 
    DISTINCT ON(pv.id) 
    pv.id as variant_id, 
    pv.sku, 
    CONCAT(p.title, ' / ' , pv.title) as product, 
    CAST(il.available AS NUMERIC) as soh,
    il.location_id, 
    il.inventory_item_id,
    COALESCE(pii.src, pi.src) as image,
    oo.on_order_sum,
    r.received_sum,
    s.sold_sum,
    lc.latest_count,
    lc.timestamp as latest_count_timestamp, 
    sc.counted_qty as current_count, 
    bla.locations,
    cv.counter, 
    sc.sc_id, 
    bla.bin_id

FROM 
    shopify.product_variant as pv 
LEFT JOIN shopify.product as p ON p.id = pv.product_id
LEFT JOIN shopify.inventory_level as il ON il.inventory_item_id = pv.inventory_item_id AND il.location_id = 4452229
LEFT JOIN shopify.product_tag as pt ON p.id = pt.product_id
LEFT JOIN shopify.product_image as pi ON pi.product_id = p.id AND pi.position = 1
LEFT JOIN shopify.product_image as pii ON pv.image_id = pii.id
LEFT JOIN OnOrder as oo ON pv.id = oo.variant_id
LEFT JOIN Received as r ON pv.id = r.variant_id
LEFT JOIN Sold as s ON pv.id = s.variant_id
LEFT JOIN LatestCount as lc ON pv.id = lc.variant_id
LEFT JOIN SavedCount as sc on pv.id = sc.variant_id
LEFT JOIN BinLocationAggregate as bla on bla.variant_id = pv.id
  LEFT JOIN admin.counter_variant as cv on cv.variant_id = pv.id 
WHERE 
    pt.value = 'manufacturedparts'
ORDER BY 
    pv.id;

I'm not sure if this is excessive or standard as I'm not that experienced with coding or writing queries and have arrived at this point with trial and error and a lot of help from ChatGPT. This method works, but it does seem like a lot of data to be fetching for every updated value.

I looked into the WINDOW function but it doesn't seem completely applicable to this case, as far as I can see.

The point about letting the DB do the work over the browser is interesting though. Thanks for the info.

1 Like

Just a quick look here obviously but I think you've got some places where you can move CTEs into the main query and do your summing via a case statement

WITH OnOrder AS (
    SELECT pol.variant_id, SUM(pol.quantity_ordered) as on_order_sum
    FROM admin.purchase_order_line as pol 
    LEFT JOIN admin.purchase_orders as po on po.id = pol.purchase_order_id
    WHERE po.status_id = 2
    GROUP BY pol.variant_id
)

Instead of this you can get the same result by left joining the two tables directly onto product_variant and instead of selecting from the CTE, use

sum(case when po.status_id = 2 then pol.quantity_ordered else 0 end) as on_order_sum

So the left join will create rows for POs of all statuses where the join criteria is met and rather than having to narrow it to just 2 in the where clause, you just ignore the value in your sum when the status doesn't match.

The same pattern would seem to work for a few of the CTEs, which would eliminate some of the duplicative joining.

Be sure to use the EXPLAIN function for whatever DB you're on as well, otherwise there's too much variability in the request/response to really get a good read on performance just from Retool response times.

Also worth reiterating, consider paginating and filtering your table data if that query result is considerably bigger than what the UI can handle anyway.

1 Like