I'm working with various financial SaaS tools that provide CSV and XLSX files for tabular movement reports. These files are stored both on my hard drive (when extracted) and on Google Drive (where I save them).
My objective is to aggregate this data into a single location so I can extract valuable financial insights, reports, and KPIs. To accomplish this, I need to execute an ETL (Extract, Transform, Load) process, and I'm seeking best practices.
Here are two strategies I'm considering:
Convert CSV and XLSX into Google Sheets format: I can create a "Master Sheet" that compiles the monthly reports for each SaaS tool and sanitizes the data. Once that's done, I can connect Retool to the master sheet to import the data. However, I'm undecided whether the data should be pushed from the sheet or pulled by Retool.
Use Retool Workflows for processing: Another approach is to use Retool Workflows to iterate over the monthly CSV/XLSX files in Google Drive (I'm not sure if that's possible). This would let me pull the data, aggregate it, transform and sanitize it, and then load it into the database.
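For the second option, one piece is usually straightforward regardless of feasibility in Workflows: filtering a Drive file listing down to the monthly reports. A minimal sketch, assuming a Drive v3 `files.list`-style response and a hypothetical `tool_YYYY-MM.csv` naming convention (adjust the pattern to your actual file names):

```javascript
// Hypothetical naming convention for the monthly movement reports.
const REPORT_PATTERN = /^(?<tool>[\w-]+)_(?<month>\d{4}-\d{2})\.(csv|xlsx)$/i;

// Given a file listing (objects with the Drive v3 `id` and `name` fields),
// keep only the files that look like monthly reports, tagging each with
// the tool and month parsed from its name so a later loop block can
// fetch and process each one.
function pickMonthlyReports(files) {
  return files
    .map((f) => {
      const match = f.name.match(REPORT_PATTERN);
      return match
        ? { id: f.id, tool: match.groups.tool, month: match.groups.month }
        : null;
    })
    .filter(Boolean);
}

// Example usage with a mocked listing:
const listing = [
  { id: "a1", name: "stripe_2024-01.csv" },
  { id: "b2", name: "notes.txt" },
  { id: "c3", name: "quickbooks_2024-01.xlsx" },
];
pickMonthlyReports(listing);
// → [{ id: "a1", tool: "stripe", month: "2024-01" },
//    { id: "c3", tool: "quickbooks", month: "2024-01" }]
```

This is only the filtering step; actually listing and downloading the files would go through a Google Drive resource or the Drive REST API, which needs its own auth setup.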
I'm quite new to this, and I'm confident that there may be even more efficient solutions out there. Your guidance on the proper path would be immensely valuable.
Thank you for your assistance.
P.S. Retool is amazing! It's playing a crucial role in my project, and I'm excited to learn more from this community.
Another idea would be to convert your files to CSV and use one of the File Uploaders like File Drop Zone.
Enable the Parse Files option; then, when you drop a file, the component automatically parses the CSV into JSON.
You can transform it as needed and use a bulk insert query to add it to your database. If you used the Drop Zone's parse event, you can run a JS query that does all of your heavy lifting.
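That "heavy lifting" JS query usually boils down to normalizing the parsed rows before the bulk insert. A hedged sketch, where the CSV column names ("Date", "Amount", "Description") and the Retool component/query names in the comment are assumptions about your setup:

```javascript
// Normalize the JSON rows the Drop Zone parsed out of the CSV so they
// are safe to hand to a bulk insert query.
function transformRows(parsedRows) {
  return parsedRows.map((row) => ({
    // Strip thousands separators so "1,234.56" becomes a plain number.
    amount: Number(String(row.Amount).replace(/,/g, "")),
    // Keep dates as ISO YYYY-MM-DD strings so the insert is unambiguous.
    movement_date: new Date(row.Date).toISOString().slice(0, 10),
    description: String(row.Description || "").trim(),
  }));
}

// In a Retool JS query this might be wired up as (names hypothetical):
//   const rows = transformRows(fileDropzone1.parsedValue[0]);
//   await bulkInsertMovements.trigger({ additionalScope: { rows } });
```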
So your workflow is just Drop, Drop, Drop... to get them in.
If you'd like, you could add a visual quality check by adding a table and setting its source either to your uploaded CSV/JSON or to your transformed version before hitting an Upload button.
This is a good suggestion @bradlymathews .
Thank you. I have thought of something similar in the past. The only reason I discarded this approach was the extra manual work it requires, since we already drop the files in GDrive for other internal processes.
But I will certainly think about it. Is there any demo, tutorial, reference, or forum discussion that might give me some more insight into how to achieve this approach?
Efficiently upserting CSV data without creating duplicates has some data to tease out.
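One client-side flavor of that idea can be sketched in a few lines: build a key set from rows already in the database and only insert the rows you haven't seen. The key columns here are assumptions about the schema; the linked thread also covers database-side approaches, which are usually more robust:

```javascript
// Keep only incoming rows whose natural key is not already present in
// the rows fetched from the database. The three-column key is an
// assumption; pick whatever uniquely identifies a movement in your data.
function rowsToInsert(incoming, existing) {
  const key = (r) => `${r.movement_date}|${r.amount}|${r.description}`;
  const seen = new Set(existing.map(key));
  return incoming.filter((r) => !seen.has(key(r)));
}
```

If you're on Postgres, an `INSERT ... ON CONFLICT DO NOTHING` against a unique constraint does the same job server-side without first fetching the existing rows.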
Also, this video shows how the parsed data is created. But the method he uses to upload the data to Firestore might not apply to you if you are using SQL, since doing a bulk insert to SQL is far easier. This is a documentation version of that video: Insert CSV data into a database | Retool Docs.
This is an example of uploading directly from CSV, but without transforming first: Bulk Insert query in postgres not inserting data SOLVED
Hopefully those help.
This is great. Thank you so much @bradlymathews