I am looking for the best way to bring in data from a website that does not have an API.
I have been able to find URLs that return the data I need in JSON format, and I have them saved in a table in my database.
I am struggling to figure out the best way to parse that data and save it into my database.
There are tons of ways to scrape a website.
More requirements would be helpful to narrow down the possibilities.
Do you need a one-off scrape where you import the data manually?
In that case, there are Chrome extensions (Bardeen is good) that let you visually configure a scraper and save a CSV out of it.
Or do you need a harvester to perform more automated scraping activities?
Scrapingbee or BrowseAI offer remote browsers you can program.
The website I am working with stores a large amount of data in JSON and has accessible .json URLs. I have never needed a plugin in the past to scrape this sort of data, but I am not experienced with any of this by any means.
To save the response, you'll need to connect a resource to write the data to. If you're familiar with SQL and don't already have an API or database in mind for this use case, you might consider using Retool's Database feature.
Since you've already found the URLs with the JSON data you need, you're off to a great start. For parsing and saving the data into your database, you could use a combination of Python and libraries like Requests to fetch the data and Pandas to parse and structure it. Here’s a quick example:
Fetch the data using requests:
import requests

# Fetch the JSON payload from one of the URLs you found
response = requests.get('your_json_url', timeout=30)
response.raise_for_status()  # fail early on HTTP errors
data = response.json()
Parse and structure the data using Pandas:
import pandas as pd

# Build a DataFrame from the parsed JSON (assumes the payload is a list of records)
df = pd.DataFrame(data)
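If the JSON is nested rather than a flat list of records, pandas can flatten it with json_normalize. The 'results' key here is just a placeholder for whatever key your payload actually uses:

# Hypothetical nested payload: flatten the records under a 'results' key into columns
df = pd.json_normalize(data['results'], sep='_')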
Save the data into your database (assuming you're using SQL). You can use a library like SQLAlchemy to connect to your database and write the DataFrame:
from sqlalchemy import create_engine

engine = create_engine('your_database_connection_string')
# 'replace' drops and recreates the table on each run; use 'append' to keep adding rows
df.to_sql('your_table_name', engine, if_exists='replace', index=False)
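Since the URLs themselves are already stored in a table, here is a rough end-to-end sketch that ties those steps together. The scrape_urls table, its url column, and the scraped_data target table are assumptions; swap in whatever names your schema actually uses:

import pandas as pd
import requests
from sqlalchemy import create_engine

engine = create_engine('your_database_connection_string')

# Assumed schema: a scrape_urls table with a url column holding the .json endpoints
urls = pd.read_sql('SELECT url FROM scrape_urls', engine)['url']

frames = []
for url in urls:
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    # Assumes each payload is a list of records; nested JSON may need json_normalize
    frames.append(pd.DataFrame(response.json()))

# Combine everything and append it to the assumed scraped_data table
combined = pd.concat(frames, ignore_index=True)
combined.to_sql('scraped_data', engine, if_exists='append', index=False)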
If you're doing a lot of scraping without an API, you may also run into rate limits or get blocked. In that case, a tool like Multilogin could help you stay undetected while gathering data; it's good at rotating IPs and creating unique browser profiles.
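That said, a simple pause between requests plus a retry with backoff is often enough to stay within a site's limits. A minimal sketch using plain requests (the retry count and delay are just illustrative defaults):

import time
import requests

def fetch_json(url, retries=3, delay=2):
    # Retry with a growing pause so repeated failures back off instead of hammering the site
    for attempt in range(retries):
        response = requests.get(url, timeout=30)
        if response.status_code == 429:  # too many requests: wait and try again
            time.sleep(delay * (attempt + 1))
            continue
        response.raise_for_status()
        return response.json()
    raise RuntimeError(f'Giving up on {url} after {retries} attempts')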