Hey folks,
I'm looking for some help with the Retool Vectors URL scraping feature.
My use case:
- Enter a top-level URL (e.g. https://amplitude.com) and index all available site data
- Create vector embeddings of all site data
- Send a prompt to the OpenAI API along with the vector embeddings to generate answers to specific questions about the site content
- Repeat these steps over a list of hundreds of URLs. I have a workflow to control sequencing and stagger job runs
The issue:
- When I submit a top-level URL, the result is often one of the following:
  - Incomplete - the fetch only finds 5-10 URLs when I expect it to find dozens
  - Empty - no results are returned at all (even when the site's robots.txt allows crawling)
  - Hanging - the fetch hangs for 15-20 minutes and returns few or no results
  - Inconsistent - the fetch returns very different URLs across runs for the same top-level URL
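To make "incomplete" and "empty" concrete, here is a rough baseline check sketched in Python (standard library only). It estimates how many same-site links a page's HTML exposes and whether a given robots.txt permits fetching a URL. This is only a sketch under my own assumptions: it says nothing about how Retool Vectors actually crawls, it ignores links rendered by JavaScript, and the names `same_site_links` and `allowed_by_robots` are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.robotparser import RobotFileParser


class _LinkCollector(HTMLParser):
    """Collects the raw href values of all <a> tags in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_site_links(html, base_url):
    """Return sorted, de-duplicated absolute links on the same host as base_url.

    Hypothetical helper: only static <a href> links are counted.
    """
    collector = _LinkCollector()
    collector.feed(html)
    host = urlparse(base_url).netloc
    links = set()
    for href in collector.hrefs:
        # Resolve relative links and drop "#fragment" parts so duplicates collapse.
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        parsed = urlparse(absolute)
        if parsed.scheme in ("http", "https") and parsed.netloc == host:
            links.add(absolute)
    return sorted(links)


def allowed_by_robots(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text permits user_agent to fetch url.

    Hypothetical helper: the user agent a real crawler sends is an assumption.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

Feeding these the HTML of the top-level page and the text of the site's robots.txt gives a lower bound on the URLs a crawler could discover from that page, which is what I would compare against the 5-10 URLs the fetch returns.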
Can someone please help me understand the expected behavior and supported functionality of the Vectors URL scrape? Am I running into a timeout or capacity limit? I can't tell.
I very much want to use this feature, but information about it is limited and I can't troubleshoot this issue on my own.
Thank you!