Retool URL vectors - unexpected behavior with fetching URLs and inconsistent results

Hey folks,

I'm looking for some help with the Retool vectors URL scraping feature.

My use case:

  • Enter a top-level URL (e.g. https://amplitude.com) and index all available site data
  • Create vector embeddings of all site data
  • Send a prompt to the OpenAI API along with the vector embeddings to generate answers to specific questions about the site content
  • Repeat these steps over a list of hundreds of URLs. I have a workflow that controls sequencing and staggers job runs

The issue:

  • When I submit a top-level URL, the result is often:
    • Incomplete - the fetch only finds 5-10 URLs when I expect it to find dozens
    • Empty - no results are returned at all (even when the site's robots.txt allows crawling)
    • Hanging - the fetch hangs for 15-20 minutes and returns few or no results
    • Inconsistent - the vectors fetch returns very different URLs for the same top-level URL
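
For anyone trying to reproduce this, here is a rough sketch of how the "incomplete" and "empty" cases could be sanity-checked outside Retool. It is not how Retool's crawler works (I don't know that); it only counts the same-host links present in a page's static HTML and checks a URL against a robots.txt body, so there is a baseline to compare the fetch results against. The helper names `same_site_links` and `allowed_by_robots` are made up for illustration, and the HTML and robots.txt are passed in as strings, so fetching them is left to whatever tool you prefer.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.robotparser import RobotFileParser


class LinkCollector(HTMLParser):
    """Collect raw href values from <a> tags in static HTML."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_site_links(base_url, html):
    """Return sorted, de-duplicated http(s) links on the same host as base_url."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(base_url).netloc
    found = set()
    for href in collector.hrefs:
        # Resolve relative links and drop #fragments so duplicates collapse.
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        parts = urlparse(absolute)
        if parts.scheme in ("http", "https") and parts.netloc == host:
            found.add(absolute)
    return sorted(found)


def allowed_by_robots(robots_txt, url, user_agent="*"):
    """Check url against the text of a robots.txt file (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

One caveat with this sketch: it only sees links that are in the raw HTML, so pages that build their navigation with JavaScript will look much smaller than they do in a browser.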

Can someone please help me understand the expected behavior and supported functionality of the vectors URL scrape? Am I running into some timeout or capacity constraints? No idea.

I very much want to use this feature, but information about it is limited and I can't troubleshoot this issue on my own.

Thank you!

Hi @hstan,

Thanks for reaching out! I am looking into this.

To confirm, is this happening on your Cloud-hosted Retool account?

Are you running this step ("Enter a top-level URL (e.g. https://amplitude.com) and index all available site data") in the Vector Resource UI? Are there any browser console errors?

After discussing with my team internally, it sounds like this behavior is due to the fairly strict limitations listed in our docs: Retool-managed Vectors | Retool Docs

If document vectors are an option, that could help circumvent these limitations. Otherwise, given the complexity of your use case, it could be worth looking for an alternative to the Retool-managed Vectors option.