Crawl to vector feature. Does it work for you?

Hi All,

vector crawler (cloud version) questions:

  • does it work for you?
  • it most definitely does not go "1000 pages deep"; there seems to be an approx. 50-page limit, after which the crawler stops.
  • does excluding work only for sub-domains, not for sub-folders or specific URLs?
  • is the only way to crawl consistently to use a third-party API and a workflow (with Crawl4AI, Firecrawl, etc.)?

Hi @Kimi! Thanks for reaching out.

In my (relatively limited) experience with this feature, I haven't seen the issues you're describing. That said, I tried it out this morning, and the behavior does seem to have changed a bit.

I reached out to the team and got some clarification: the published parameters were having an outsized impact on our cloud infrastructure and performance, so they've become more restrictive as a result. The new limit is 50 total pages, with a 15-second timeout per page. The banner in the UI will be updated to reflect these changes in the next cloud release!

On the other hand, there haven't been any changes to the way pages are excluded: Retool does simple string matching to determine whether a given page should be vectorized. Any page whose URL starts with the specified string will be excluded, so sub-folders and specific URLs can be excluded too, not just sub-domains.
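For illustration, here's a minimal sketch of that prefix-matching logic. The function name and example URLs are hypothetical, not Retool's actual implementation:

```typescript
// A minimal sketch of the prefix-matching exclusion described above.
// isExcluded and the URLs below are illustrative, not Retool's code.
function isExcluded(pageUrl: string, excludePrefixes: string[]): boolean {
  // A page is skipped if its URL starts with any excluded string.
  return excludePrefixes.some((prefix) => pageUrl.startsWith(prefix));
}

// Excluding a sub-folder and a sub-domain via the same prefix rule.
const exclusions = ["https://example.com/blog/", "https://docs.example.com"];

console.log(isExcluded("https://example.com/blog/post-1", exclusions)); // true
console.log(isExcluded("https://example.com/pricing", exclusions));     // false
console.log(isExcluded("https://docs.example.com/start", exclusions));  // true
```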

Given these restrictions and the specifics of your use case, it may or may not make sense to use a third-party API instead; see the sketch below. I'm happy to answer any additional questions you might have in order to find a solution for your particular use case!
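If you do go that route, a workflow code step that kicks off a crawl might look roughly like this. The endpoint and body shape follow Firecrawl's v1 crawl API as I understand it, but the API key, target URL, and page limit are placeholders, so please verify the request shape against their current docs:

```typescript
// Rough sketch of starting a crawl job from a workflow code step.
// Based on Firecrawl's documented v1 crawl endpoint; confirm the
// request/response shape against their current docs before relying on it.
const FIRECRAWL_API_KEY = "fc-YOUR_KEY_HERE"; // placeholder

async function startCrawl(siteUrl: string): Promise<string> {
  const res = await fetch("https://api.firecrawl.dev/v1/crawl", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${FIRECRAWL_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      url: siteUrl,
      limit: 500, // crawl well past the cloud feature's 50-page cap
    }),
  });
  if (!res.ok) {
    throw new Error(`Crawl request failed: ${res.status}`);
  }
  const data = await res.json();
  return data.id; // job id, polled later to retrieve the crawled pages
}

// Usage: start the job, then poll GET /v1/crawl/{id} for results,
// and feed the returned page content into your vector store.
startCrawl("https://example.com").then((id) => console.log("Crawl job:", id));
```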