Vector URL crawler

Hi y'all,

I couldn't find any documentation on the Vector URL crawler regarding:

  • user agent?
  • IPs?
  • crawl speed / throttling?
  • how to adjust the crawl depth (e.g. from 1000 down to 5 levels, or to 0 to crawl only a given list of pages)?

I'm asking because I don't want to unleash the crawler on a huge site without understanding it a bit better. I also need to know which crawler to punch a hole for in our WAF.

Thanks in advance!

-kimi-

Hi :slight_smile:

@victoria Can you point me in the right direction with this?

-Kimi-

Hello @Kimi!

Unfortunately, Victoria has left the Retool team to go adventuring around the world.

In the meantime, one of our engineers who specializes in AI is preparing a more detailed, formal response about Vectors!

Stay tuned :grin:


Hey @Kimi, Retool engineer here.

  • user agent?

We use a simple HTTP/HTTPS proxy.

  • IPs?

Requests come from Retool's IP addresses. See "Allow Retool to access data sources" in the Retool Docs for the list.
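The allowlist rule itself normally lives in your WAF's configuration, but if you also want to check source addresses in your own code (for example in a reverse proxy or when reviewing logs), a minimal IPv4 CIDR match could look like the sketch below. The ranges in `RETOOL_ALLOWLIST` are hypothetical placeholders taken from the RFC 5737 documentation blocks, not Retool's real addresses; replace them with the ones listed on the "Allow Retool to access data sources" docs page. The function names are illustrative, and the sketch handles IPv4 only.

```typescript
// HYPOTHETICAL placeholder ranges (RFC 5737 documentation blocks).
// Replace with the actual Retool IPs from the Retool docs.
const RETOOL_ALLOWLIST: string[] = ["203.0.113.0/24", "198.51.100.7/32"];

// Convert a dotted-quad IPv4 address to an unsigned 32-bit integer.
function ipv4ToInt(ip: string): number {
  const parts = ip.split(".").map(Number);
  if (parts.length !== 4 || parts.some((p) => !Number.isInteger(p) || p < 0 || p > 255)) {
    throw new Error(`invalid IPv4 address: ${ip}`);
  }
  return ((parts[0] << 24) | (parts[1] << 16) | (parts[2] << 8) | parts[3]) >>> 0;
}

// True if `ip` falls inside `cidr`, e.g. inCidr("203.0.113.45", "203.0.113.0/24").
function inCidr(ip: string, cidr: string): boolean {
  const [base, bitsStr] = cidr.split("/");
  const bits = Number(bitsStr);
  if (!Number.isInteger(bits) || bits < 0 || bits > 32) {
    throw new Error(`invalid prefix length in ${cidr}`);
  }
  // JS masks shift counts to 5 bits, so `x << 32` is `x << 0`; special-case /0.
  const mask = bits === 0 ? 0 : (~0 << (32 - bits)) >>> 0;
  return ((ipv4ToInt(ip) & mask) >>> 0) === ((ipv4ToInt(base) & mask) >>> 0);
}

// True if the address matches any allowlisted range.
function isAllowlisted(ip: string): boolean {
  return RETOOL_ALLOWLIST.some((cidr) => inCidr(ip, cidr));
}
```

A single-address entry is written as a `/32`, so exact IPs and ranges can share one list.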

  • crawl speed / throttling?

We use the p-queue library under the hood, with the following configuration:
```
const MAX_REQUESTS = 10
const MAX_VISITED_URLS = 250
const QUEUE_TIMEOUT = 15 * 1000 // 15s
```
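To make these numbers concrete for anyone sizing WAF or rate-limit rules, here is a minimal, dependency-free sketch of how limits like these typically interact. It is not Retool's crawler. It replaces p-queue with a hand-rolled queue and assumes that MAX_REQUESTS is the number of requests in flight at once (what p-queue calls `concurrency`) and that QUEUE_TIMEOUT is a per-request timeout (p-queue's `timeout` option). `crawl`, `withTimeout` and `fetchLinks` are hypothetical names, and `fetchLinks` stands in for the real HTTP fetch plus link extraction.

```typescript
// Values quoted from the post above.
const MAX_REQUESTS = 10;
const MAX_VISITED_URLS = 250;
const QUEUE_TIMEOUT = 15 * 1000; // 15s

// HYPOTHETICAL: resolves with the links found on `url`.
type FetchLinks = (url: string) => Promise<string[]>;

interface CrawlLimits {
  maxRequests: number;    // assumed: max requests in flight at once
  maxVisitedUrls: number; // hard cap on distinct URLs the crawl will enqueue
  timeoutMs: number;      // assumed: per-request timeout
}

const DEFAULT_LIMITS: CrawlLimits = {
  maxRequests: MAX_REQUESTS,
  maxVisitedUrls: MAX_VISITED_URLS,
  timeoutMs: QUEUE_TIMEOUT,
};

// Reject if `p` has not settled within `ms`. The underlying request is not
// aborted here; a real crawler would also cancel it (e.g. via AbortSignal).
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
    p.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

// FIFO (roughly breadth-first) crawl from `start` with bounded concurrency,
// a cap on distinct URLs, and a per-request timeout.
// Resolves with every URL it enqueued.
function crawl(
  start: string,
  fetchLinks: FetchLinks,
  limits: CrawlLimits = DEFAULT_LIMITS,
): Promise<Set<string>> {
  const visited = new Set<string>([start]);
  const queue: string[] = [start];
  let active = 0;

  return new Promise<Set<string>>((resolve) => {
    const pump = (): void => {
      // Start queued requests until the concurrency limit is reached.
      while (active < limits.maxRequests && queue.length > 0) {
        const url = queue.shift()!;
        active++;
        withTimeout(fetchLinks(url), limits.timeoutMs)
          .then((links) => {
            for (const link of links) {
              if (visited.size >= limits.maxVisitedUrls) break; // cap reached
              if (!visited.has(link)) {
                visited.add(link);
                queue.push(link);
              }
            }
          })
          .catch(() => {
            // Failed or timed-out pages are skipped, not retried.
          })
          .finally(() => {
            active--;
            if (active === 0 && queue.length === 0) resolve(visited);
            else pump();
          });
      }
    };
    pump();
  });
}
```

Under those assumptions, one crawl enqueues at most 250 distinct URLs no matter how deep the site is, and never has more than 10 requests open against your server at the same moment. If each page takes about 1 s to respond, that caps the crawl at roughly 10 requests per second.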

  • how to adjust the crawl depth (e.g. from 1000 down to 5 levels, or to 0 to crawl only a given list of pages)?

You can't adjust it right now. We're deliberately keeping the load low so that the server doesn't crash.
