Vector crawling not working from URL

Hello,
I am exploring Retool AI on an on-premise Retool instance.
I create a Vector from a URL, like this:

[Image: screenshot, 2023-10-16]

Then I enter an HTTPS URL, e.g. https://thenodeinstitute.org, and click 'Fetch':

I then get this:

...and after a few seconds, this:

Any clue why this happens?

Thank you!

I have been using the new Retool AI and would like to add a vector DB of data from a URL. The URL is added fine, but the crawl never finishes. The firewall of the machine doesn't detect any HTTPS traffic either.

I've looked at the docker compose logs and can see the job start, but there are no further log entries or errors.

Am I missing anything? Retool has external access to the internet etc. I've tried different URLs.
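
For anyone wanting to double-check outbound access from inside the container itself (rather than from the host), here is a minimal sketch using only the Python standard library. It assumes Python 3 is available in the container, which may not be the case for the Retool images, and `check_url` is a hypothetical helper name. It is not how Retool's crawler works; it only shows whether a plain HTTPS request can leave that environment.

```python
import urllib.error
import urllib.request


def check_url(url: str, timeout: float = 10.0) -> str:
    """Return a short description of what happened when fetching `url`.

    Hypothetical diagnostic helper, not part of Retool.
    """
    try:
        request = urllib.request.Request(
            url, headers={"User-Agent": "connectivity-check"}
        )
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return f"ok: HTTP {response.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (4xx/5xx).
        return f"http error: {exc.code}"
    except urllib.error.URLError as exc:
        # DNS failure, refused connection, TLS problem, proxy issue, etc.
        return f"network error: {exc.reason}"
    except OSError as exc:
        # Low-level socket errors, including timeouts.
        return f"network error: {exc}"
    except ValueError as exc:
        # Raised for strings that are not valid URLs.
        return f"invalid url: {exc}"
```

Calling `print(check_url("https://thenodeinstitute.org"))` from inside the container should print an `ok: ...` line if outbound HTTPS works; a `network error: ...` line would point at DNS, proxy, or firewall configuration rather than at the crawler.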

Same issue as "Vector from URL: pages crawling not working".

@Glen_Baars do you know if Retool Workflow is needed for the crawler to work?

I'm not sure. I have Workflows installed and working.

The only difference is that mine never moves from "We're crawling the URL's"

maybe @francis12 could help?

Hello @Glen_Baars and @theidleguest,

We're currently aware of an issue with the on-premise URL crawler that causes vector creation (URL-based only) to fail. We are working on a fix and hope to get a new on-premise version pushed as soon as possible! I will keep this thread updated.

Hey @francis12,

this error also seems to exist with the cloud version's URL crawler. Any updates on the fix?

Thank you!

@jpmin Just checked cloud and it still seems to be working. Was there a particular URL that was giving you issues?

@joeBumbaca Thanks for the reply! It is this one:

(Issue is with all URLs from the S3 Bucket)

I let users upload PDFs to an S3 Bucket and then want to add those URLs to a vector.
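
In case it helps narrow things down, here is a rough sketch for checking how one of those S3 URLs responds to an unauthenticated request: whether it is publicly readable at all, and what `Content-Type` it is served with. The names `summarize` and `probe` are hypothetical, and nothing in this thread confirms that either point is the cause of the failure; it is only a quick way to rule them out.

```python
import urllib.error
import urllib.request


def summarize(status: int, content_type: str) -> str:
    """Describe a response in terms relevant to fetching a public URL."""
    if status in (401, 403):
        return "not publicly readable (needs credentials or a signed URL)"
    if status != 200:
        return f"unexpected status {status}"
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/pdf":
        return "public PDF"
    if media_type in ("text/html", "application/xhtml+xml"):
        return "public HTML page"
    return f"public, served as {media_type or 'unknown type'}"


def probe(url: str, timeout: float = 10.0) -> str:
    """Send a HEAD request without credentials and summarize the result."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            content_type = response.headers.get("Content-Type", "")
            return summarize(response.status, content_type)
    except urllib.error.HTTPError as exc:
        # Error statuses (e.g. 403 from a private bucket) arrive as exceptions.
        return summarize(exc.code, "")
    except urllib.error.URLError as exc:
        return f"network error: {exc.reason}"
```

Passing one of the bucket's object URLs to `probe` and printing the result shows at a glance whether the object is reachable without credentials and whether it is served as a PDF or as something else.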

Hey @joeBumbaca @francis12, the issue is still present in the on-premise version. Any updates?
Thanks!

@theidleguest No updates yet, will ping this thread as soon as we do. Thanks :grinning:

Hello @joeBumbaca, there's an error being thrown while trying to fetch the URL:
[Image: screenshot, 2023-11-28]