Vector crawling not working from url

Hello,
so I am exploring Retool AI on Retool on premise.
I create a Vector from a URL, like:

Immagine 2023-10-16 154839

Then insert an https URL, e.g. https://thenodeinstitute.org and click 'Fetch':

I then get this:

...and after few seconds, this:

Any clue on why this?

Thank you!

I have been using the new retool AI and would like to add a vector DB of data from a URL. It adds the URL fine then never finishes. The firewall of the machine doesn't detect any https traffic either.

I've looked in the docker compose logs and I can see it starts the job but no other logs / errors.

Am I missing anything? Retool has external access to the internet etc. I've tried different URLs.

Same as Vector from URL: pages crawling not working

1 Like

@Glen_Baars do you know if Retool Workflow is needed for the crawler to work?

I'm not sure, I have workflows installed and working.

The only difference is that mine never moves from "We're crawling the URL's"

maybe @francis12 could help?

Hello @Glen_Baars and @theidleguest,

We're currently aware of an issue associated with on-premise URL crawler that causes vector creation (only URL-based) to fail. We are working on a fix and hope to get a new on-premise version pushed as soon as possible! I will keep this thread updated.

Hey @francis12,

this error seems also to exist with the cloud version URL crawler. Any updates on the fix?

Thank you!

@jpmin Just checked cloud and it still seems to be working. Was their a particular url that was giving you issues?

@joeBumbaca Thanks for the reply! It is this one:

(Issue is with all URLs from the S3 Bucket)

I let users upload PDFs to an S3 Bucket and then want to add those URLs to a vector.

Hey @joeBumbaca @francis12 the issue it still present in the on-premise version. Any updates?
Thanks!

@theidleguest No updates yet, will ping this thread as soon as we do. Thanks :grinning:

Hello @joeBumbaca, there's an error being thrown while trying to fetch the URL
Immagine 2023-11-28 102109

Hey @joeBumbaca @francis12 , the issue still exist in the on-premise retool, any updates? thank you.

@Huayu_Qin No updates right now, this is still in our queue of work to do. We'll update this topic when there is any additional information to share.

Is this still not fixed? Or having issues?

I just tried with ~10 different urls and got the No URL Found error as well each time. I had tested before and it seemed to work, but not sure what's happening now.

Hey @shawnhoffman, this is still in the queue to get fixed on our end. As far as I know this has not worked for self-hosted instances but works on the cloud product. I'll update this topic when there is any further information to share.

Interesting - ours is cloud. Seems to be hit or miss. I've tried it and had it work, then other times with the same url, nothing. I'll keep playing around and check back for fixes/updates.