Reporter on fields: scrapy CrawlSpider instead of wget -r

Monday, May 1, 2023

scrapy CrawlSpider instead of wget -r

I needed to download every page of a website built with Wix.

`wget -r` didn't work.

httrack and lynx didn't work either.

https://askubuntu.com/questions/391622/download-a-whole-website-with-wget-or-other-including-all-its-downloadable-con

I was able to download the site with Scrapy's CrawlSpider, which follows links through its rules: https://github.com/scrapy/scrapy

https://www.youtube.com/watch?v=o1g8prnkuiQ

I'll try Playwright later, too (instead of Selenium).

https://scrapeops.io/python-scrapy-playbook/scrapy-playwright/
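I haven't tried it yet, but as far as I understand the scrapy-playwright setup, it comes down to a few lines in `settings.py` plus a `playwright` flag in each request's `meta`. A rough sketch, where `use_playwright` is a hypothetical helper name of my own:

```python
# settings.py fragment: route HTTP(S) downloads through scrapy-playwright.
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}
# scrapy-playwright needs the asyncio-based Twisted reactor.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"


def use_playwright(request, response):
    """Hook for Rule(process_request=...): mark a request for browser rendering."""
    request.meta["playwright"] = True
    return request
```

With a CrawlSpider, the helper would be passed as `process_request=use_playwright` to each `Rule`, so that followed links are rendered in the browser as well.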

I also found some candidates to look at:

https://github.com/crawlab-team/crawlab
