Monday, May 1, 2023

scrapy CrawlSpider instead of wget -r


I needed to download all pages from a Wix-made website.

`wget -r` didn't work.

httrack and lynx didn't work, either.

https://askubuntu.com/questions/391622/download-a-whole-website-with-wget-or-other-including-all-its-downloadable-con


I could download the site with scrapy's CrawlSpider and link-following Rules (LinkExtractor): https://github.com/scrapy/scrapy

https://www.youtube.com/watch?v=o1g8prnkuiQ


I'll try Playwright later, too (instead of Selenium).

https://scrapeops.io/python-scrapy-playbook/scrapy-playwright/
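Per the scrapy-playwright docs, wiring Playwright into scrapy is mostly a settings change: route downloads through its handler and switch Twisted to the asyncio reactor. A sketch of the relevant `settings.py` fragment:

```python
# settings.py fragment for scrapy-playwright (pip install scrapy-playwright)

DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}

# scrapy-playwright requires the asyncio-based Twisted reactor.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
```

Individual requests then opt in with `meta={"playwright": True}` so the page is rendered in a real browser before parsing.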


I also found some other candidates:

https://github.com/crawlab-team/crawlab