Forum Thread: Anyone Have Experience in Cloning Site via Python?


I want to build a script that includes a site cloner, so I wonder if anyone here has experience with that. Which libraries or frameworks did you use? I need information. Cheers

2 Responses

That's actually a research project I'm working on, and you can check it out here:
https://github.com/AlexMapley/Bartimeaus/blob/master/spider.py

The series I'm writing right now is actually a build up to this point.
https://null-byte.wonderhowto.com/forum/creating-python-web-crawler-part-1-getting-sites-source-code-0175912/

Although you'd definitely have to tweak this program a little bit, it's designed to go through an entire website and archive all of its pages. You could definitely use it to clone a website.

To run it from the terminal, run "python spider.py 'http://www.example.com' 1"

It starts from the website link you pass as argument 1 and archives every linked webpage with the keyword "example". It calls itself recursively 1 time, or however many times you specify in argument 2, opening every link it sees on every page. It never opens the same link twice.

If you run this on a website, it'll probably take a while (maybe an hour?), but you can definitely clone it.
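
For anyone who wants the general idea without reading the whole repo, here is a rough sketch of the crawl described above: start at one URL, keep pages that match a keyword, recurse a fixed number of times, and never open the same link twice. This is not the code from spider.py. The names crawl, fetch_page and LinkParser are made up for illustration, the keyword check is assumed to be a simple "keyword appears in the URL" test, and pages are kept in a dict instead of being written to disk.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_page(url):
    """Download a page and return its HTML as text (hypothetical helper)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(url, depth, keyword, fetch=fetch_page, archive=None, visited=None):
    """Archive `url`, then follow its links `depth` more levels down."""
    archive = {} if archive is None else archive
    visited = set() if visited is None else visited

    url = urldefrag(url).url  # page.html#top and page.html count as one link
    # Assumption: a page "has the keyword" when the keyword is in its URL.
    if url in visited or keyword not in url:
        return archive
    visited.add(url)  # recorded before fetching, so a link is opened only once

    try:
        html = fetch(url)
    except (OSError, ValueError):
        return archive  # skip pages that fail to load
    archive[url] = html

    if depth > 0:
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links against the current page before recursing.
            crawl(urljoin(url, href), depth - 1, keyword, fetch, archive, visited)
    return archive
```

Calling crawl("http://www.example.com", 1, "example") returns a dict mapping each archived URL to its HTML. To actually clone a site you would still need to write those pages to files and fetch images, CSS and scripts, which this sketch leaves out. The fetch argument is there so you can swap in your own downloader, or a fake one for testing without touching the network.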

Thank you, that was very helpful !
