Asking a Moderator: Will My Web-Crawler Harm Your Site? I Don't Want to Cause a DOS Fiasco.

Hey guys, so I've been working on what is now a three-part tutorial series here:

https://null-byte.wonderhowto.com/forum/creating-python-web-crawler-part-1-getting-sites-source-code-0175912/
https://null-byte.wonderhowto.com/forum/creating-python-web-crawler-part-2-traveling-new-sites-0175928/
https://null-byte.wonderhowto.com/forum/creating-python-web-crawler-part-3-narrowing-our-search-scope-0175935/

In it I detail the creation of a recursively defined Python web-crawler that can navigate through sites and process their raw HTML documents. The ultimate goal of the series is a tool that can clone/archive an entire website, visiting 500+ pages a minute and pulling all their data.
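For anyone following along, the core idea is small enough to sketch with just the standard library. This is my own minimal illustration, not the tutorial's actual code: `fetch` is a stand-in for whatever grabs the raw HTML (urllib, requests, etc.), and the `depth` limit is an assumption I added so the recursion can't wander forever.

```python
from html.parser import HTMLParser

class LinkParser(HTMLParser):
    """Collect every href found in <a> tags on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(url, fetch, seen=None, depth=2):
    """Recursively visit pages, returning the set of URLs visited.

    fetch(url) must return the page's HTML as a string; depth caps
    how many links deep the recursion goes.
    """
    if seen is None:
        seen = set()
    if depth == 0 or url in seen:
        return seen
    seen.add(url)
    parser = LinkParser()
    parser.feed(fetch(url))
    for link in parser.links:
        crawl(link, fetch, seen, depth - 1)
    return seen
```

You can test the recursion without touching the network by faking `fetch` with a dict of pages, e.g. `crawl("a", lambda u: pages.get(u, ""))`.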

In the tutorials I use Null Byte as an example target because I consider it safe (no sysadmin would be too freaked out; people run scans against this site all the time as part of other tutorials). However, I'm worried that I could potentially be unleashing a DOS attack on the hosting servers by visiting so many links at once from one IP address.

I don't really have the best understanding of how DOS attacks work, so I'm just wondering if what I'm doing here is dangerous for the site. Null-Byte is awesome and I'd never want to risk messing with it.

Please let me know, I don't want to cause any harm.
Sharknado

4 Responses

A DoS attack is simply denying service to something. It is commonly done by sending repetitive requests to a server or client until it crashes or can't keep up. A web-crawler on its own should not cause a denial of service.

That's what I was afraid of: sending repetitive requests for individual pages to the hosting server. From what you're saying, though, I suppose a single crawler would be slow enough not to cause any real issues.

Also, is it likely a web admin or security feature would flag my IP address for conducting a scan like this? (The "scan" being me cloning their website by crawling through it, not actually probing their hosting OS.)

I guess it might do that. I know shell scripts have a sleep command to delay commands; I'm not sure about Python, though. The site MAY flag this, but it might not. The best ways I can think of to prevent DoS are to delay your requests, or to not target this site (you could use your own, or a larger one, though that may be risky).
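The delay idea above is easy to sketch. Python does have a standard-library equivalent of the shell's sleep: `time.sleep()`. Here's a minimal, hypothetical throttling wrapper of my own (names are made up, `fetch` is again whatever function retrieves a page):

```python
import time

def throttled(urls, fetch, delay=1.0):
    """Fetch each URL in turn, pausing `delay` seconds between requests
    so the crawler never hammers the host faster than 1/delay req/s."""
    results = {}
    for url in urls:
        results[url] = fetch(url)
        time.sleep(delay)  # be polite: wait before the next request
    return results
```

With `delay=1.0` the crawler tops out at about 60 pages a minute per IP, which is far gentler on a server than the 500+ pages a minute the original goal describes.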

There actually is a Python sleep() command (time.sleep()), and that's a really good idea. For my own project I think I'll add a flag, like nmap's -sS, implementing just that. Thanks bud, that will help a lot.
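A command-line flag like the one described above could be wired up with argparse. This is just a sketch under my own assumptions: `--delay` is a hypothetical flag name, and the program would pass its value into whatever sleep-based throttle the crawler uses.

```python
import argparse

def parse_args(argv=None):
    """Parse crawler options; --delay is a made-up flag name that
    controls the seconds slept between requests."""
    p = argparse.ArgumentParser(description="toy web-crawler")
    p.add_argument("url", help="page to start crawling from")
    p.add_argument("--delay", type=float, default=1.0,
                   help="seconds to sleep between requests")
    return p.parse_args(argv)
```

Invoked as `crawler.py https://example.com --delay 2`, the crawler would then sleep two seconds between page fetches.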
