How to Download All PDFs on a Webpage with a Python Script

Jul 12, 2015 06:55 PM

Well, this is my first article, so if it sucks, tell me... lol!!

Story Time

Well, story time....yaaay!!!

I wanted to learn buffer overflows and binary exploitation and all that asm stuff... lol

So I opened up a lot of sites and eventually came across a polytechnic website full of PDFs and PPTs on exactly that. It was kind of like a syllabus with notes and all. I was ecstatic, and I figured I'd start downloading all of it. But there were like 22 PDFs, and I was not in the mood to click all 22 links, so I figured I'd just write a Python script to do it for me. It was awesome when it worked; didn't think it would... lol!! So don't believe in yourself. Believe in your code!! Just kidding!! Believe in whatever you want to. I don't care.

Step 1: Import the Modules

The script parses the webpage and downloads all the PDFs linked from it. I used BeautifulSoup for the parsing, but you can use mechanize or whatever you want.

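The original screenshot of the imports isn't reproduced here. Assuming the article's own choices (BeautifulSoup for parsing, plus an HTTP library; I'm using requests in this sketch), they would look something like:

```python
# A reconstruction of the imports (the original screenshot isn't
# available): requests fetches pages, BeautifulSoup parses the HTML
# for links, and os handles the download path.
import os

import requests
from bs4 import BeautifulSoup
```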

Step 2: Input Data

Now you enter your data: the URL (the page that contains the PDFs) and the download path (where the PDFs will be saved). I also added headers to make the request look a bit legit, but that's not really necessary; you can add your own. BeautifulSoup is what parses the webpage for links.

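Since the screenshot isn't available, here's a sketch of what that setup probably looked like. The URL and download path are placeholders you'd replace with your own, and the User-Agent string is just an example:

```python
import os

import requests
from bs4 import BeautifulSoup

# Placeholders: point these at your own page and folder.
url = "http://example.com/notes/"          # the page that lists the PDFs
download_path = os.path.join(".", "pdfs")  # where the PDFs get saved

# Optional browser-style header so the request looks a bit legit.
headers = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64)"}

def fetch_page(page_url):
    """Fetch the listing page and hand it to BeautifulSoup,
    which parses the HTML so the links can be pulled out later."""
    response = requests.get(page_url, headers=headers)
    response.raise_for_status()
    return BeautifulSoup(response.text, "html.parser")
```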

Step 3: The Main Program

This part of the program actually parses the webpage for links, checks whether each one has a .pdf extension, and then downloads it. I also added a counter so you know how many PDFs have been downloaded.

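The screenshot of the main loop isn't reproduced, so here is a sketch under the same assumptions (requests + BeautifulSoup); the arguments correspond to the parsed page, the page's URL, and the download folder from the previous step:

```python
import os
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

def download_pdfs(soup, base_url, download_path, headers=None):
    """Scan every <a> tag, keep the links that end in .pdf,
    download each file, and count how many were saved."""
    os.makedirs(download_path, exist_ok=True)
    count = 0
    for tag in soup.find_all("a", href=True):
        href = tag["href"]
        if not href.lower().endswith(".pdf"):
            continue  # skip the PPTs and everything else
        pdf_url = urljoin(base_url, href)  # resolve relative links
        target = os.path.join(download_path, os.path.basename(href))
        with open(target, "wb") as f:
            f.write(requests.get(pdf_url, headers=headers).content)
        count += 1
        print(f"[{count}] saved {target}")
    print(f"{count} PDFs downloaded")
    return count
```

Lowercasing the extension check also catches links like `NOTES.PDF`, and `urljoin` handles pages that use relative links instead of full URLs.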

Step 4: Now Just to Take Care of Exceptions

Nothing much to say here... this just makes your program fail gracefully instead of dumping a traceback... that is, crash pretty XD XD

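Again the screenshot isn't available, so this is a minimal sketch of the exception handling, assuming requests: wrap each fetch so network failures print a message instead of killing the run.

```python
import requests

def safe_fetch(url, headers=None):
    """Fetch a URL but catch the usual network failures so the
    script prints a friendly message instead of a traceback."""
    try:
        response = requests.get(url, headers=headers, timeout=10)
        response.raise_for_status()  # turn 404s etc. into exceptions
        return response.content
    except requests.exceptions.RequestException as err:
        print(f"Skipping {url}: {err}")
        return None
```

In the main loop you'd call `safe_fetch(pdf_url)` and skip the file when it returns `None`, so one dead link doesn't stop the other 21 downloads.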

Conclusion

Well, that's it... if you have any questions, let me know. I haven't really tested this exact version of the code because I wanted a clean one, but I copied it from the one that worked, so it should run. Sorry for any errors and bad English. Thanks for reading to the end. If you have any suggestions or anything to add, please let me know, and you can improve it all you want :D But I would like a little credit... I mean, who wouldn't... lol

KNOWLEDGE IS FREE!!!!
