How to Download All PDFs on a Webpage with a Python Script
Well, this is my first article, so if it sucks, tell me...lol!!
Well, story time....yaaay!!!
I wanted to learn buffer overflows and binary exploitation and all that asm stuff...lol
So I opened up a lot of sites and eventually came across a polytechnic website full of PDFs and PPTs on exactly that. It was kind of like a syllabus with notes and all. I was ecstatic and figured I'd start downloading all of it. But there were like 22 PDFs, and I was not in the mood to click all 22 links, so I figured I'd just write a Python script to do it for me. It was awesome when it worked; didn't think it would...lol!! So don't believe in yourself. Believe in your code!! Just kidding!! Believe in whatever you want to. I don't care.
So the script parses the webpage and downloads all the PDFs on it. I used BeautifulSoup, but you can use mechanize or whatever you want.
First you enter your data: the URL of the page that contains the PDFs, and the download path where they'll be saved. I also added request headers to make the script look a bit more legit, like a regular browser; it's not really necessary, but you can add your own. BeautifulSoup is what parses the webpage for links.
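Here's a minimal sketch of that setup, assuming the `requests` and BeautifulSoup (`bs4`) libraries; the URL, folder name, and User-Agent string below are just placeholders you'd swap for your own:

```python
import requests
from bs4 import BeautifulSoup

# Placeholders -- swap in your own page and download folder.
URL = "http://example.com/course-notes"
DOWNLOAD_PATH = "pdfs"
# A browser-ish User-Agent so the request looks a bit more legit.
HEADERS = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64)"}

def extract_links(html):
    """Pull the href out of every <a> tag on the page."""
    soup = BeautifulSoup(html, "html.parser")
    return [a["href"] for a in soup.find_all("a", href=True)]

def fetch_links(url, headers=HEADERS):
    """Download the page and return all the links it contains."""
    resp = requests.get(url, headers=headers, timeout=30)
    resp.raise_for_status()
    return extract_links(resp.text)
```

Splitting the parsing into its own `extract_links` function also means you can try it on any HTML string without hitting the network.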
This part of the program is where it actually parses the webpage for links, checks whether each one ends in a .pdf extension, and then downloads it. I also added a counter so you know how many PDFs have been downloaded.
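A self-contained sketch of that loop, again assuming `requests` + `bs4`; `download_pdfs` and `is_pdf` are names I made up for illustration, not from the original script:

```python
import os
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

def is_pdf(href):
    """True if the link points at a .pdf (ignoring any ?query string)."""
    return href.lower().split("?")[0].endswith(".pdf")

def download_pdfs(url, folder, headers=None):
    """Fetch `url`, save every linked PDF into `folder`, return the count."""
    os.makedirs(folder, exist_ok=True)
    resp = requests.get(url, headers=headers, timeout=30)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    count = 0
    for a in soup.find_all("a", href=True):
        if not is_pdf(a["href"]):
            continue
        pdf_url = urljoin(url, a["href"])   # handles relative links too
        name = os.path.basename(pdf_url.split("?")[0])
        pdf = requests.get(pdf_url, headers=headers, timeout=60)
        pdf.raise_for_status()
        with open(os.path.join(folder, name), "wb") as f:
            f.write(pdf.content)
        count += 1                          # the counter from the article
        print(f"[{count}] saved {name}")
    return count
```

The `urljoin` call is there because pages often link PDFs with relative paths like `notes/week1.pdf`, not full URLs.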
Nothing much to say here... this part just makes your program exit pretty instead of dumping a traceback... that is, crash pretty XD XD
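One way to "crash pretty" is to wrap the whole job in a small error handler so a network failure or a Ctrl-C prints a short message instead of a full traceback. `run` here is a hypothetical helper of mine, not part of the original code:

```python
import sys

import requests

def run(job):
    """Call job(); on common failures, exit with a short message instead of a traceback."""
    try:
        return job()
    except KeyboardInterrupt:
        sys.exit("\nInterrupted -- stopping the downloads.")
    except requests.RequestException as exc:
        sys.exit(f"Network error: {exc}")

# Usage: pass in your script's entry point, e.g. run(main)
```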
Well, that's it... if you have any questions, let me know. I haven't really tested this exact code because I wanted a clean version, but I copied it from the one that worked, so it should run. Sorry for any errors and bad English, and thanks for reading to the end. If you have suggestions or anything to add, please let me know, and you can improve it all you want :D But I'd like a little credit... I mean, who wouldn't... lol
KNOWLEDGE IS FREE!!!!