You've probably seen those deep-web images floating around on the Internet. Usually, it goes something like this: There is a towering iceberg and the deeper the underwater portion extends, the more "hidden" and "exotic" the content is described to be. Sometimes these images are accurate to a point, but most are just making things up.
So what exactly is the 'deep web' then? Are there really hidden secrets and treasures buried under some cloak-and-dagger conspiracy? Well, in short, the answer depends on your idea of treasure and conspiracy.
In this series of articles, I am going to break down the idea of a deep web: what it is, how it got there, and most importantly, how we can use it for our security—maybe even for lulz.
As it stands today, practical knowledge of how darknets (non-indexed portions of the Internet) function will allow you to make more informed decisions about risk when rooting a box, hiding files, or communicating securely. Keep in mind that nothing short of unplugging your computer will make you 100% anonymous. You can have fifty proxies and a handful of VPNs, but never consider yourself completely masked. Look at anonymity as a trade-off: the more masking you add, the more function and speed you give up.
The idea is to have enough masking while maintaining a useful level of function. Tracking your actions is never impossible; ideally, you make it so logistically complex that no one will realistically attempt it.
If you need to slip past your school's firewall, a simple HTTP proxy may be enough. If you are rooting an AT&T server, you will want several layers between yourself and the target. Cybercrime laws are changing rapidly around the world, from Cairo to Chicago, and learning how to blend in with the masses is a valuable skill to have. You are harder to track when you are nobody at all.
First, though, we need to talk about Google. Yes, Google.
When you use Google to search, it does not take your query and scour the entire Internet for results; there is simply too much data for that. Instead, it runs your search against databases of sites that Google's web crawlers have already located. These crawlers are bots, coded to find and index content on the web. The process starts by 'seeding' a crawler with a few initial links. The crawler fetches those pages, scans them for more hyperlinks, then follows each new link and repeats the process, building a 'map' of the results as it goes.
It is this 'map' of collected links that your search request is actually looking at. While this is innovative, it contains a few inherent flaws.
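The seed-and-follow process described above can be sketched in a few lines. This is only an illustration, not how Google's crawlers are actually built: the URLs and page contents below are made up, and the in-memory dictionary stands in for real HTTP fetches. Notice that a page nothing links to simply never ends up in the map.

```python
from html.parser import HTMLParser
from collections import deque

# A toy "web": URL -> HTML. All of these addresses are hypothetical.
FAKE_WEB = {
    "http://a.example": '<a href="http://b.example">b</a> <a href="http://c.example">c</a>',
    "http://b.example": '<a href="http://c.example">c</a>',
    "http://c.example": "<p>no outbound links</p>",
    "http://hidden.example": "<p>never linked from the seed, so never crawled</p>",
}

class LinkExtractor(HTMLParser):
    """Collect href values from anchor tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

def crawl(seeds):
    """Breadth-first crawl from the seed URLs, building the 'map' (index)."""
    index = {}
    queue = deque(seeds)
    while queue:
        url = queue.popleft()
        if url in index or url not in FAKE_WEB:
            continue                    # already visited, or unreachable
        html = FAKE_WEB[url]            # a real crawler would fetch over HTTP here
        parser = LinkExtractor()
        parser.feed(html)
        index[url] = parser.links       # record the page and its outbound links
        queue.extend(parser.links)      # follow every discovered hyperlink
    return index

index = crawl(["http://a.example"])
```

Seeding from `http://a.example` indexes three pages, while `http://hidden.example` stays invisible. That invisibility is the whole point of the next section.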
The Internet is large. In fact, the Internet is very large, and estimates of how much of it is actually indexed and publicly searchable range from 40 to 70 percent. Problems arise from the fact that most search engines do not crawl non-HTTP protocols like Gopher or FTP. Even on the web, developers can take steps to minimize indexing of their sites, or opt out of it altogether; controls like the robots.txt standard and spider traps are commonly used. It is also worth noting that network resources requiring authorization are not crawled under normal circumstances.
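To make the robots.txt control concrete, here is a small sketch using Python's standard-library parser. The robots.txt content and the `example.com` URLs are hypothetical; the point is that a well-behaved crawler checks these rules before fetching a page. It is worth stressing that robots.txt is purely advisory: a crawler that chooses to ignore it can.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt a site might serve to steer crawlers away
# from parts of itself.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /admin/
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# A polite crawler asks before fetching each URL.
allowed_home = rp.can_fetch("*", "http://example.com/index.html")
allowed_private = rp.can_fetch("*", "http://example.com/private/secrets.html")
```

Pages under `/private/` never make it into the search engine's map, yet they remain perfectly reachable by anyone who already knows the URL. That is exactly the kind of non-indexed-but-connected content this article is about.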
The surface web makes up over 90 percent of what you use and do online. The remaining network services require you to connect to them directly, log in to them, or otherwise know beforehand that they exist.
So, what good is all of this for you?
Right now, not as much as you might think. Though the content might not be indexed, and is sometimes exciting and risqué, you are still bound by the basic rules of the Internet: TCP/IP.
Every packet you send zipping back and forth carries your IP address in its header. It has to, or the remote server would not know where to route the responses. This means that even if you are snooping around where you shouldn't be, even on a non-indexed site, server logs can still give you up. A normal HTTP or SOCKS proxy only moves the problem: the destination logs the proxy's address, and the proxy logs yours. When your door gets kicked open and the Feds storm your living room, you will wish you had taken the time to truly hide yourself.
Picture the Internet like a city, with each building as a resource with an address. Envision the non-indexed parts as alleyways, still connected to the main streets, but lacking public addresses of their own.
Let's take it one step further... what if these alleyways had gates? What if you could create a path through the city using only the alleys and your own private keys? You would be much harder to locate and follow. You could still pop out into the city and enter a building as needed; people would only see you come and go from the alley, never knowing where you started or where you were headed.
This is the basic idea behind low-latency anonymous networks like Tor and i2p. We will go over both of these in more detail, including installation and configuration, in the upcoming articles, so stay tuned.