How To: Use Leaked Password Databases to Create Brute-Force Wordlists


To name just a few companies, VK, µTorrent, and ClixSense have all suffered major data breaches in the past. The password databases leaked from those and other sites can be used to better understand how people create passwords and to increase a hacker's success rate when performing brute-force attacks.

In other articles, we'll cover actually generating wordlists for password cracking. In this guide, we'll learn how to analyze the length and statistical makeup of real passwords found in database leaks from recent years, which is the groundwork for building effective wordlists. Understanding how average, everyday people think about passwords will aid hackers during password-guessing attacks and greatly increase the statistical probability that a brute-force attack succeeds.

Disclaimer

The leaked databases featured in this article were obtained using public and darknet resources. The databases are all at least two years old. This was intentional, to ensure that no victim of these leaks would be harmed by this article, as they've had an opportunity to reset their passwords. Passwords used in 2016 still provide excellent datasets for understanding how people create passwords today.

What Makes a Good Password List?

It's not realistically possible to brute-force an SSH service or web login with a list of 5,000,000 passwords. An attack like that would set off all kinds of alarms and take an incomprehensible amount of time to complete.
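To put rough numbers on that claim, here's a quick Python sketch of the worst-case time needed to try every password in a list at a constant rate. The rate of 10 guesses per second is a made-up figure for illustration, not a measurement; real rates depend on the service, network latency, and any throttling or lockouts, which typically make things slower still.

```python
from datetime import timedelta

def worst_case_duration(wordlist_size, guesses_per_second):
    """Time needed to try every candidate once at a constant guess rate."""
    return timedelta(seconds=wordlist_size / guesses_per_second)

# 10 guesses per second is an assumed rate, chosen only for illustration.
RATE = 10
print(worst_case_duration(5_000_000, RATE))  # 5 days, 18:53:20
print(worst_case_duration(500, RATE))        # 0:00:50
```

At the same assumed rate, a 500-password list finishes in under a minute, while the 5,000,000-password list runs for almost six days of continuous, highly visible login attempts.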

Some may believe that massive, comprehensive, 100 GB wordlists are common and often used by hackers. However, we'll learn that small, targeted, fine-tuned wordlists will usually get the job done while avoiding detection. The quality (or commonness) of the passwords takes priority over the length of the wordlist.

What Is Pipal?

Pipal, created by Digininja, a well-known hacker in cybersecurity circles, is a password analyzer that generates statistics from password lists. Pipal can identify the digits most commonly appended to passwords, the most common password lengths, the most common passwords in a list, and much more.

This data is valuable to hackers looking to improve the effectiveness of their wordlists and increase the likelihood of success when performing brute-force attacks. Below is an example of Pipal's output after analyzing the µTorrent hack, which consisted of nearly 400,000 passwords.

Top 35 µTorrent.com Passwords

123456 = 386 (0.11%)
forum123 = 152 (0.04%)
password = 116 (0.03%)
utorrent = 94 (0.03%)
qwerty = 71 (0.02%)
12345678 = 57 (0.02%)
123456789 = 57 (0.02%)
111111 = 46 (0.01%)
123123 = 37 (0.01%)
Mykey2012 = 35 (0.01%)
abc123 = 30 (0.01%)
000000 = 27 (0.01%)
trustno1 = 26 (0.01%)
letmein = 26 (0.01%)
torrent = 24 (0.01%)
qazwsx = 24 (0.01%)
Mykey2011 = 23 (0.01%)
1234 = 21 (0.01%)
666666 = 20 (0.01%)
shadow = 19 (0.01%)
12345 = 19 (0.01%)
1234567 = 19 (0.01%)
1q2w3e4r = 19 (0.01%)
dragon = 18 (0.0%)
fuckyou = 18 (0.0%)
Paperindex1* = 18 (0.0%)
abcd1234 = 16 (0.0%)
matrix = 15 (0.0%)
123321 = 15 (0.0%)
1234567890 = 15 (0.0%)
master = 14 (0.0%)
monkey = 14 (0.0%)
123qwe = 14 (0.0%)
jackass = 13 (0.0%)
killer = 13 (0.0%)

Step 1: Install Pipal

Pipal is included in Kali, but the packaged version is slightly older, doesn't support all of the available features, and should be removed to avoid any confusion.

apt-get autoremove pipal

Ruby version 2.5 (or later) is required to use Pipal, and Git is required to clone the GitHub repository. The below command can be used to install both.

apt-get install ruby2.5 git

Next, clone the Pipal GitHub repository.

git clone https://github.com/digininja/pipal.git

Use the cd command to change into the newly created Pipal directory.

cd pipal/

When using Pipal, be sure to use ruby2.5. To view the available Pipal options, use the --help argument.

ruby2.5 pipal.rb --help

Step 2: Enable Pipal Checker Modules (Optional)

Checkers are the modules that perform the actual analysis. The available modules can be found in the checkers_available/ directory inside the Pipal repository.

Alternatively, the modules can be listed using the --list-checkers argument.

By default, Pipal will analyze password lists and display tons of useful information using the basic.rb (Basic_Checker) module. However, to enhance Pipal's analysis capabilities, copy (or symlink) the desired modules from the checkers_available/ directory to the checkers_enabled/ directory.

I recommend enabling the modules that start with "EN_", which will enumerate the most popular religious terms, explicit terms, colors, vehicles, and more. Keep in mind that using many modules will increase the duration of the analysis. In some cases, when analyzing large (1 GB) wordlists, Pipal crashed and failed to complete the analysis.

To symbolically link a single module to the checkers_enabled/ directory, use the below command.

ln -s /path/to/pipal/checkers_available/ModuleNameHere.rb checkers_enabled/

To symlink all of the "EN_" modules, use the below command.

ln -s /path/to/pipal/checkers_available/EN_* checkers_enabled/

The wildcard (*) is expanded by the shell into every file in checkers_available/ whose name starts with "EN_", and ln then symbolically links each of those files into the checkers_enabled/ directory.

Step 3: Analyze Password Lists

Looking back at the available options, there are two arguments I always use.

By default, Pipal only displays the top 10 entries for each statistic. This default is a bit low, so the --top argument should be used to increase it. In all of the password analyses below, the top 500 entries were displayed. The --output argument specifies the path of the file where the analyzed data is saved.

Using Pipal is very simple. Type the below command into a terminal to start analyzing password lists.

ruby2.5 pipal.rb --top 500 --output /path/to/output_file.pipal /path/to/password.list

Pipal can take anywhere from several minutes to an hour to analyze wordlists containing millions of passwords. Files created by Pipal can be viewed with any text editor found in Kali. I append ".pipal" to my output files, but that's just for clarity; the files are regular text files. Output files can also be saved as "output_file.pipal.txt" if needed.

The complete output of all the Pipal analyses featured in this article can be found on my GitHub.

1. 000webhost.com Password Analysis

000webhost is a free web hosting service used by millions of people worldwide. The 000webhost.com hack occurred in 2015, making this database about three years old. However, it offered a large dataset of over 13,000,000 passwords, so it seemed appropriate to include it in this article.

After analyzing the 000webhost password list, here's what I found:

Password Length

The most common password length was eight characters, accounting for 34% of all the passwords in the wordlist. Roughly another 30% were only six or seven characters long, which is astonishingly short.

8 = 67,313 (34.58%)
6 = 33,392 (17.15%)
9 = 29,588 (15.2%)
7 = 24,916 (12.8%)
10 = 23,994 (12.33%)

This information is extremely valuable to hackers, as it indicates that most wordlists designed for remote brute-force attacks only need to contain passwords six to eight characters long to cover nearly 65% of all passwords. A patient hacker would include nine- and ten-character passwords to reach about 92% coverage, but that may not be required in most cases.
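Those coverage figures are just sums of Pipal's reported percentages. The following is a minimal sketch, assuming Pipal's "value = count (percent%)" output format as it appears in this article; the helper names parse_pipal_section and coverage are hypothetical and not part of Pipal.

```python
import re

# Pipal-style lines copied from the 000webhost length table above.
LENGTH_TABLE = """\
8 = 67,313 (34.58%)
6 = 33,392 (17.15%)
9 = 29,588 (15.2%)
7 = 24,916 (12.8%)
10 = 23,994 (12.33%)
"""

# Matches "value = count (percent%)"; counts may contain thousands separators.
LINE_RE = re.compile(r"^\s*(\S+) = ([\d,]+) \(([\d.]+)%\)\s*$")

def parse_pipal_section(text):
    """Return {value: (count, percent)} for each matching line."""
    stats = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            value, count, pct = match.groups()
            stats[value] = (int(count.replace(",", "")), float(pct))
    return stats

def coverage(stats, keys):
    """Sum the reported percentages for the chosen values."""
    return round(sum(stats[k][1] for k in keys if k in stats), 2)

lengths = parse_pipal_section(LENGTH_TABLE)
print(coverage(lengths, ["6", "7", "8"]))              # 64.53
print(coverage(lengths, ["6", "7", "8", "9", "10"]))   # 92.06
```

Because Pipal rounds its percentages, sums computed this way are approximate.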

Appended Digits

It's not uncommon for people to add a number or two to the end of their passwords, e.g., password123. Over 25% of all passwords ended in exactly one or two digits. Two-digit endings were the most common, at 16 percent.

Single digit on the end = 22,230 (11.42%)
Two digits on the end = 31,214 (16.03%)
Three digits on the end = 18,447 (9.48%)

The most common final digit was "1," which ended 24,214 passwords, over 12% of the list. This count is higher than the single-digit total above because it includes every password ending in a digit, no matter how many digits precede it. It was followed by "3," which ended 16,362 passwords, roughly one out of every 12.

1 = 24,214 (12.44%)
3 = 16,362 (8.41%)
2 = 11,650 (5.98%)
0 = 9,687 (4.98%)
4 = 8,671 (4.45%)

The most common two-digit ending was "23," appended 9,054 times, just under five percent.

23 = 9,054 (4.65%)
12 = 3,822 (1.96%)
01 = 3,629 (1.86%)
11 = 3,089 (1.59%)
00 = 2,791 (1.43%)

The most common three-digit ending was "123," again just over four percent.

123 = 7,938 (4.08%)
456 = 2,143 (1.1%)
234 = 1,644 (0.84%)
000 = 935 (0.48%)
007 = 635 (0.33%)

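Counted naively, the endings 1, 3, 2, 23, 12, 123, and 456 add up to over 75,000 passwords, more than a third of the list. These categories overlap, though: every password ending in "123" is also counted under "23" and "3," so the true share is smaller. Even so, these few endings dominate, and it almost doesn't make sense to include other appended numbers in brute-force wordlists. Statistically speaking, other numbers appear too infrequently to warrant inclusion.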
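The overlap is easy to see on a small example: a password such as "abc123" ends in "3," "23," and "123" all at once. The sketch below uses a made-up sample list. It is a rough approximation of how suffix statistics like these can be computed, not Pipal's actual implementation, and the function name is hypothetical.

```python
import re
from collections import Counter

TRAILING_DIGITS = re.compile(r"\d+$")

def trailing_digit_stats(passwords):
    """Approximate suffix statistics (an illustration, not Pipal's code).

    exact_runs[n]: passwords ending in a run of exactly n digits (n = 1-3)
                   preceded by at least one non-digit character.
    endings[n]:    the final n characters whenever they are all digits, so
                   "abc123" is counted under "3", "23" and "123" at once.
    """
    exact_runs = Counter()
    endings = {n: Counter() for n in (1, 2, 3)}
    for pw in passwords:
        match = TRAILING_DIGITS.search(pw)
        if match is None:
            continue  # no trailing digits at all
        digits = match.group()
        if len(digits) <= 3 and len(digits) < len(pw):
            exact_runs[len(digits)] += 1
        for n in (1, 2, 3):
            if len(digits) >= n:
                endings[n][digits[-n:]] += 1
    return exact_runs, endings

sample = ["password1", "welcome23", "abc123", "letmein", "dragon1", "qwerty123"]
runs, endings = trailing_digit_stats(sample)
print(dict(runs))                  # {1: 2, 2: 1, 3: 2}
print(endings[1].most_common())    # [('3', 3), ('1', 2)]
print(endings[3].most_common())    # [('123', 2)]
```

Here "3" is the most common final digit only because two of the three passwords ending in "3" actually end in "123."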

Special Characters

With "@", the most common special character, included in only 0.8% of all passwords, it's generally safe to omit passwords containing special characters (or "1337 speak") from brute-force wordlists. A patient hacker who wants a truly comprehensive wordlist may consider including the top three special characters. Conversely, someone hoping to protect their account from brute-force attacks may want to include a special character in their (probably weak) password.

@ = 1,614 (0.83%)
. = 881 (0.45%)
# = 780 (0.4%)
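To estimate how much coverage a wordlist gives up by leaving special characters out, one could measure the share of passwords that contain any of them. This sketch treats Python's string.punctuation (ASCII punctuation) as the set of special characters, which is an assumption, and counts each password at most once per character; Pipal may tally them differently. The sample passwords are made up.

```python
import string
from collections import Counter

# Assumption: "special characters" means ASCII punctuation only.
SPECIALS = set(string.punctuation)

def special_char_share(passwords):
    """Percent of passwords containing each special character, plus any at all."""
    total = len(passwords)
    seen = Counter()
    for pw in passwords:
        for ch in set(pw) & SPECIALS:  # each character counted once per password
            seen[ch] += 1
    any_special = sum(1 for pw in passwords if set(pw) & SPECIALS)
    shares = {ch: round(100 * n / total, 2) for ch, n in seen.most_common()}
    return shares, round(100 * any_special / total, 2)

sample = ["password", "p@ssword", "hello.world", "letmein", "admin@123", "qwerty"]
shares, any_pct = special_char_share(sample)
print(shares)   # {'@': 33.33, '.': 16.67}
print(any_pct)  # 50.0
```

The "any special character" figure is the one that matters for the trade-off: it is the share of passwords a digits-and-letters-only wordlist cannot possibly match.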

Top 25 Passwords

Anyone familiar with password lists won't be surprised to see "123456" is the most common password having been used to secure 783 different accounts. "Abcdef123" and "a123456" follow closely behind with both used over 500 times each.

123456 = 783 (0.4%)
Abcdef123 = 608 (0.31%)
a123456 = 580 (0.3%)
little123 = 468 (0.24%)
nanda334 = 391 (0.2%)
N97nokia = 367 (0.19%)
password = 315 (0.16%)
Pawerjon123 = 275 (0.14%)
421uiopy258 = 230 (0.12%)
MYworklist123 = 182 (0.09%)
12345678 = 175 (0.09%)
qwerty = 169 (0.09%)
nks230kjs82 = 152 (0.08%)
trustno1 = 150 (0.08%)
zxcvbnm = 138 (0.07%)
N97nokiamini = 132 (0.07%)
letmein = 131 (0.07%)
123456789 = 131 (0.07%)
myplex = 110 (0.06%)
gm718422@ = 109 (0.06%)
churu123A = 107 (0.05%)
abc123 = 105 (0.05%)
plex123 = 95 (0.05%)
any123456 = 94 (0.05%)
Lwf1681688 = 92 (0.05%)

It's not unusual to see strange or bizarre passwords ranked highly in leaked databases. The passwords "nanda334" and "N97nokia," for example, were used over 350 times each. It's unclear how this happened; most likely, a small group of individuals (probably hackers) created multiple accounts over a long period of time and reused the same password over and over. When generating wordlists, it's really up to the hacker to decide whether to include a particular password.

Top 25 Base Words

Here's where I think Pipal really shines. It's able to strip the numbers appended to the ends of passwords (and lowercase what remains) to analyze the words at the core of the passwords. This data is especially useful to hackers because they can then combine the base words with the most commonly used digits to create comprehensive wordlists. For example, take note of "welcome," ranked 24th in the list below.

password = 735 (0.38%)
abcdef = 699 (0.36%)
plex = 546 (0.28%)
qwerty = 505 (0.26%)
little = 481 (0.25%)
nanda = 401 (0.21%)
n97nokia = 367 (0.19%)
pawerjon = 275 (0.14%)
letmein = 252 (0.13%)
uiopy = 230 (0.12%)
trustno = 200 (0.1%)
abcd = 189 (0.1%)
passw0rd = 186 (0.1%)
monkey = 184 (0.09%)
myworklist = 182 (0.09%)
master = 171 (0.09%)
pass = 166 (0.09%)
asdf = 164 (0.08%)
gondola = 164 (0.08%)
dragon = 156 (0.08%)
zxcvbnm = 154 (0.08%)
nks230kjs = 152 (0.08%)
hello = 148 (0.08%)
welcome = 141 (0.07%)
n97nokiamini = 133 (0.07%)

If the most common single or double digits ("1" and "23") are appended to "welcome," we find that the resulting passwords were used several dozen times.

This is why appending common digits to base words is more beneficial to wordlists than simply compiling a list of the top passwords. The choice of base word and the choice of appended digits are the two places where people's passwords notably differ from one another, so it's better to isolate the two variables, then combine the results in a new wordlist.
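Here's a minimal sketch of that two-step approach in Python. The base_word function approximates Pipal's behavior as seen in the table above (lowercasing and stripping non-letters from both ends, which turns "N97nokia" into "n97nokia"); it is an assumption, not Pipal's code. build_wordlist is a hypothetical helper that appends each chosen suffix to each base word, and the sample passwords are made up.

```python
import re
from itertools import product

def base_word(password):
    """Lowercase, then strip leading/trailing non-letters (ASCII a-z only).

    An approximation of Pipal's base-word rule, not its actual code.
    """
    return re.sub(r"^[^a-z]+|[^a-z]+$", "", password.lower())

def build_wordlist(base_words, suffixes, include_bare=True):
    """Append every suffix to every base word, preserving input order."""
    candidates = list(base_words) if include_bare else []
    candidates += [word + suffix for word, suffix in product(base_words, suffixes)]
    return list(dict.fromkeys(candidates))  # drop duplicates, keep first occurrence

sample = ["welcome1", "Welcome23", "password123", "trustno1", "N97nokia"]
bases = list(dict.fromkeys(base_word(p) for p in sample))
print(bases)  # ['welcome', 'password', 'trustno', 'n97nokia']
print(build_wordlist(["welcome"], ["1", "23", "123"]))
# ['welcome', 'welcome1', 'welcome23', 'welcome123']
```

Keeping the two lists separate means a handful of base words and a handful of suffixes multiply into many candidates, without storing every leaked variant.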

2. VK.com Password Analysis

VK, a social network heavily inspired by Facebook, is the most popular website in Russia and ranked in the top 20 most popular websites in the world. The social network reportedly has over 300 million registered users.

The VK.com hack emerged in 2016, but occurred in 2012. While the passwords found in this data breach are nearly six years old, I couldn't miss the opportunity to analyze a massive dataset of over 92,470,000 passwords.

After analyzing the VK.com password list, here's what I found:

Password Length

More than 50% of all passwords are between six and eight characters long. This is consistent with the 000webhost dataset and reaffirms that most wordlists don't need passwords longer than nine or ten characters.

6 = 17,665,381 (19.1%)
8 = 17,370,491 (18.78%)
7 = 12,391,947 (13.4%)
9 = 9,815,371 (10.61%)
10 = 7,686,762 (8.31%)

Appended Digits

Fewer passwords had appended digits than in the 000webhost data, just under 13 percent in total.

Single digit on end = 3,023,338 (3.27%)
Two digits on end = 5,326,255 (5.76%)
Three digits on end = 3,412,773 (3.69%)

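The most common endings are again "1," "3," and "23." Summed naively, they come to about 12,600,000 passwords (roughly 14%), although, as before, the counts overlap, since every password ending in "23" is also counted under "3."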

1 = 5,919,242 (6.4%)
3 = 5,221,786 (5.65%)
0 = 5,079,464 (5.49%)
6 = 4,854,551 (5.25%)

23 = 1,474,608 (1.59%)
11 = 1,398,248 (1.51%)
89 = 1,337,274 (1.45%)
56 = 1,266,445 (1.37%)

123 = 1,135,684 (1.23%)
456 = 1,003,088 (1.08%)
789 = 638,695 (0.69%)
777 = 584,292 (0.63%)

Top 25 Passwords

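Purely numeric passwords are clearly popular among users in this dataset: 16 of the top 25 passwords contain only digits. Keyboard patterns such as "qwerty," "zxcvbnm," and "1q2w3e" are common too, and "gfhjkm" is the Russian word "пароль" ("password") typed on a keyboard set to the English layout.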

123456 = 653,959
123456789 = 383,177
qwerty = 263,565
111111 = 176,226
1234567890 = 144,494
1234567 = 131,279
12345678 = 99,885
123321 = 87,148
000000 = 85,468
123123 = 84,036
7777777 = 81,544
zxcvbnm = 79,199
666666 = 72,052
qwertyuiop = 69,178
123qwe = 62,680
555555 = 61,762
1q2w3e = 57,425
gfhjkm = 51,310
qazwsx = 50,686
1q2w3e4r = 49,676
654321 = 48,435
987654321 = 46,461
121212 = 41,896
777777 = 39,966
zxcvbn = 39,527

Unfortunately, Pipal is limited by the computer's memory (RAM) and struggled to analyze VK.com's 92,000,000-password dataset. Pipal wasn't able to determine the percentage for each password, the top base words, or the passwords with special characters, but we were able to recover the number of times each password appeared. Special thanks to @digininja for working with me to analyze this dataset.
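When a list is too large for a full Pipal run, the one statistic recovered here, how often each password appears, can be produced with a much simpler pass. This Python sketch is not how Pipal works internally; it assumes one password per line, and its memory use still grows with the number of unique passwords, so it reduces the work rather than removing the RAM problem.

```python
from collections import Counter
from io import StringIO

def top_passwords(lines, n=25):
    """Count how often each password appears (one per line); return the top n."""
    counts = Counter(line.rstrip("\r\n") for line in lines)
    counts.pop("", None)  # ignore blank lines
    return counts.most_common(n)

# Any iterable of lines works, e.g.
# open("/path/to/password.list", encoding="utf-8", errors="replace")
demo = StringIO("123456\nqwerty\n123456\n\n111111\n123456\nqwerty\n")
print(top_passwords(demo, n=3))  # [('123456', 3), ('qwerty', 2), ('111111', 1)]
```

On a Unix-like system, sort /path/to/password.list | uniq -c | sort -rn | head -n 25 produces the same counts; sort spills to temporary files, so it tends to cope better with lists that don't fit in RAM.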

3. ClixSense.com Password Analysis

ClixSense is a "paid-to-click" website that pays people small amounts (microtransactions) for taking part in surveys and viewing advertisements. (It is also banned on Null Byte, in case you had any ideas in the forum.)

The ClixSense data was posted online in 2016 by the attackers, who claimed it was a subset of a larger 6.6-million-record dataset. My ClixSense password list contains 2.2 million passwords. While it's not the complete ClixSense list, it still provides an adequately large dataset from a very recent hack.

After analyzing the ClixSense.com password list, here's what I found:

Password Length

Passwords of nine or fewer characters are (again) the most common lengths found in large leaks: six, seven, and eight characters alone account for about 55% of this list. This further suggests that short passwords, especially eight- and six-character ones, should be prioritized when designing wordlists for brute-force attacks.

8 = 526,916 (23.72%)
6 = 407,346 (18.33%)
9 = 314,908 (14.17%)
10 = 286,220 (12.88%)
7 = 285,726 (12.86%)

Appended Digits

The one-digit, two-digit, and three-digit combinations most commonly appended to passwords are very consistent with the 000webhost.com dataset.

Single digit on the end = 121,811 (5.48%)
Two digits on the end = 239,247 (10.77%)
Three digits on the end = 151,586 (6.82%)

1 = 177,622 (7.99%)
3 = 159,989 (7.2%)
0 = 119,043 (5.36%)
2 = 118,338 (5.33%)

23 = 71,414 (3.21%)
56 = 31,159 (1.4%)
12 = 30,915 (1.39%)
11 = 30,248 (1.36%)

123 = 58,898 (2.65%)
456 = 25,874 (1.16%)
234 = 11,166 (0.5%)
007 = 10,573 (0.48%)

Special Characters

Once again, "@" and "." were the most common special characters, yet "@" appears in only 1.43% of passwords and "." in under 1%. That's too small a share to warrant including special characters in wordlists, but again, using special characters in passwords will significantly hinder an attacker relying on such wordlists.

@ = 31,778 (1.43%)
. = 16,108 (0.73%)
_ = 14,711 (0.66%)
! = 11,248 (0.51%)

Top 25 Passwords

The password "123456" once again takes the lead, used 17,879 times. And again we see a distinctive password, "bismillah," which was used 644 times on its own and over 1,000 times as a base word. It's not uncommon to see cultural or religious terms in datasets where the hacked website is popular in a particular country.

123456 = 17,879 (0.8%)
123456789 = 3,292 (0.15%)
12345678 = 2,093 (0.09%)
password = 1,970 (0.09%)
111111 = 1,892 (0.09%)
1234567 = 1,300 (0.06%)
iloveyou = 1,266 (0.06%)
qwerty = 1,187 (0.05%)
clixsense = 1,173 (0.05%)
000000 = 977 (0.04%)
abcdefg = 972 (0.04%)
123123 = 923 (0.04%)
pakistan = 803 (0.04%)
654321 = 745 (0.03%)
users = 736 (0.03%)
bismillah = 644 (0.03%)
abc123 = 615 (0.03%)
1234567890 = 537 (0.02%)
666666 = 525 (0.02%)
asdfgh = 524 (0.02%)
computer = 516 (0.02%)
aaaaaa = 502 (0.02%)
secret = 392 (0.02%)
iloveu = 391 (0.02%)
krishna = 391 (0.02%)

Top 25 Base Words

A closer look at the top 25 base words reveals some great results. With the exception of the website-specific words ("clixsense" and "clix"), a hacker could incorporate most of these words into a wordlist and expect some success with brute-force attacks.

password = 3,937 (0.18%)
clixsense = 2,989 (0.13%)
qwerty = 2,798 (0.13%)
iloveyou = 2,101 (0.09%)
pakistan = 1,965 (0.09%)
clix = 1,285 (0.06%)
money = 1,271 (0.06%)
love = 1,244 (0.06%)
june = 1,208 (0.05%)
abcdefg = 1,117 (0.05%)
bismillah = 1,026 (0.05%)
april = 1,006 (0.05%)
welcome = 990 (0.04%)
july = 984 (0.04%)
jesus = 950 (0.04%)
abcd = 936 (0.04%)
master = 916 (0.04%)
angel = 899 (0.04%)
nokia = 896 (0.04%)
computer = 882 (0.04%)
krishna = 822 (0.04%)
march = 810 (0.04%)
august = 803 (0.04%)
daniel = 777 (0.03%)
secret = 766 (0.03%)
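Dropping website-specific words from a base-word list like the one above is easy to automate. In this sketch, the site_terms values are chosen by hand for each leak (here "clix"), the parser assumes the "word = count (percent%)" format Pipal printed above, and the function name is hypothetical.

```python
import re

# Matches Pipal-style "word = count (percent%)" lines.
LINE_RE = re.compile(r"^\s*(\S+) = ([\d,]+) \(([\d.]+)%\)\s*$")

def base_words_without(section_text, site_terms, limit=None):
    """Return base words in listed order, skipping any containing a site term."""
    words = []
    for line in section_text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue
        word = match.group(1)
        if any(term in word for term in site_terms):
            continue  # website-specific, e.g. "clixsense" or "clix"
        words.append(word)
    return words[:limit]  # limit=None keeps everything

# First six entries of the ClixSense base-word table above.
excerpt = """\
password = 3,937 (0.18%)
clixsense = 2,989 (0.13%)
qwerty = 2,798 (0.13%)
iloveyou = 2,101 (0.09%)
pakistan = 1,965 (0.09%)
clix = 1,285 (0.06%)
"""
print(base_words_without(excerpt, site_terms=["clix"]))
# ['password', 'qwerty', 'iloveyou', 'pakistan']
```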

4. µTorrent.com Password Analysis

µTorrent is a popular peer-to-peer file sharing client, managed by BitTorrent.com. The µTorrent forum hack occurred in 2016 and consisted of almost 400,000 leaked passwords. This dataset is the smallest featured in this article, but still provides insight into how passwords are created today.

Password Length

A jarring 88% of passwords are exactly eight characters long, far more than in any other dataset in this article. Fewer than 4% are nine or ten characters long.

8 = 323,102 (88.66%)
6 = 11,314 (3.1%)
9 = 8,108 (2.22%)
7 = 7,962 (2.18%)
10 = 6,058 (1.66%)

Appended Digits

A single digit was the most commonly appended number. Since most passwords in this dataset are eight characters long, many of these are likely a seven-letter base word followed by a single digit such as "1" or "3."

Single digit on the end = 45,852 (12.58%)
Two digits on the end = 13,569 (3.72%)
Three digits on the end = 4,923 (1.35%)

1 = 10,475 (2.87%)
3 = 8,904 (2.44%)
2 = 8,123 (2.23%)
0 = 7,925 (2.17%)

23 = 1,737 (0.48%)
11 = 978 (0.27%)
12 = 894 (0.25%)
00 = 837 (0.23%)

123 = 1,365 (0.37%)
456 = 521 (0.14%)
234 = 307 (0.08%)
000 = 252 (0.07%)

Special Characters

The three most popular special characters appear in just over 0.5% of all passwords combined. Again, that's too small a share to warrant including special characters in wordlists, but it makes them very useful for hindering an attacker's ability to brute-force a service.

/ = 833 (0.23%)
+ = 805 (0.22%)
@ = 627 (0.17%)

Top 25 Passwords

Once again, a website-specific password, "utorrent," appears in the top four.

123456 = 386 (0.11%)
forum123 = 152 (0.04%)
password = 116 (0.03%)
utorrent = 94 (0.03%)
qwerty = 71 (0.02%)
12345678 = 57 (0.02%)
123456789 = 57 (0.02%)
111111 = 46 (0.01%)
123123 = 37 (0.01%)
Mykey2012 = 35 (0.01%)
abc123 = 30 (0.01%)
000000 = 27 (0.01%)
trustno1 = 26 (0.01%)
letmein = 26 (0.01%)
torrent = 24 (0.01%)
qazwsx = 24 (0.01%)
Mykey2011 = 23 (0.01%)
1234 = 21 (0.01%)
666666 = 20 (0.01%)
shadow = 19 (0.01%)
12345 = 19 (0.01%)
1234567 = 19 (0.01%)
1q2w3e4r = 19 (0.01%)
dragon = 18 (0.0%)
fuckyou = 18 (0.0%)

Top 25 Base Words

The website-specific "utorrent" is at the top of the base words list. It's not uncommon for people to include the website name in their password. There are probably plenty of Null Byte readers using the password "nullbyte," "wonderhowto," or some variation.

utorrent = 205 (0.06%)
password = 166 (0.05%)
forum = 164 (0.05%)
qwerty = 128 (0.04%)
mykey = 62 (0.02%)
dragon = 49 (0.01%)
torrent = 42 (0.01%)
letmein = 36 (0.01%)
melto = 36 (0.01%)
shadow = 35 (0.01%)
abcd = 35 (0.01%)
qazwsx = 32 (0.01%)
monkey = 31 (0.01%)
trustno = 31 (0.01%)
fuckyou = 29 (0.01%)
superman = 28 (0.01%)
pass = 28 (0.01%)
alex = 28 (0.01%)
love = 28 (0.01%)
matrix = 27 (0.01%)
killer = 27 (0.01%)
master = 25 (0.01%)
passw0rd = 25 (0.01%)
nokia = 24 (0.01%)
welcome = 24 (0.01%)

Let Me Guess ... Your Password Is '123456'

The most common passwords across these leaks, dating from 2012 to 2016, are "123456," "password," and "123456789," with some variation in how they rank from site to site. Given how consistent they are over that span, it's reasonable to believe these are still among the most common passwords used today.

The more interesting data was discovered within the base words and digits appended to each password. These two datasets can be combined to create a much more (statistically) effective password-guessing wordlist.
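One way to combine the two datasets is to order candidates by an estimated probability, so the likeliest guesses are tried first. The sketch below assumes, purely for illustration, that people pick the base word and the appended digits independently, so each candidate's score is the product of the two counts. That is a simplification, and because the suffix counts overlap (every "123" ending is also a "23" ending), the scores are rough at best. The counts come from the 000webhost tables earlier in this article; the function name is hypothetical.

```python
from itertools import product

def rank_candidates(base_counts, suffix_counts):
    """Order base+suffix candidates from most to least likely.

    Assumes (for illustration only) that base word and suffix are chosen
    independently, so P(base + suffix) is proportional to
    count(base) * count(suffix); the shared denominator doesn't change the order.
    """
    scored = [
        (b * s, base + suffix)
        for (base, b), (suffix, s) in product(base_counts.items(), suffix_counts.items())
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [candidate for _, candidate in scored]

# Counts from the 000webhost base-word and appended-digit tables.
bases = {"password": 735, "welcome": 141}
suffixes = {"1": 24_214, "23": 9_054, "123": 7_938}
print(rank_candidates(bases, suffixes))
# ['password1', 'password23', 'password123', 'welcome1', 'welcome23', 'welcome123']
```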

Cover image and screenshots by tokyoneon/Null Byte
