How To: Use Leaked Password Databases to Create Brute-Force Wordlists

To name just a few companies, VK, µTorrent, and ClixSense all suffered significant data breaches at some point in the past. The leaked password databases from those and other online sites can be used to better understand how people create passwords and to increase a hacker's success when performing brute-force attacks.

In other articles, we'll cover generating wordlists for use in password-cracking. But here, we'll learn how to create wordlists whose complexity and length are based on statistics from actual passwords found in database leaks that occurred in recent years. Understanding how average, everyday people think about passwords will aid hackers during password-guessing attacks and greatly increase the statistical probability of success of brute-force attacks.

Disclaimer

The leaked databases featured in this article were obtained using public and darknet resources. The databases are all at least four years old. This was intentional, to ensure that this article harms no victim of these leaks, as they've had an opportunity to reset their passwords. Also, passwords used in 2016 still provide excellent datasets for understanding how people create passwords today.

What Makes a Good Password List?

Realistically, it's not possible to brute-force an SSH service or web login with a list of five million passwords. An attack like that would set off all kinds of alarms and take an incomprehensible amount of time to complete.

Some may believe that massive, comprehensive, 100 GB wordlists are common and often utilized by hackers. However, we'll learn that small, targeted, fine-tuned wordlists will usually get the job done while avoiding detection. The quality (or commonness) of the passwords takes priority over the length of the wordlist.

What Is Pipal?

Pipal, created by Digininja, a well-known hacker in cybersecurity circles, is a password analyzer that compiles statistics about password lists. Pipal is capable of identifying the most common digits appended to passwords, the most common length of passwords, the most common passwords found in the databases, and much more.

This data is valuable to hackers looking to improve the strength of their wordlists and increase the likelihood of success when performing brute-force attacks. Below is an example of Pipal's output (the top 35 passwords) after analyzing the µTorrent hack, which consisted of nearly 400,000 passwords.

123456       = 386 (0.11%)
forum123     = 152 (0.04%)
password     = 116 (0.03%)
utorrent     = 94  (0.03%)
qwerty       = 71  (0.02%)
12345678     = 57  (0.02%)
123456789    = 57  (0.02%)
111111       = 46  (0.01%)
123123       = 37  (0.01%)
Mykey2012    = 35  (0.01%)
abc123       = 30  (0.01%)
000000       = 27  (0.01%)
trustno1     = 26  (0.01%)
letmein      = 26  (0.01%)
torrent      = 24  (0.01%)
qazwsx       = 24  (0.01%)
Mykey2011    = 23  (0.01%)
1234         = 21  (0.01%)
666666       = 20  (0.01%)
shadow       = 19  (0.01%)
12345        = 19  (0.01%)
1234567      = 19  (0.01%)
1q2w3e4r     = 19  (0.01%)
dragon       = 18  (0.0%)
fuckyou      = 18  (0.0%)
Paperindex1* = 18  (0.0%)
abcd1234     = 16  (0.0%)
matrix       = 15  (0.0%)
123321       = 15  (0.0%)
1234567890   = 15  (0.0%)
master       = 14  (0.0%)
monkey       = 14  (0.0%)
123qwe       = 14  (0.0%)
jackass      = 13  (0.0%)
killer       = 13  (0.0%)

Continue below to see how to install and use Pipal, enable modules, and analyze password lists.

Step 1: Install or Update Ruby & Git

Ruby version 2.5 or later is required to use Pipal. If you think you have Ruby already, you can see which version you have with ruby -v.

~$ ruby -v

ruby 2.4.8 (2019-10-01 revision 026ee6f091) [x86_64-linux-gnu]
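If you'd rather check the version requirement programmatically, here is a minimal Python sketch. It assumes the output of ruby -v begins with "ruby MAJOR.MINOR.PATCH", as in the sample above. The helper names ruby_version and meets_minimum are made up for this illustration and are not part of Pipal or Ruby.

```python
import re


def ruby_version(version_output):
    """Extract (major, minor, patch) from the text printed by `ruby -v`."""
    match = re.search(r"ruby (\d+)\.(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("unrecognized `ruby -v` output")
    return tuple(int(part) for part in match.groups())


def meets_minimum(version_output, minimum=(2, 5, 0)):
    """Return True when the reported version is at least `minimum`."""
    return ruby_version(version_output) >= minimum


# Sample line copied from the terminal output above.
sample = "ruby 2.4.8 (2019-10-01 revision 026ee6f091) [x86_64-linux-gnu]"
print(ruby_version(sample), meets_minimum(sample))
```

Tuples compare element by element, so (2, 4, 8) is correctly treated as older than (2, 5, 0).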

If you don't have it or it's outdated, you'll need to install or update it. You'll also need Git, which is required to clone the GitHub repository. The below command can be used to install or update both.

~$ sudo apt install ruby2.7 git

[sudo] password for nullbyte:
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
  git-man
Suggested packages:
  git-daemon-run | git-daemon-sysvinit git-doc git-el git-email git-gui gitk
  gitweb git-cvs git-mediawiki git-svn
The following packages will be upgraded:
  git git-man ruby2.7
3 upgraded, 0 newly installed, 0 to remove and 853 not upgraded.
Need to get 9,151 kB of archives.
After this operation, 2,443 kB of additional disk space will be used.
Do you want to continue? [Y/n] y
Get:1 http://kali.download/kali kali-rolling/main amd64 git amd64 1:2.27.0-1 [6,707 kB]
Get:2 http://kali.download/kali kali-rolling/main amd64 git-man all 1:2.27.0-1 [1,774 kB]
Get:3 http://kali.download/kali kali-rolling/main amd64 ruby2.7 amd64 2.7.1-3 [670 kB]
Fetched 9,151 kB in 2s (5,301 kB/s)
Reading changelogs... Done
(Reading database ... 377124 files and directories currently installed.)
Preparing to unpack .../git_1%3a2.27.0-1_amd64.deb ...
Unpacking git (1:2.27.0-1) over (1:2.26.2-1) ...
Preparing to unpack .../git-man_1%3a2.27.0-1_all.deb ...
Unpacking git-man (1:2.27.0-1) over (1:2.26.2-1) ...
Preparing to unpack .../ruby2.7_2.7.1-3_amd64.deb ...
Unpacking ruby2.7 (2.7.1-3) over (2.7.0-4) ...
Setting up ruby2.7 (2.7.1-3) ...
Setting up git-man (1:2.27.0-1) ...
Setting up git (1:2.27.0-1) ...
Processing triggers for man-db (2.9.1-1) ...
Processing triggers for kali-menu (2020.2.2) ...

Step 2: Install Pipal

Pipal can be found in Kali, but it's a slightly older version which doesn't support all of the available features and should be removed to avoid any confusion.

~$ sudo apt autoremove pipal

Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages will be REMOVED:
  kali-linux-default pipal
0 upgraded, 0 newly installed, 2 to remove and 856 not upgraded.
After this operation, 198 kB disk space will be freed.
Do you want to continue? [Y/n] y
(Reading database ... 377177 files and directories currently installed.)
Removing kali-linux-default (2020.1.13) ...
Removing pipal (3.1-0kali0) ...
Processing triggers for kali-menu (2020.2.2) ...

Next, clone the Pipal GitHub repository.

~$ git clone https://github.com/digininja/pipal.git

Cloning into 'pipal'...
remote: Enumerating objects: 4, done.
remote: Counting objects: 100% (4/4), done.
remote: Compressing objects: 100% (4/4), done.
remote: Total 582 (delta 0), reused 1 (delta 0), pack-reused 578
Receiving objects: 100% (582/582), 158.46 KiB | 1.82 MiB/s, done.
Resolving deltas: 100% (349/349), done.

Use the cd command to change into the newly created Pipal directory.

~$ cd pipal

When using Pipal, be sure to use Ruby 2.5 or later, which you should have installed or updated in Step 1. To view the available Pipal options, use the --help argument.

~/pipal$ ruby2.7 pipal.rb --help

pipal 3.1 Robin Wood (robin@digi.ninja) (http://digi.ninja)

Usage: pipal [OPTION] ... FILENAME
        --help, -h, -?: show help
        --top, -t X: show the top X results (default 10)
        --output, -o <filename>: output to file
        --gkey <Google Maps API key>: to allow zip code lookups (optional)
        --list-checkers: Show the available checkers and which are enabled
        --verbose, -v: Verbose

        FILENAME: The file to count

Step 3: Enable Pipal Checker Modules (Optional)

Checkers are the modules that perform the actual analysis. The available modules can be found in the pipal/checkers_available directory.

~/pipal$ ls checkers_available

AU_place_checker.rb         FR_colour_checker.rb
basic.rb                    FR_date_checker.rb
BR_area_codes_checker.rb    FR_emotion_checker.rb
BR_soccer_teams_checker.rb  frequency.rb
date_checker.rb             FR_family_checker.rb
DE_colour_checker.rb        FR_hashcat_mask_generator.rb
DE_emotion_checker.rb       FR_season_checker.rb
DE_family_checker.rb        FR_windows_complexity_checker.rb
DE_religion_checker.rb      hashcat_mask_generator.rb
DE_road_checker.rb          NL_colour_checker.rb
DE_season_checker.rb        NL_date_checker.rb
DE_sport_checker.rb         NL_season_checker.rb
DE_vehicle_checker.rb       PTBR_colour_checker.rb
email_names.rb              PTBR_date_checker.rb
EN_colour_checker.rb        PTBR_emotion_checker.rb
EN_emotion_checker.rb       PTBR_explicit_checker.rb
EN_explicit_checker.rb      PTBR_family_checker.rb
EN_family_checker.rb        PTBR_religion_checker.rb
EN_military_checker.rb      PTBR_season_checker.rb
EN_religion_checker.rb      RU_russia_cities_checker.rb
EN_road_checker.rb          special_checker.rb
EN_season_checker.rb        US_area_codes_checker.rb
EN_sport_checker.rb         usernames.rb
EN_vehicle_checker.rb       US_state_checker.rb
EN_violence_checker.rb      US_zip_codes_checker.rb
external_list_checker.rb    windows_complexity_checker.rb
FR_area_codes.rb

Alternatively, the modules can be viewed using the --list-checkers argument. If you get any errors after running this command, it's likely that a required gem is missing or outdated in your Ruby installation, but the error should show you the gem command to install it.

~/pipal$ ruby2.7 pipal.rb --list-checkers

Error: levenshtein gem not installed
         use: "gem install levenshtein-ffi" to install the required gem

~/pipal$ sudo gem install levenshtein-ffi

[sudo] password for nullbyte:
Fetching levenshtein-ffi-1.1.0.gem
Building native extensions. This could take a while...
Successfully installed levenshtein-ffi-1.1.0
Parsing documentation for levenshtein-ffi-1.1.0
Installing ri documentation for levenshtein-ffi-1.1.0
Done installing documentation for levenshtein-ffi after 0 seconds
1 gem installed

~/pipal$ ruby2.7 pipal.rb --list-checkers

/home/nullbyte/pipal/checkers_available/FR_colour_checker.rb:11: warning: key "ocre" is duplicated and overwritten on line 11
pipal 3.1 Robin Wood (robin@digi.ninja) (http://digi.ninja)

You have the following Checkers on your system
==============================================
Australia_Checker - List of Australian places
BR_Area_Code_Checker - List of Brazil area codes
BR_Soccer_Teams_Checker - List of Brazilian Soccer Teams
Basic_Checker - Basic Checks - Enabled
Colour_Checker - List of common English colours
DE_Colour_Checker - List of common German colours
DE_Emotion_Checker - List of German emotional terms
DE_Family_Checker - List of German family terms
DE_Religion_Checker - List of German religious terms
DE_Road_Checker - List of German road terms
DE_Season_Checker - List of common German seasons
DE_Sport_Checker - List of German sport terms
DE_Vehicle_Checker - List of common vehicle manufacturers and models
Date_Checker - Days, months and years
Email_Checker - Compare email addresses to passwords. Checks both name and full address.
Emotion_Checker - List of English emotional terms
Explicit_Checker - List of English explicit terms
External_List_Checker - Check an external file for matches
FR_Colour_Checker - List of common French colours
FR_Date_Checker - French day, month and year checker
FR_Emotion_Checker - List of French emotional terms
FR_Family_Checker - List of French family terms
FR_Hashcat_Mask_Generator - Hashcat mask generator (French)
FR_Season_Checker - List of common French seasons
FR_Windows_Complexity_Checker - Check for default Windows complexity (French)
FR_area_Code_Checker - List of French area codes
Family_Checker - List of English family terms
Frequency_Checker - Count the frequency of characters in each position, output as either CSV or text
Hashcat_Mask_Generator - Hashcat mask generator
Military_Checker - List of English military terms
NL_Colour_Checker - List of common dutch colours
NL_Date_Checker - Dutch day, month and year checker
NL_Season_Checker - List of common Dutch seasons
PTBR_Date_Checker - Brazilian Portuguese day, month and year checker
PTBR_Emotion_Checker - List of Brazilian Portuguese emotional terms
PTBR_Explicit_Checker - List of Brazilian Portuguese explicit terms
PTBR_Family_Checker - List of Brazilian Portuguese family terms
PTBR_Religion_Checker - List of Brazilian Portuguese religious terms
PTBR_Season_Checker - List of common Brazilian Portuguese seasons
Religion_Checker - List of religious terms
Road_Checker - List of English road terms
Russian_Cities_Checker - List of common Russian cities
Season_Checker - List of common English seasons
Special_Checker - No description given
Sport_Checker - List of English sport terms
US_Area_Code_Checker - List of US area codes
US_State_Checker - List of United States states
US_Zip_Code_Checker - List of US zip codes
Username_Checker - Compare usernames to passwords.
Vehicle_Checker - List of common vehicle manufacturers and models
Violence_Checker - List of English violent terms
Windows_Complexity_Checker - Check for default Windows complexity

By default, Pipal will analyze password lists and display tons of useful information using the basic.rb (Basic_Checker) module. However, to enhance Pipal's analysis capabilities, copy or symbolically link the desired modules from the pipal/checkers_available directory to the pipal/checkers_enabled directory.

I recommend enabling the modules that start with "EN_," which will enumerate the most popular religious terms, explicit terms, colors, vehicles, and more. Keep in mind, using many modules will increase the duration of the analysis. In some cases, where large (1 GB) wordlists were analyzed, Pipal would crash and fail to complete the analysis.

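To symbolically link a single module into the pipal/checkers_enabled directory, use the below command, replacing the .rb file with the one you want to use. The link target is written as an absolute path ("$PWD" expands to the current directory) because a relative target is resolved from inside checkers_enabled, where it would point at a file that doesn't exist.

~/pipal$ ln -s "$PWD"/checkers_available/EN_emotion_checker.rb checkers_enabled/

To symlink all of the "EN_" modules, use the below command. The wildcard (*) tells the shell to pass every file starting with "EN_" to the ln command, which links each one into the pipal/checkers_enabled directory.

~/pipal$ ln -s "$PWD"/checkers_available/EN_* checkers_enabled/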
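The same linking step can be scripted. The sketch below is one way to do it in Python, assuming the checkers_available and checkers_enabled layout described above. The function name enable_checkers and its pattern parameter are made up for this example. Each link gets an absolute target so that it resolves regardless of the directory it is read from.

```python
from pathlib import Path


def enable_checkers(pipal_dir, pattern="EN_*.rb"):
    """Symlink matching checkers into checkers_enabled; return the names linked."""
    root = Path(pipal_dir).resolve()
    available = root / "checkers_available"
    enabled = root / "checkers_enabled"
    enabled.mkdir(exist_ok=True)

    linked = []
    for checker in sorted(available.glob(pattern)):
        link = enabled / checker.name
        if link.is_symlink() or link.exists():
            continue  # already enabled, so leave it untouched
        link.symlink_to(checker)  # absolute target: resolves from any directory
        linked.append(checker.name)
    return linked
```

From inside the pipal directory, calling enable_checkers(".") would link every "EN_" module. Running it a second time links nothing, because existing entries are skipped.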

Step 4: Analyze Password Lists

Looking back at the available options again, there are two primary arguments which are always used.

~/pipal$ ruby2.7 pipal.rb --help

pipal 3.1 Robin Wood (robin@digi.ninja) (http://digi.ninja)

Usage: pipal [OPTION] ... FILENAME
        --help, -h, -?: show help
        --top, -t X: show the top X results (default 10)
        --output, -o <filename>: output to file
        --gkey <Google Maps API key>: to allow zip code lookups (optional)
        --list-checkers: Show the available checkers and which are enabled
        --verbose, -v: Verbose

        FILENAME: The file to count

By default, Pipal will only display the top 10 most common statistics. This default value is a bit low, so the --top argument should be used to increase that value. In all my below password analyses, the top 500 passwords were displayed. The --output argument is used to specify the file path and directory where the analyzed data is saved.

Using Pipal is very simple. Type the below command into a terminal to start analyzing password lists. Both the input and output files are plain text, so you could give them a ".txt" extension if you wanted, and both of their names can be customized too. I'm using "results.pipal" and "password.list" variations in my "dumps" folder in my home directory.

~/pipal$ ruby2.7 pipal.rb --top 500 --output ../dumps/results.pipal ../dumps/password.list

Generating stats, hit CTRL-C to finish early and dump stats on words already processed.
Please wait...
Processing:    100% |oooooooooooooooooooooooooooo | ETA 00:00:00

Wordlists that contain millions of passwords can take several minutes (up to an hour) for Pipal to analyze thoroughly. Complete details of all the Pipal analyses featured in this article can be found on my GitHub.
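Once a results file exists, its statistic lines are easy to post-process. The following Python sketch assumes the "value = count (percent%)" layout used by the listings in this article. The name parse_stat_line is invented for the example, and the real output file contains other kinds of lines as well, for which the function simply returns None.

```python
import re

# value, "=", a count that may contain thousands separators, then "(NN.NN%)"
STAT_LINE = re.compile(
    r"^(?P<value>.+?)\s*=\s*(?P<count>[\d,]+)\s*\((?P<pct>[\d.]+)%\)\s*$"
)


def parse_stat_line(line):
    """Return (value, count, percent) for a statistic line, else None."""
    match = STAT_LINE.match(line.strip())
    if match is None:
        return None
    count = int(match["count"].replace(",", ""))
    return match["value"], count, float(match["pct"])


print(parse_stat_line("123456       = 386 (0.11%)"))
print(parse_stat_line("Two digits on the end   = 31,214 (16.03%)"))
print(parse_stat_line("Please wait..."))
```

Parsing the lines into tuples makes it simple to sort, filter, or compare the results of several analyses.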

Example 1: 000webhost.com Password Analysis

000webhost is a free web hosting service that caters to millions of users worldwide. The 000webhost.com hack occurred in 2015, making this database about five years old. However, it offered a large dataset of over 13 million passwords, so it seemed appropriate to include it in this article.

~/pipal$ ruby2.7 pipal.rb --top 500 --output ../dumps/analysis/000webhost.com.pipal ../dumps/000webhost.com_2015_password.list

Generating stats, hit CTRL-C to finish early and dump stats on words already processed.
Please wait...
Processing:    100% |oooooooooooooooooooooooooooo | ETA 00:00:00

After analyzing the 000webhost password list, here's what I found:

Password Length

The most common password length was eight characters, accounting for 34% of all the unique passwords found in the password wordlist. Roughly 30% of passwords were only six or seven characters long, which is astonishingly short.

8  = 67,313 (34.58%)
6  = 33,392 (17.15%)
9  = 29,588 (15.2%)
7  = 24,916 (12.8%)
10 = 23,994 (12.33%)

This information is valuable to hackers as it indicates that most wordlists designed for remote brute-force attacks only need to be six to eight characters long to cover roughly 65% of all password lengths. A patient hacker would include nine- and ten-character passwords to get past 90% effectiveness, but that may not be required in most cases.
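The coverage figures can be double-checked by adding up the percentages from the length table above. This is a small Python sketch of that arithmetic. The dictionary just restates the five rows shown, and the coverage helper is a name made up for the example.

```python
# Share of passwords at each length, copied from the 000webhost table above.
length_share = {8: 34.58, 6: 17.15, 9: 15.2, 7: 12.8, 10: 12.33}


def coverage(shares, lengths):
    """Sum the percentage of passwords whose length is in `lengths`."""
    return round(sum(shares[n] for n in lengths), 2)


print("6-8 characters: ", coverage(length_share, range(6, 9)))
print("6-10 characters:", coverage(length_share, range(6, 11)))
```

Lengths six through eight add up to about 64.5%, and extending the range to ten characters brings it to about 92%.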

Appended Digits

It's not uncommon for people to add a number or two to the end of their passwords, e.g., password123. Over 25% of all passwords were found to have one or two digits appended to the password. Two-digit numbers were the most common with 16 percent.

Single digit on the end = 22,230 (11.42%)
Two digits on the end   = 31,214 (16.03%)
Three digits on the end = 18,447 (9.48%)

The most common single digit appended to a password was the number "1," being used 24,214 times and accounting for over 12% of all passwords. It was followed by the number "3," appended 16,362 times or roughly one out of every 12 passwords.

1 = 24,214 (12.44%)
3 = 16,362 (8.41%)
2 = 11,650 (5.98%)
0 = 9,687  (4.98%)
4 = 8,671  (4.45%)

The most common two-digit number appended to a password was "23," which appeared 9,054 times, or just under five percent.

23 = 9,054 (4.65%)
12 = 3,822 (1.96%)
01 = 3,629 (1.86%)
11 = 3,089 (1.59%)
00 = 2,791 (1.43%)

The most common three-digit number appended to a password was "123," again at just over four percent.

123 = 7,938 (4.08%)
456 = 2,143 (1.1%)
234 = 1,644 (0.84%)
000 = 935   (0.48%)
007 = 635   (0.33%)

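The counts for 1, 3, 2, 23, 12, 123, and 456 add up to just over 75,000. The categories overlap (a password ending in "123" is also counted under "23" and "3"), so fewer distinct passwords are covered than that sum suggests, but the pattern is clear: the single digits 1, 3, and 2 alone end more than a quarter of all passwords. It almost doesn't make sense to include other appendages in brute-force wordlists. Statistically speaking, other numbers appear too infrequently to warrant inclusion.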
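To sanity-check a tally like this, the counts from the three tables above can be summed directly. The Python sketch below does that and also lists which suffixes are nested inside others, since a password ending in "123" is counted in the "23" and "3" rows as well. The helper name nested_suffixes is made up for the illustration.

```python
# Counts copied from the single-, two-, and three-digit tables above.
suffix_counts = {
    "1": 24214, "3": 16362, "2": 11650,
    "23": 9054, "12": 3822,
    "123": 7938, "456": 2143,
}

total = sum(suffix_counts.values())


def nested_suffixes(suffixes):
    """Return (longer, shorter) pairs where the longer suffix ends with the shorter."""
    return sorted(
        (longer, shorter)
        for longer in suffixes
        for shorter in suffixes
        if longer != shorter and longer.endswith(shorter)
    )


print(total)
print(nested_suffixes(suffix_counts))
```

Because of the nested pairs, the raw sum counts some passwords more than once, so it is best read as an upper bound rather than an exact number of distinct passwords.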

Special Characters

With the "@" special character only being included in 0.8% of all passwords, it's safe to omit passwords containing special characters (or "1337 speak") from brute-force wordlists. A patient hacker who wishes to create a comprehensive wordlist may consider including some of the top three special characters. Conversely, someone hoping to protect their account from brute-force attacks may want to include a special character in their (probably weak) password.

@ = 1,614 (0.83%)
. = 881   (0.45%)
# = 780   (0.4%)

Top 25 Passwords

Anyone familiar with password lists won't be surprised to see that "123456" is the most common password, having been used to secure 783 different accounts. "Abcdef123" and "a123456" follow closely behind, each used over 500 times.

123456        = 783 (0.4%)
Abcdef123     = 608 (0.31%)
a123456       = 580 (0.3%)
little123     = 468 (0.24%)
nanda334      = 391 (0.2%)
N97nokia      = 367 (0.19%)
password      = 315 (0.16%)
Pawerjon123   = 275 (0.14%)
421uiopy258   = 230 (0.12%)
MYworklist123 = 182 (0.09%)
12345678      = 175 (0.09%)
qwerty        = 169 (0.09%)
nks230kjs82   = 152 (0.08%)
trustno1      = 150 (0.08%)
zxcvbnm       = 138 (0.07%)
N97nokiamini  = 132 (0.07%)
letmein       = 131 (0.07%)
123456789     = 131 (0.07%)
myplex        = 110 (0.06%)
gm718422@     = 109 (0.06%)
churu123A     = 107 (0.05%)
abc123        = 105 (0.05%)
plex123       = 95  (0.05%)
any123456     = 94  (0.05%)
Lwf1681688    = 92  (0.05%)

It's not unusual to see strange or bizarre passwords ranked highly in database lists, such as "nanda334" and "N97nokia." These passwords were used over 350 times each. It's unclear how this happened. Most likely, a small group of individuals (probably hackers) created multiple accounts over a long period of time and reused the same password over and over. When generating wordlists, it's really up to the hacker to determine whether or not to include a particular password in the wordlist.

Top 25 Base Words

Here's where I think Pipal really shines. It's able to omit the numbers appended to the ends of passwords and analyze the words used at the beginning of the passwords. This data is especially useful to hackers because they're then able to use the base words in conjunction with the most commonly used digits to create comprehensive wordlists. For example, take note of "welcome" ranked 24th in the below list.

password     = 735 (0.38%)
abcdef       = 699 (0.36%)
plex         = 546 (0.28%)
qwerty       = 505 (0.26%)
little       = 481 (0.25%)
nanda        = 401 (0.21%)
n97nokia     = 367 (0.19%)
pawerjon     = 275 (0.14%)
letmein      = 252 (0.13%)
uiopy        = 230 (0.12%)
trustno      = 200 (0.1%)
abcd         = 189 (0.1%)
passw0rd     = 186 (0.1%)
monkey       = 184 (0.09%)
myworklist   = 182 (0.09%)
master       = 171 (0.09%)
pass         = 166 (0.09%)
asdf         = 164 (0.08%)
gondola      = 164 (0.08%)
dragon       = 156 (0.08%)
zxcvbnm      = 154 (0.08%)
nks230kjs    = 152 (0.08%)
hello        = 148 (0.08%)
welcome      = 141 (0.07%)
n97nokiamini = 133 (0.07%)
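To make the idea of a base word concrete, here is a rough Python approximation. It is inferred only from the lists in this article (for example, "421uiopy258" appearing as "uiopy" and "N97nokia" as "n97nokia") and is not Pipal's actual code. The function names and the min_length cutoff are assumptions made for the sketch.

```python
import re
from collections import Counter


def base_word(password):
    """Lowercase the password and strip non-letters from both ends."""
    return re.sub(r"^[^a-z]+|[^a-z]+$", "", password.lower())


def top_base_words(passwords, top=25, min_length=4):
    """Count base words, ignoring very short ones (min_length is an assumption)."""
    counts = Counter(base_word(p) for p in passwords)
    ranked = [(w, c) for w, c in counts.most_common() if len(w) >= min_length]
    return ranked[:top]


sample = ["welcome1", "Welcome123", "welcome", "123456", "abc1", "421uiopy258"]
print(top_base_words(sample))
```

Digits inside a word are kept, which is why "passw0rd" and "nks230kjs" survive intact in the lists above while numeric-only passwords reduce to an empty string and drop out.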

If commonly appended digits (such as "1" and "123") are added to "welcome," we find that the resulting passwords were used more than a dozen times.

~/pipal$ grep -i 'welcome1' ../dumps/000webhost.com_2015_password.list

welcome1
welcome1
welcome11
welcome123
welcome1
welcome123
welcome123
welcome1
welcome1
welcome123
welcome1
welcome1234
welcome123
welcome1
welcome1
welcome1

This is why appending common digits to base words is more beneficial to wordlists than simply compiling a list of the top passwords. The base words we choose and the digits we append are the main things that vary in how people create passwords. It's better to isolate the two variables and then combine the results in a new wordlist.
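Combining the two variables amounts to a cross product of base words and suffixes, filtered by length. The Python sketch below illustrates this with a handful of values taken from the tables in this article. The function name combine and the six-to-ten length window are assumptions for the example, and the same output can be used to check whether one of your own passwords follows this pattern.

```python
from itertools import product


def combine(base_words, suffixes, min_len=6, max_len=10):
    """Yield each unique base word + suffix pair that fits the length window."""
    seen = set()
    for word, suffix in product(base_words, suffixes):
        candidate = word + suffix
        if min_len <= len(candidate) <= max_len and candidate not in seen:
            seen.add(candidate)
            yield candidate


# A few base words and suffixes from the tables above ("" keeps the bare word).
bases = ["password", "welcome", "qwerty"]
suffixes = ["", "1", "123", "23"]

candidates = list(combine(bases, suffixes))
print(len(candidates))
print(candidates)
```

The length filter drops "password123" here because it is eleven characters long, which keeps the list aligned with the length statistics discussed earlier.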

Example 2: VK.com Password Analysis

VK, a social network heavily inspired by Facebook, is the most popular website in Russia and ranked in the top 20 most popular websites in the world. The social network reportedly has over 300 million registered users.

The VK.com hack emerged in 2016, but occurred in 2012. While the passwords found in this data breach are nearly six years old, I couldn't miss the opportunity to analyze a massive dataset of over 92,470,000 passwords.

~/pipal$ ruby2.7 pipal.rb --top 500 --output ../dumps/analysis/vk.com.pipal ../dumps/vk.com_2012_password.list

Generating stats, hit CTRL-C to finish early and dump stats on words already processed.
Please wait...
Processing:    100% |oooooooooooooooooooooooooooo | ETA 00:00:00

After analyzing the VK.com password list, here's what I found:

Password Length

More than 50% of all passwords are between six and eight characters long. This is consistent with data found in the 000webhost dataset and reaffirms most wordlists don't need to contain passwords over nine or ten characters long.

6  = 17,665,381 (19.1%)
8  = 17,370,491 (18.78%)
7  = 12,391,947 (13.4%)
9  = 9,815,371  (10.61%)
10 = 7,686,762  (8.31%)

Appended Digits

Fewer passwords appeared with appended digits compared to the 000webhost data, accounting for roughly 12 percent.

Single digit on end = 3,023,338 (3.27%)
Two digits on end   = 5,326,255 (5.76%)
Three digits on end = 3,412,773 (3.69%)

The most common single and double digits are again "1," "3," and "23." Their counts add up to roughly 12,600,000, or about 13.6% of the dataset, though passwords ending in "23" are also counted under "3."

1 = 5,919,242 (6.4%)
3 = 5,221,786 (5.65%)
0 = 5,079,464 (5.49%)
6 = 4,854,551 (5.25%)

23 = 1,474,608 (1.59%)
11 = 1,398,248 (1.51%)
89 = 1,337,274 (1.45%)
56 = 1,266,445 (1.37%)

123 = 1,135,684 (1.23%)
456 = 1,003,088 (1.08%)
789 = 638,695   (0.69%)
777 = 584,292   (0.63%)

Top 25 Passwords

Compared to the 000webhost data, fewer of the top passwords have appended digits, and passwords containing only numbers are more popular among users in this dataset.

123456     = 653,959
123456789  = 383,177
qwerty     = 263,565
111111     = 176,226
1234567890 = 144,494
1234567    = 131,279
12345678   = 99,885
123321     = 87,148
000000     = 85,468
123123     = 84,036
7777777    = 81,544
zxcvbnm    = 79,199
666666     = 72,052
qwertyuiop = 69,178
123qwe     = 62,680
555555     = 61,762
1q2w3e     = 57,425
gfhjkm     = 51,310
qazwsx     = 50,686
1q2w3e4r   = 49,676
654321     = 48,435
987654321  = 46,461
121212     = 41,896
777777     = 39,966
zxcvbn     = 39,527

Unfortunately, Pipal is constrained by our computer's memory (RAM) and struggled to analyze VK.com's dataset of over 92,000,000 passwords. Pipal wasn't able to determine the percentages of each password found or the top base words or passwords with special characters, but we were able to figure out the number of times each password appeared. Special thanks to @digininja for working with me to analyze this dataset.
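Some statistics can be computed without holding the whole list in memory. As an illustration (this is not how Pipal works internally), the Python sketch below builds a length histogram one line at a time, so memory use depends on the number of distinct lengths rather than on the number of passwords. The function name length_histogram is made up for the example.

```python
from collections import Counter


def length_histogram(lines):
    """Return (length, count, percent) tuples, most common length first."""
    counts = Counter()
    total = 0
    for line in lines:
        password = line.rstrip("\r\n")
        if not password:
            continue  # skip blank lines
        counts[len(password)] += 1
        total += 1
    return [
        (length, n, round(100 * n / total, 2))
        for length, n in counts.most_common()
    ]


# Works on any iterable of lines; a file object streams from disk, e.g.
#   with open(path, encoding="utf-8", errors="replace") as handle:
#       print(length_histogram(handle))
print(length_histogram(["123456\n", "qwerty\n", "12345678\n", "\n"]))
```

Counting every distinct password, as a "top passwords" list requires, still needs memory proportional to the number of unique entries, so this approach only helps for aggregate statistics such as length.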

Example 3: ClixSense.com Password Analysis

ClixSense is a "paid-to-click" website that compensates people (microtransactions) for taking part in surveys and viewing advertisements. (It is also banned on Null Byte, in case you had any ideas in the forum.)

The ClixSense hacked data was posted online, in 2016, by the attackers who claimed it was a subset of a larger 6.6 million dataset. There are 2.2 million passwords in my ClixSense password list. While it's not the complete ClixSense list, it still provides an adequately large dataset belonging to a very recent hack.

~/pipal$ ruby2.7 pipal.rb --top 500 --output ../dumps/analysis/clixsense.com.pipal ../dumps/clixsense.com_2016_password.list

Generating stats, hit CTRL-C to finish early and dump stats on words already processed.
Please wait...
Processing:    100% |oooooooooooooooooooooooooooo | ETA 00:00:00

After analyzing the ClixSense.com password list, here's what I found:

Password Length

Passwords consisting of nine or fewer characters are (again) the most common in this leak. This further confirms that short passwords, especially eight- and six-character ones, should be prioritized when designing wordlists for brute-force attacks.

8  = 526,916 (23.72%)
6  = 407,346 (18.33%)
9  = 314,908 (14.17%)
10 = 286,220 (12.88%)
7  = 285,726 (12.86%)

Appended Digits

The one-digit, two-digit, and three-digit combinations most commonly appended to passwords are very consistent with the 000webhost.com dataset.

Single digit on the end = 121,811 (5.48%)
Two digits on the end   = 239,247 (10.77%)
Three digits on the end = 151,586 (6.82%)

1 = 177,622 (7.99%)
3 = 159,989 (7.2%)
0 = 119,043 (5.36%)
2 = 118,338 (5.33%)

23 = 71,414 (3.21%)
56 = 31,159 (1.4%)
12 = 30,915 (1.39%)
11 = 30,248 (1.36%)

123 = 58,898 (2.65%)
456 = 25,874 (1.16%)
234 = 11,166 (0.5%)
007 = 10,573 (0.48%)

Special Characters

Once again, "@" and "." were the most common special characters, but only "@" was found in more than 1% of passwords. This is too small a percentage to warrant including special characters in wordlists, but again, using special characters in passwords will significantly hinder an attacker's ability to brute-force a service.

@ = 31,778 (1.43%)
. = 16,108 (0.73%)
_ = 14,711 (0.66%)
! = 11,248 (0.51%)

Top 25 Passwords

The password "123456" once again takes the lead, being used 17,879 times. And again we see a unique password, "bismillah," whose base word was used over 1,000 times (see the "base words" list after this list). It's not uncommon to see cultural or religious terms in datasets where the hacked website is popular in a particular country.

123456     = 17,879 (0.8%)
123456789  = 3,292  (0.15%)
12345678   = 2,093  (0.09%)
password   = 1,970  (0.09%)
111111     = 1,892  (0.09%)
1234567    = 1,300  (0.06%)
iloveyou   = 1,266  (0.06%)
qwerty     = 1,187  (0.05%)
clixsense  = 1,173  (0.05%)
000000     = 977    (0.04%)
abcdefg    = 972    (0.04%)
123123     = 923    (0.04%)
pakistan   = 803    (0.04%)
654321     = 745    (0.03%)
users      = 736    (0.03%)
bismillah  = 644    (0.03%)
abc123     = 615    (0.03%)
1234567890 = 537    (0.02%)
666666     = 525    (0.02%)
asdfgh     = 524    (0.02%)
computer   = 516    (0.02%)
aaaaaa     = 502    (0.02%)
secret     = 392    (0.02%)
iloveu     = 391    (0.02%)
krishna    = 391    (0.02%)

Top 25 Base Words

A closer look at the top 25 base words reveals some great results. With the exception of website-specific passwords ("clixsense" and "clix"), hackers would incorporate most of these words into their wordlists and experience some success with brute-force attacks.

password  = 3,937 (0.18%)
clixsense = 2,989 (0.13%)
qwerty    = 2,798 (0.13%)
iloveyou  = 2,101 (0.09%)
pakistan  = 1,965 (0.09%)
clix      = 1,285 (0.06%)
money     = 1,271 (0.06%)
love      = 1,244 (0.06%)
june      = 1,208 (0.05%)
abcdefg   = 1,117 (0.05%)
bismillah = 1,026 (0.05%)
april     = 1,006 (0.05%)
welcome   = 990   (0.04%)
july      = 984   (0.04%)
jesus     = 950   (0.04%)
abcd      = 936   (0.04%)
master    = 916   (0.04%)
angel     = 899   (0.04%)
nokia     = 896   (0.04%)
computer  = 882   (0.04%)
krishna   = 822   (0.04%)
march     = 810   (0.04%)
august    = 803   (0.04%)
daniel    = 777   (0.03%)
secret    = 766   (0.03%)

Example 4: µTorrent.com Password Analysis

µTorrent is a popular peer-to-peer file sharing client, managed by BitTorrent.com. The µTorrent forum hack occurred in 2016 and consisted of almost 400,000 leaked passwords. This dataset is the smallest featured in this article, but still provides insight into how passwords are created today.

~/pipal$ ruby2.7 pipal.rb --top 500 --output ../dumps/analysis/utorrent.com.pipal ../dumps/utorrent.com_2016_password.list

Generating stats, hit CTRL-C to finish early and dump stats on words already processed.
Please wait...
Processing:    100% |oooooooooooooooooooooooooooo | ETA 00:00:00

Password Length

A jarring 88% of passwords are eight characters long, a far higher share than in the other datasets in this article. Only about 4% of passwords are nine or ten characters long, so nearly all of them are well within brute-forcing range.

8  = 323,102 (88.66%)
6  = 11,314  (3.1%)
9  = 8,108   (2.22%)
7  = 7,962   (2.18%)
10 = 6,058   (1.66%)

Appended Digits

A single digit was most commonly appended to a password. We just learned most passwords are a total of eight characters long, so this suggests that many base words are seven letters long, often with a "1" or "3" at the end of the password.

Single digit on the end = 45,852 (12.58%)
Two digits on the end   = 13,569 (3.72%)
Three digits on the end = 4,923  (1.35%)

1 = 10,475 (2.87%)
3 = 8,904  (2.44%)
2 = 8,123  (2.23%)
0 = 7,925  (2.17%)

23 = 1,737 (0.48%)
11 = 978   (0.27%)
12 = 894   (0.25%)
00 = 837   (0.23%)

123 = 1,365 (0.37%)
456 = 521   (0.14%)
234 = 307   (0.08%)
000 = 252   (0.07%)

Special Characters

The top three most popular special characters were used in just over 0.5% of all passwords. Again, that's too small a percentage to warrant including special characters in wordlists, but they're really good to use when you want to hinder an attacker's ability to brute-force a service.

/ = 833 (0.23%)
+ = 805 (0.22%)
@ = 627 (0.17%)

Top 25 Passwords

Once again, we see a website-specific password, "utorrent," appearing in the top four passwords.

123456    = 386 (0.11%)
forum123  = 152 (0.04%)
password  = 116 (0.03%)
utorrent  = 94  (0.03%)
qwerty    = 71  (0.02%)
12345678  = 57  (0.02%)
123456789 = 57  (0.02%)
111111    = 46  (0.01%)
123123    = 37  (0.01%)
Mykey2012 = 35  (0.01%)
abc123    = 30  (0.01%)
000000    = 27  (0.01%)
trustno1  = 26  (0.01%)
letmein   = 26  (0.01%)
torrent   = 24  (0.01%)
qazwsx    = 24  (0.01%)
Mykey2011 = 23  (0.01%)
1234      = 21  (0.01%)
666666    = 20  (0.01%)
shadow    = 19  (0.01%)
12345     = 19  (0.01%)
1234567   = 19  (0.01%)
1q2w3e4r  = 19  (0.01%)
dragon    = 18  (0.0%)
fuckyou   = 18  (0.0%)

Top 25 Base Words

The website-specific password "utorrent" is at the top of the base words list. It's not uncommon to see people include the website name in their password. There are probably thousands of Null Byte readers using the password "nullbyte," "wonderhowto," or some variation.

utorrent = 205 (0.06%)
password = 166 (0.05%)
forum    = 164 (0.05%)
qwerty   = 128 (0.04%)
mykey    = 62  (0.02%)
dragon   = 49  (0.01%)
torrent  = 42  (0.01%)
letmein  = 36  (0.01%)
melto    = 36  (0.01%)
shadow   = 35  (0.01%)
abcd     = 35  (0.01%)
qazwsx   = 32  (0.01%)
monkey   = 31  (0.01%)
trustno  = 31  (0.01%)
fuckyou  = 29  (0.01%)
superman = 28  (0.01%)
pass     = 28  (0.01%)
alex     = 28  (0.01%)
love     = 28  (0.01%)
matrix   = 27  (0.01%)
killer   = 27  (0.01%)
master   = 25  (0.01%)
passw0rd = 25  (0.01%)
nokia    = 24  (0.01%)
welcome  = 24  (0.01%)

Let Me Guess ... Your Password Is '123456'

The most common passwords between 2012 and 2016 are "123456," "password," and "123456789," with a few variations in how they rank on a site-to-site basis. With that data spanning several years, it's reasonable to believe these are still the most common passwords used today.

The more interesting data was discovered within the base words and digits appended to each password. These two datasets can be combined to create a much more (statistically) effective password-guessing wordlist.
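The overlap between sites can be checked with a few lines of Python. The sketch below uses only the first ten entries of each "Top 25 Passwords" list from this article and reports the passwords that appear in at least three of the four. The name shared_passwords and the min_sites threshold are choices made for this example.

```python
from collections import Counter

# First ten entries of each "Top 25 Passwords" list in this article.
top10 = {
    "000webhost": ["123456", "Abcdef123", "a123456", "little123", "nanda334",
                   "N97nokia", "password", "Pawerjon123", "421uiopy258",
                   "MYworklist123"],
    "vk": ["123456", "123456789", "qwerty", "111111", "1234567890",
           "1234567", "12345678", "123321", "000000", "123123"],
    "clixsense": ["123456", "123456789", "12345678", "password", "111111",
                  "1234567", "iloveyou", "qwerty", "clixsense", "000000"],
    "utorrent": ["123456", "forum123", "password", "utorrent", "qwerty",
                 "12345678", "123456789", "111111", "123123", "Mykey2012"],
}


def shared_passwords(tops, min_sites=3):
    """Return passwords that appear in the top list of at least `min_sites` sites."""
    seen = Counter(p for site in tops.values() for p in set(site))
    return sorted(p for p, n in seen.items() if n >= min_sites)


print(shared_passwords(top10))
print(shared_passwords(top10, min_sites=4))
```

Only "123456" makes the top ten on all four sites, while the site-specific entries drop out as soon as more than one dataset is required.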

Until next time, follow me on Twitter @tokyoneon_ and GitHub. And as always, leave a comment below or message me on Twitter if you have any questions.

Cover image and screenshots by tokyoneon/Null Byte
