~3mil Passwords Later...

We have all seen blog posts about the most frequent passwords, the most popular, and the worst. But I figured I hadn’t read a post that actually describes the passwords we are choosing. What do we use? Are they complex? How are they constructed? I realized that over the last few years I have collected ~3 million passwords. I know there are people out there with way more, so if you would like to contribute, feel free to shoot me an email so I have more data to work with for analysis.

I decided to run ‘pipal’ (great tool: https://github.com/digininja/pipal) over this list of unique passwords (it took quite a while). Pipal produces most of the interesting statistics I wanted. Below you’ll find the results. Most of them are pretty straightforward, although some are quite interesting.

Total number of passwords analyzed:

Total entries = 3.276.063
Total unique entries = 3.276.023

Password length (count ordered)

8 = 843068 (25.73%)
9 = 606576 (18.52%)
10 = 549104 (16.76%)
11 = 278862 (8.51%)
7 = 270189 (8.25%)
6 = 263599 (8.05%)
12 = 176808 (5.4%)
13 = 101304 (3.09%)
14 = 63425 (1.94%)
5 = 48913 (1.49%)
15 = 32060 (0.98%)
16 = 23135 (0.71%)
17 = 5704 (0.17%)
18 = 3564 (0.11%)
19 = 2191 (0.07%)
4 = 1957 (0.06%)
3 = 1623 (0.05%)
2 = 1526 (0.05%)
20 = 959 (0.03%)
21 = 514 (0.02%)
22 = 292 (0.01%)
23 = 197 (0.01%)
24 = 128 (0.0%)
1 = 97 (0.0%)
25 = 85 (0.0%)
26 = 45 (0.0%)
27 = 25 (0.0%)
32 = 19 (0.0%)
28 = 16 (0.0%)
40 = 16 (0.0%)
31 = 15 (0.0%)
29 = 13 (0.0%)
30 = 12 (0.0%)
39 = 6 (0.0%)
34 = 4 (0.0%)
33 = 3 (0.0%)
36 = 3 (0.0%)
35 = 2 (0.0%)
64 = 1 (0.0%)
45 = 1 (0.0%)
54 = 1 (0.0%)
43 = 1 (0.0%)

        |                                                               
        |                                                               
        |                                                               
        |                                                               
        ||                                                              
        |||                                                             
        |||                                                             
        |||                                                             
        |||                                                             
        |||                                                             
      ||||||                                                            
      ||||||                                                            
      |||||||                                                           
      |||||||                                                           
      |||||||||                                                         
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||      
000000000011111111112222222222333333333344444444445555555555666666
012345678901234567890123456789012345678901234567890123456789012345

One to six characters = 317715 (9.7%)
One to eight characters = 1430972 (43.68%)
More than eight characters = 1845091 (56.32%)

This surprised me a little, because I always thought the average password length was ~6 characters.
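
For anyone who wants to reproduce this kind of length breakdown on their own list, here is a minimal sketch in Python. It is not pipal's code; the function name `length_stats` and the sample passwords are made up for illustration.

```python
from collections import Counter


def length_stats(passwords):
    """Return (count per length, percentage of passwords longer than eight characters)."""
    counts = Counter(len(p) for p in passwords)
    total = sum(counts.values())
    over_eight = sum(n for length, n in counts.items() if length > 8)
    share = round(100 * over_eight / total, 2) if total else 0.0
    return counts, share


# Hypothetical sample, not taken from the analysed list.
sample = ["password", "letmein", "Summer2011", "qwerty123", "abc"]
counts, share_over_eight = length_stats(sample)

# Print lengths ordered by count, most common first (the same layout as the table above).
for length, n in counts.most_common():
    print(f"{length} = {n} ({100 * n / len(sample):.2f}%)")
print(f"More than eight characters = {share_over_eight}%")
```

Feeding it a file with one password per line should give a table comparable to the one above, although ties may be ordered differently.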

Alpha, numeric, lower- and upper-case:

Only lowercase alpha = 443775 (13.55%)
Only uppercase alpha = 317443 (9.69%)
Only alpha = 761218 (23.24%)
Only numeric = 24096 (0.74%)




First capital last symbol = 67278 (2.05%)
First capital last number = 683295 (20.86%)
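
The two "first capital" shapes can be approximated with a pair of regular expressions. This is a sketch under my own assumptions (ASCII only, "symbol" meaning anything that is not a letter or digit); pipal's actual patterns may differ, and the helper name `shape` is hypothetical.

```python
import re

# Assumed definitions: uppercase ASCII letter first, then a digit
# or a non-alphanumeric character as the very last character.
FIRST_CAP_LAST_NUMBER = re.compile(r"[A-Z].*[0-9]")
FIRST_CAP_LAST_SYMBOL = re.compile(r"[A-Z].*[^a-zA-Z0-9]")


def shape(password):
    """Return the name of the matching shape, or None if neither applies."""
    if FIRST_CAP_LAST_NUMBER.fullmatch(password):
        return "first capital last number"
    if FIRST_CAP_LAST_SYMBOL.fullmatch(password):
        return "first capital last symbol"
    return None


# Hypothetical examples.
for candidate in ["Summer2011", "Welcome!", "summer2011"]:
    print(candidate, "->", shape(candidate))
```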

Months:

january = 64 (0.0%)
february = 38 (0.0%)
march = 630 (0.02%)
april = 720 (0.02%)
may = 6304 (0.19%)
june = 1196 (0.04%)
july = 857 (0.03%)
august = 318 (0.01%)
september = 54 (0.0%)
october = 143 (0.0%)
november = 134 (0.0%)
december = 104 (0.0%)

Compared to the number of passwords analyzed, these values are not very interesting. At first I was a bit confused by the high count for ‘May’, but this confusion quickly dissipated when I checked the ‘pipal’ source code. The code looks for the string ‘may’ anywhere in the password, and ‘may’ probably occurs in common everyday words (or names) a little more often than the other months do. This is why ‘may’ has such a high count.
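
To illustrate the effect, here is a small Python sketch of substring-based month counting. It is an approximation of the behaviour described above, not pipal's code; the case-insensitive comparison is my assumption, and the sample passwords are invented.

```python
MONTHS = [
    "january", "february", "march", "april", "may", "june", "july",
    "august", "september", "october", "november", "december",
]


def months_contained(password):
    """Return every month name that occurs anywhere inside the password."""
    lowered = password.lower()  # assumption: matching ignores case
    return [month for month in MONTHS if month in lowered]


# None of the first three passwords refers to the month, yet all of them count as 'may'.
for candidate in ["maya1984", "mayhem!", "Ramayana", "december25"]:
    print(candidate, "->", months_contained(candidate))
```

A stricter check (for example, requiring the month to be the whole alphabetic part of the password) would remove most of these false hits, at the cost of missing passwords like "mayflowers2011" that really do start with the month.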

Days:

monday = 64 (0.0%)
tuesday = 33 (0.0%)
wednesday = 16 (0.0%)
thursday = 17 (0.0%)
friday = 102 (0.0%)
saturday = 22 (0.0%)
sunday = 82 (0.0%)

Years (Top 10)

2011 = 6764 (0.21%)
2010 = 5754 (0.18%)
2008 = 5579 (0.17%)
2007 = 4244 (0.13%)
2009 = 4155 (0.13%)
2006 = 3480 (0.11%)
2000 = 3092 (0.09%)
2005 = 2883 (0.09%)
2012 = 2447 (0.07%)
1980 = 2371 (0.07%)

Colours:

black = 2069 (0.06%)
blue = 4121 (0.13%)
brown = 1047 (0.03%)
gray = 583 (0.02%)
green = 1979 (0.06%)
orange = 489 (0.01%)
pink = 1964 (0.06%)
purple = 477 (0.01%)
red = 11573 (0.35%)
white = 949 (0.03%)
yellow = 451 (0.01%)
violet = 159 (0.0%)
indigo = 90 (0.0%)

Trailing digits:

Single digit on the end = 424798 (12.97%)
Two digits on the end = 505042 (15.42%)
Three digits on the end = 162703 (4.97%)
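
A rough way to measure trailing digits yourself is sketched below in Python. It simply takes the run of ASCII digits at the end of each password; whether pipal treats all-digit passwords or longer runs the same way is not something I have verified, so treat the helper `trailing_digits` as hypothetical.

```python
from collections import Counter

DIGITS = "0123456789"


def trailing_digits(password):
    """Return the run of ASCII digits at the end of the password ('' if there is none)."""
    stripped = password.rstrip(DIGITS)
    return password[len(stripped):]


# Hypothetical sample, not taken from the analysed list.
sample = ["summer1", "summer11", "summer123", "summer", "Summer2011"]

# How many digits each password ends with, and which digit comes last.
by_run_length = Counter(len(trailing_digits(p)) for p in sample)
last_digit = Counter(trailing_digits(p)[-1] for p in sample if trailing_digits(p))

print(by_run_length)
print(last_digit)
```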





Last digit
1 = 312509 (9.54%)
3 = 165354 (5.05%)
2 = 164638 (5.03%)
7 = 126409 (3.86%)
9 = 123559 (3.77%)
0 = 122785 (3.75%)
5 = 120416 (3.68%)
4 = 119135 (3.64%)
8 = 115571 (3.53%)
6 = 107765 (3.29%)




Last 2 digits (Top 10)
23 = 42163 (1.29%)
01 = 36592 (1.12%)
11 = 35860 (1.09%)
12 = 32446 (0.99%)
10 = 25681 (0.78%)
00 = 22101 (0.67%)
08 = 21245 (0.65%)
07 = 20770 (0.63%)
13 = 19100 (0.58%)
09 = 18492 (0.56%)




Last 3 digits (Top 10)
123 = 26606 (0.81%)
007 = 7118 (0.22%)
011 = 5834 (0.18%)
234 = 5644 (0.17%)
010 = 5197 (0.16%)
008 = 4983 (0.15%)
000 = 4978 (0.15%)
001 = 4817 (0.15%)
009 = 3846 (0.12%)
006 = 3293 (0.1%)

Last 4 digits (Top 10)

2011 = 4877 (0.15%)
1234 = 4559 (0.14%)
2008 = 4104 (0.13%)
2010 = 4044 (0.12%)
2007 = 3131 (0.1%)
2009 = 2959 (0.09%)
2006 = 2551 (0.08%)
2000 = 2242 (0.07%)
2005 = 2083 (0.06%)
2345 = 1896 (0.06%)

Last 5 digits (Top 10)

12345 = 1692 (0.05%)
23456 = 910 (0.03%)
54321 = 350 (0.01%)
56789 = 277 (0.01%)
55555 = 154 (0.0%)
00000 = 153 (0.0%)
11111 = 150 (0.0%)
67890 = 116 (0.0%)
34567 = 109 (0.0%)
77777 = 101 (0.0%)

Character set ordering

stringdigit: 1052534 (32.13%)
allstring: 840318 (25.65%)
othermask: 585169 (17.86%)
stringdigitstring: 414663 (12.66%)
digitstring: 116446 (3.55%)
stringspecialdigit: 111342 (3.4%)
digitstringdigit: 54196 (1.65%)
stringspecialstring: 51826 (1.58%)
alldigit: 24096 (0.74%)
stringspecial: 19754 (0.6%)
specialstring: 3246 (0.1%)
specialstringspecial: 2039 (0.06%)
allspecial: 434 (0.01%)

Unfortunately, the bulk of the passwords are still string based.
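
The mask names above can be read as the order in which character classes appear. The Python sketch below reproduces that idea by collapsing each password into runs of letters ("string"), digits ("digit") and everything else ("special"). It is an approximation under my own assumptions (ASCII letters and digits only, anything unlisted falls into "othermask"), not a copy of pipal's rules.

```python
from itertools import groupby

# Mask names that appear in the output above; anything else is reported as "othermask".
KNOWN_MASKS = {
    "stringdigit", "stringdigitstring", "digitstring", "stringspecialdigit",
    "digitstringdigit", "stringspecialstring", "stringspecial",
    "specialstring", "specialstringspecial",
}


def char_class(ch):
    """Classify one character as 'string', 'digit' or 'special' (ASCII only, by assumption)."""
    if ch.isascii() and ch.isalpha():
        return "string"
    if ch.isascii() and ch.isdigit():
        return "digit"
    return "special"


def mask(password):
    """Collapse consecutive characters of the same class and name the resulting pattern."""
    runs = [label for label, _ in groupby(password, key=char_class)]
    if len(runs) == 1:
        return "all" + runs[0]
    name = "".join(runs)
    return name if name in KNOWN_MASKS else "othermask"


# Hypothetical examples.
for candidate in ["summer2011", "summer", "john_99", "p@ss1"]:
    print(candidate, "->", mask(candidate))
```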

I realized after looking at these statistics that the only thing I can say about my data set is the following:

  1. The most common length for a password is 8 characters (25.73%)

  2. 32.13% of the passwords are ‘stringdigit’ based. This is probably caused by a password policy. I don’t think this is a decision by the user.

  3. Only ~5% use a password that is “acceptable” under most security policies. This means that 95% are likely to choose a password that can be considered “insecure”!

That’s it for now. Again, if you have a .pot file and feel like sharing it with me, please do. .pot files always make me happy ;)

Cheers, Ruben.
