Subject: Documentation does not mention bias for character sets whose size is not a power of 2
Because the implementation breaks the input into octets, and then mods
them by the number of characters in the encoding, a bias is introduced
in the distribution of resultant characters.
The default settings produce a strong bias towards 'a'..'l'.
See
http://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle#Modulo_bias
for more info.
Because you have 61 default characters and 256 = 4 * 61 + 12, a uniform
distribution of octet values maps 5 values to each of 'a'..'l' and 4
values to each of the other 49 characters.
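
The arithmetic can be checked without the module at all. This is a minimal sketch that assumes the default set is the 61 characters seen in the output below, with 'a'..'l' in the first 12 positions; the order of the remaining characters is a guess and does not affect the counts.

```perl
use strict;
use warnings;

# Assumed character set: 61 entries, 'a'..'l' first (inferred from the
# observed output, not taken from the String::Urandom source).
my @chars = ('a'..'z', 'A'..'Z', '1'..'9');

# Map every possible octet value to a character, the way "octet mod 61" would.
my %count;
$count{ $chars[ $_ % @chars ] }++ for 0 .. 255;

my @five = sort grep { $count{$_} == 5 } keys %count;
my @four = grep { $count{$_} == 4 } keys %count;

printf "%d characters map from 5 octet values: %s\n", scalar @five, join('', @five);
printf "%d characters map from 4 octet values\n", scalar @four;
```

Under that assumption the favoured characters should turn up about 5/4 as often as the rest, which is roughly the 62,500 versus 50,000 split in the counts below.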
Here's a quick one-liner that shows the bias:
perl -MString::Urandom -e 'my $ur = String::Urandom->new; for my $i
(1..100000) { for my $char ( split( //, $ur->rand_string ) ) { print
$char, "\n" } }' | sort | uniq -c | sort -nr
62821 f
62747 d
62744 k
62717 b
62661 j
62621 h
62583 a
62433 e
62355 g
62244 i
62104 c
61886 l
50619 y
50440 5
50367 N
50332 6
50329 x
50262 p
50252 w
50237 Y
50223 Q
50199 8
50181 R
50181 3
50167 9
50146 G
50129 v
50103 2
50098 B
50097 o
50082 K
50058 J
50053 q
50046 T
50025 H
50020 V
49999 r
49987 E
49971 S
49969 n
49965 C
49963 1
49962 X
49954 Z
49891 4
49889 s
49878 F
49861 D
49860 z
49851 P
49804 U
49748 L
49744 M
49736 O
49716 7
49696 t
49663 W
49659 I
49601 u
49563 m
49508 A
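
For comparison, here is a sketch of the usual way to avoid modulo bias: reject any octet at or above the largest multiple of the set size that fits in 0..255. The helper name unbiased_chars is hypothetical and is not part of String::Urandom, and the 61-character set is the same assumption as above (inferred from the output, not from the module source).

```perl
use strict;
use warnings;

# Hypothetical helper, not part of String::Urandom.
# Maps raw octets to characters, discarding octets that would cause bias.
sub unbiased_chars {
    my ($octets, $chars) = @_;
    my $n = @$chars;
    die "character set must have 1..256 entries" if $n < 1 || $n > 256;
    # Largest multiple of $n that fits in 0..255; octets at or above it are rejected.
    my $limit = 256 - (256 % $n);
    return map { $chars->[ $_ % $n ] } grep { $_ < $limit } unpack('C*', $octets);
}

# Feed every octet value exactly once to show the mapping is even.
my @chars = ('a'..'z', 'A'..'Z', '1'..'9');   # assumed 61-character set
my %count;
$count{$_}++ for unbiased_chars(pack('C*', 0 .. 255), \@chars);

# Collect the distinct per-character counts; an even mapping yields one value.
my %distinct = map { $_ => 1 } values %count;
print join(',', sort keys %distinct), "\n";
```

With 61 characters the limit is 244, so 12 of the 256 octet values are thrown away and every character is reachable from exactly 4 values. A caller would need to read extra octets to make up for the rejected ones.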