Bug #56491 for String-Urandom: Documentation does not mention bias for sets of characters not powers of 2

Sun Apr 11 22:15:43 2010 sailorfred [...] yahoo.com - Ticket created

Subject:

Documentation does not mention bias for sets of characters not powers of 2

Because the implementation breaks the input into octets and then reduces each one modulo the number of characters in the encoding, a bias is introduced in the distribution of the resulting characters. The default settings produce a strong bias towards 'a'..'l'. See http://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle#Modulo_bias for more info.

Because you have 61 default characters and 256 = 4 * 61 + 12, a uniform distribution of octet values gives you 5 values for each of 'a'..'l' and 4 for each of the other characters.

Here's a quick one-liner that shows the bias:

    perl -MString::Urandom -e 'my $ur = String::Urandom->new; for my $i (1..100000) { for my $char ( split( //, $ur->rand_string ) ) { print $char, "\n" } }' | sort | uniq -c | sort -nr

    62821 f
    62747 d
    62744 k
    62717 b
    62661 j
    62621 h
    62583 a
    62433 e
    62355 g
    62244 i
    62104 c
    61886 l
    50619 y
    50440 5
    50367 N
    50332 6
    50329 x
    50262 p
    50252 w
    50237 Y
    50223 Q
    50199 8
    50181 R
    50181 3
    50167 9
    50146 G
    50129 v
    50103 2
    50098 B
    50097 o
    50082 K
    50058 J
    50053 q
    50046 T
    50025 H
    50020 V
    49999 r
    49987 E
    49971 S
    49969 n
    49965 C
    49963 1
    49962 X
    49954 Z
    49891 4
    49889 s
    49878 F
    49861 D
    49860 z
    49851 P
    49804 U
    49748 L
    49744 M
    49736 O
    49716 7
    49696 t
    49663 W
    49659 I
    49601 u
    49563 m
    49508 A
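For reference, the arithmetic behind the bias can be checked without the module at all. The sketch below is only an illustration: the character list and its order (a-z, A-Z, 1-9) are assumptions inferred from the output above, and the names octet_counts and unbiased_index are hypothetical helpers, not part of String::Urandom. The second helper shows rejection sampling, one common way to avoid modulo bias; it is not necessarily what the module does or should do.

```perl
use strict;
use warnings;

# Assumed character set: a-z, A-Z, 1-9 (61 characters, matching the
# characters seen in the output above; the module's real list may differ).
my @chars = ('a'..'z', 'A'..'Z', 1..9);

# Count how many of the 256 possible octet values map to each character
# when an octet is reduced modulo the size of the character set.
sub octet_counts {
    my @set = @_;
    my %count;
    $count{ $set[ $_ % @set ] }++ for 0 .. 255;
    return %count;
}

# Rejection sampling: discard octets at or above the largest multiple of
# the set size that fits in 256, so every index is equally likely.
# $next_octet is a coderef returning an integer in 0..255.
sub unbiased_index {
    my ($next_octet, $size) = @_;
    my $limit = 256 - (256 % $size);    # 244 for a 61-character set
    while (1) {
        my $octet = $next_octet->();
        return $octet % $size if $octet < $limit;
    }
}

my %count = octet_counts(@chars);
printf "%s: %d\n", $_, $count{$_} for @chars;
```

With the assumed ordering, this prints 5 for each of 'a'..'l' and 4 for every other character, which matches the roughly 62500 versus 50000 split in the counts above.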

Tue May 03 02:34:27 2011 MBROOKS [...] cpan.org - Correspondence added

Fixed by randomizing the character set array using the Fisher-Yates shuffle.
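For readers unfamiliar with the algorithm named above, here is a minimal Fisher-Yates sketch. It is not the code from the fix: the function name fisher_yates_shuffle and the optional $rand_below coderef are hypothetical, the character list is the same assumed a-z, A-Z, 1-9 set, and the default random source is Perl's built-in rand(), which is not a cryptographic generator.

```perl
use strict;
use warnings;

# In-place Fisher-Yates shuffle of an array reference.
# $rand_below is an optional coderef that takes $n and returns a uniform
# integer in 0 .. $n-1; it defaults to the built-in rand().
sub fisher_yates_shuffle {
    my ($array, $rand_below) = @_;
    $rand_below ||= sub { int rand $_[0] };
    for (my $i = $#$array; $i > 0; $i--) {
        my $j = $rand_below->($i + 1);        # 0 <= $j <= $i
        @$array[$i, $j] = @$array[$j, $i];    # swap elements $i and $j
    }
    return $array;
}

# Assumed character set (61 characters); the module's real list may differ.
my @chars = ('a'..'z', 'A'..'Z', 1..9);
fisher_yates_shuffle(\@chars);
print join('', @chars), "\n";
```

Note that a shuffle only changes which characters sit in the 12 over-represented slots. Whether that evens out the observed distribution depends on how often the set is reshuffled, which this reply does not state.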

Tue May 03 02:34:29 2011 The RT System itself - Status changed from 'new' to 'open'

Tue May 03 02:34:29 2011 MBROOKS [...] cpan.org - Status changed from 'open' to 'resolved'

Tue May 03 02:34:29 2011 MBROOKS [...] cpan.org - Given to MBROOKS