
This queue is for tickets about the String-Urandom CPAN distribution.

Report information
The Basics
Id: 56491
Status: resolved
Priority: 0/
Queue: String-Urandom

People
Owner: MBROOKS [...] cpan.org
Requestors: sailorfred [...] yahoo.com
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: 0.13
Fixed in: (no value)



Subject: Documentation does not mention bias for sets of characters not powers of 2
Because the implementation breaks the input into octets and then reduces each octet modulo the number of characters in the encoding, a bias is introduced in the distribution of the resulting characters. The default settings produce a strong bias towards 'a'..'l'. See http://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle#Modulo_bias for more info.

Because there are 61 default characters and 256 = 4 * 61 + 12, a uniform distribution of octet values maps 5 values to each of 'a'..'l' and 4 values to each of the other characters.

Here's a quick one-liner that shows the bias:

    perl -MString::Urandom -e 'my $ur = String::Urandom->new; for my $i (1..100000) { for my $char ( split( //, $ur->rand_string ) ) { print $char, "\n" } }' | sort | uniq -c | sort -nr

    62821 f
    62747 d
    62744 k
    62717 b
    62661 j
    62621 h
    62583 a
    62433 e
    62355 g
    62244 i
    62104 c
    61886 l
    50619 y
    50440 5
    50367 N
    50332 6
    50329 x
    50262 p
    50252 w
    50237 Y
    50223 Q
    50199 8
    50181 R
    50181 3
    50167 9
    50146 G
    50129 v
    50103 2
    50098 B
    50097 o
    50082 K
    50058 J
    50053 q
    50046 T
    50025 H
    50020 V
    49999 r
    49987 E
    49971 S
    49969 n
    49965 C
    49963 1
    49962 X
    49954 Z
    49891 4
    49889 s
    49878 F
    49861 D
    49860 z
    49851 P
    49804 U
    49748 L
    49744 M
    49736 O
    49716 7
    49696 t
    49663 W
    49659 I
    49601 u
    49563 m
    49508 A
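The 5-versus-4 arithmetic can be checked without the module at all. The following is a minimal sketch, not code from String::Urandom: it assumes the default alphabet is a-z, A-Z, 1-9 in that order (61 characters, inferred from the characters that appear in the output above) and that each octet is mapped with a plain modulo.

```perl
use strict;
use warnings;

# Assumed default alphabet (inferred from the ticket's output, not copied
# from the module source): a-z, A-Z, 1-9 = 61 characters.
my @chars = ('a'..'z', 'A'..'Z', '1'..'9');

# Count how many of the 256 possible octet values select each character
# when the octet is reduced modulo the alphabet size.
my %hits;
$hits{ $chars[ $_ % @chars ] }++ for 0 .. 255;

# Group the characters by how many octet values map to them.
my %by_count;
push @{ $by_count{ $hits{$_} } }, $_ for @chars;

for my $n (sort { $b <=> $a } keys %by_count) {
    printf "%d octet values each: %s\n", $n, join '', @{ $by_count{$n} };
}
```

Under those assumptions the first group is exactly 'a'..'l', which matches the 12 characters at the top of the measured counts, and the expected ratio of 5:4 is close to the observed ratio of roughly 62500:50000.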
Fixed by randomizing the character set array using the Fisher-Yates shuffle.
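For reference, a Fisher-Yates shuffle of a character set array can be sketched as follows. This is an illustration only, not the code that went into the module: the helper name fisher_yates_shuffle is hypothetical, the alphabet is the assumed 61-character default, and Perl's built-in rand stands in for whatever random source the module actually uses.

```perl
use strict;
use warnings;

# Hypothetical helper (not taken from String::Urandom): shuffle an array
# in place using the Fisher-Yates algorithm.
sub fisher_yates_shuffle {
    my ($array) = @_;
    for (my $i = $#{$array}; $i > 0; $i--) {
        # Pick a random index in 0..$i; built-in rand is an assumption here.
        my $j = int rand($i + 1);
        # Swap the elements at positions $i and $j.
        @{$array}[$i, $j] = @{$array}[$j, $i];
    }
    return $array;
}

# Assumed default alphabet: a-z, A-Z, 1-9 (61 characters).
my @chars = ('a'..'z', 'A'..'Z', '1'..'9');
fisher_yates_shuffle(\@chars);
print join('', @chars), "\n";
```

Note that a shuffle changes which characters occupy the first 12 positions of the array; it does not change how many octet values map to each position.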