Bug #81871 for Crypt-Primes: Distribution for small bits is non-uniform, and can return out-of-range values

Mon Dec 10 19:06:29 2012 DANAJ [...] cpan.org - Ticket created

Subject:

Distribution for small bits is non-uniform, and can return out-of-range values

The random number selection for small bits in maurer is non-uniform and can select numbers outside the range. The method of selecting a random number and calling nextprime generates very skewed distributions. See Pierre-Alain Fouque and Mehdi Tibouchi, "Close to Uniform Prime Number Generation With Fewer Random Bits", 2011. [http://eprint.iacr.org/2011/481] for a discussion of distributions and a comparison of the 'PRIMEINC' method compared to other methods including the 'TRIVIAL' method used by Maurer for k <= p0 generation. perl -MCrypt::Primes=maurer -E 'my %freq; $n=1000000; $freq{maurer(Size=>6)}++ for (1..$n); printf("%4d %6.3f%%\n", $_, 100.0*$freq{$_}/$n) for sort {$a<=>$b} keys %freq;' 37 18.714% 41 12.509% 43 6.293% 47 12.391% 53 18.739% 59 18.804% 61 6.273% 67 6.278% The frequency of selection is directly related to the size of the primegap before it, which is what happens with the nextprime(rand) method (Fouque and Tibouchi call this 'PRIMEINC', since that's what Brandt and Damgård's 1992 paper call it). Also, 67 isn't a 6-bit number. For comparison: perl -MMath::Prime::Util=random_maurer_prime -E 'my %freq; $n=1000000; $freq{random_maurer_prime(6)}++ for (1..$n); printf("%4d %6.3f%%\n", $_, 100.0*$freq{$_}/$n) for sort {$a<=>$b} keys %freq;' 37 14.329% 41 14.268% 43 14.311% 47 14.308% 53 14.277% 59 14.274% 61 14.233% Brandt and Damgård claim in their 1992 paper that the entropy tends toward the ideal as the number of bits k increases. However at these small sizes the distribution is very far from uniform, and I am more swayed by Fouque and Tibouchi's arguments about uniformity. Given that we're using this purely for selecting small k values, I think we should select the results uniformly. Here are two ways of generating a uniform distribution (not exhaustive): 1) a) Map the primes in the range to consecutive integers. b) Uniformly select one of the integers (now very easy). c) Map the selected number back to a prime. Simple implementation: a) my @primes = primes(low,high) b) uniformly select one of the elements of @primes c) return it Alternate implementation: a) set pilow = prime_count( next_prime( low-1 ) ) pihigh = prime_count( prev_prime( high+1 ) ) b) select a uniform random number n, pilow <= n <= pihigh. c) return nth_prime(n). Advantages of this method: we are clearly uniformly selecting a prime in the range, there is no looping, it consumes fewer random bits than any other method, and the operations are fast for small values. In my testing this is faster than the next method until ~15-17 bits, though it depends on the relative speed of the functions. The first implementation shown is pretty easy, but it starts using impractical amounts of memory before long. The alternate implementation gets around the memory use by not actually generating all the primes in the range. These functions, while low memory, become impractically slow for large values. Given that Crypt::Primes has a pre-built array of primes < 16-bit, it wouldn't be a big stretch to use this method. However 20 bits worth of primes would be 82025 instead of 6542, which isn't a good use of space (if using the big array of ints). 2) Select a random number in the range. Check primality. Return it if prime, otherwise repeat the process. Maurer uses this method in his paper. Fouque and Tibouchi (2011) call this the "TRIVIAL" method, giving a uniform distribution at the expense of a fair amount of random bits consumed and more primality tests than ideal. There are plenty of optimizations possible, such as looking at only odd values (mapping 1 to 2 so it is included). There are also various ways to speed this up (and importantly, use fewer random bits) for giant ranges, at the expense of a small loss of uniformity (see Fouque and Tibouchi 2011 for a couple ways, though be careful of Joye and Paillier's methods as they are patented). Math::Prime::Util uses something not too dissimilar from "Algorithm 1" that works with arbitrary ranges. For the simple up-to-20-bit case needed here, none of these complications and optimizations matter, and just selecting a random odd number in the range and checking primality works fine. On slightly related note, I looked at the number of distinct primes output. I changed both Crypt::Primes and Math::Prime::Util to use p0 = 12 and m = 20, then streamed out 20-bit primes. Of the 38635 primes, Maurer indicates we should expect about 10% of them. I then looked at the number of distinct primes output after various output points. (1) I removed non-primes from Crypt::Primes output. (2) Since this is random, reproducing will not give the exact numbers, but should be similar. (3) the second column is using random_nbit_prime, which uses a uniform selection method ('TRIVIAL') hence should output all possible primes. The primality test used is deterministic for all bit sizes <= 64. N # prime nbit # primes MPU # primes CP ----- ------------ ------------ ------------ 10k 8859 3320 2662 20k 15580 3567 3028 30k 20811 3594 3145 40k 24876 3597 3189 50k 28014 3597 3215 100k 35657 3597 3236 200k 38403 3597 3237 400k 38635 3597 3237 800k 38635 3597 3237 What I see is that even after 1.8 million numbers generated, large chunks of possible primes are just not being output. The largest one is 1042631, for instance, while at least 36 of the 423 20-bit primes larger than that should be output. Now on the the distribution of these primes, which are constructed from the skewed base distribution. This uses the same data as above, looking at the frequency of 1M 20-bit primes output using maurer with p0=12, so we go through at least one iteration of Maurer's algorithm. Math::Prime::Util's random_maurer_prime, 3597 total numbers, we expect the probability for any given one to be 0.0278%. Max: 0.0645%, Min: 0.0145%. The number 790817 turned up 4.5 times more often than the number 817679. Total numbers that vary by more than 2x the average: 16 (0.44%). Crypt::Primes' maurer, 3432 unique numbers (195 composites), we expect the probability for any given one to be 0.0291%. Max: 0.1826%, Min: 0.0001%. The number 615449 turned up 1826 times more often than the number 1000567. Total numbers that vary by more than 2x the average: 1491 (43%).

Wed Dec 12 00:32:58 2012 DANAJ [...] cpan.org - Correspondence added

Attached is a patch that fixes this. Output for the simple 6-bit test: perl -Iblib/lib -Iblib/arch -MCrypt::Primes=maurer -E 'my %freq; $n=10000; $freq{maurer(Size=>6)}++ for (1..$n); printf("%4d %6.3f%%\n", $_, 100.0*$freq{$_}/$n) for sort {$a<=>$b} keys %freq;' 37 14.030% 41 14.210% 43 14.740% 47 14.060% 53 14.450% 59 14.160% 61 14.350% This fixes the problem of numbers being output that are larger than the bit size requested, as well as giving a uniform distribution. The downside is that we consume more randomness, and Crypt::Random can be really slow even with something like haveged running. Speed for 20-bit primes is now 2-3x slower, though this is a negligible fraction of the time spent for large prime generation so should have basically no effect on timing for 512-bit and larger primes. As expected, the number of unique primes seen after N trials rises much more rapidly, however it tops out at a slightly lower point (3229 unique 20-bit primes vs. 3237 from before). These were probably the ones generated with q values outside the requested bit range. I'm returning undef for k < 2, as there are no less-than-2-bit primes. The old code returned 2. I'm using a simple optimization to only generate odd numbers for bit sizes greater than 2. --- lib/Crypt/Primes-orig.pm 2003-01-16 12:08:32.000000000 -0800 +++ lib/Crypt/Primes.pm 2012-12-11 21:24:46.548779934 -0800 @@ -572,8 +54,13 @@ $| = 1; if ( $k <= $p0 ) { - my $spr = makerandom Size => $k; - return Math::Pari::nextprime ( $spr ); + return if $k < 2; + return makerandom( Size => 2 ) if $k == 2; + my $spr; + do { + $spr = 2 * makerandom( Size => $k-1 ) + 1; + } while (!trialdiv($spr)); + return $spr; } my $c_opt = 0.09; my $m = 20;

Wed Dec 12 00:32:59 2012 DANAJ [...] cpan.org - Status changed from 'new' to 'open'

Wed Dec 12 01:18:46 2012 DANAJ [...] cpan.org - Correspondence added

It looks like the prime diversity is being impacted by the constant added to the q generation. I'm attaching a note about it and a patch in RT 81858.