Subject: | Distribution for small bits is non-uniform, and can return out-of-range values |
The random number selection for small bits in maurer is non-uniform and
can select numbers outside the range. The method of selecting a random
number and calling nextprime generates very skewed distributions.
See Pierre-Alain Fouque and Mehdi Tibouchi, "Close to Uniform Prime
Number Generation With Fewer Random Bits", 2011.
[http://eprint.iacr.org/2011/481] for a discussion of distributions and
a comparison of the 'PRIMEINC' method compared to other methods
including the 'TRIVIAL' method used by Maurer for k <= p0 generation.
perl -MCrypt::Primes=maurer -E 'my %freq; $n=1000000;
$freq{maurer(Size=>6)}++ for (1..$n); printf("%4d %6.3f%%\n", $_,
100.0*$freq{$_}/$n) for sort {$a<=>$b} keys %freq;'
37 18.714%
41 12.509%
43 6.293%
47 12.391%
53 18.739%
59 18.804%
61 6.273%
67 6.278%
The frequency of selection is directly related to the size of the
primegap before it, which is what happens with the nextprime(rand)
method (Fouque and Tibouchi call this 'PRIMEINC', since that's what
Brandt and Damgård's 1992 paper call it). Also, 67 isn't a 6-bit number.
For comparison:
perl -MMath::Prime::Util=random_maurer_prime -E 'my %freq; $n=1000000;
$freq{random_maurer_prime(6)}++ for (1..$n); printf("%4d %6.3f%%\n", $_,
100.0*$freq{$_}/$n) for sort {$a<=>$b} keys %freq;'
37 14.329%
41 14.268%
43 14.311%
47 14.308%
53 14.277%
59 14.274%
61 14.233%
Brandt and Damgård claim in their 1992 paper that the entropy tends
toward the ideal as the number of bits k increases. However at these
small sizes the distribution is very far from uniform, and I am more
swayed by Fouque and Tibouchi's arguments about uniformity. Given that
we're using this purely for selecting small k values, I think we should
select the results uniformly.
Here are two ways of generating a uniform distribution (not exhaustive):
1) a) Map the primes in the range to consecutive integers.
b) Uniformly select one of the integers (now very easy).
c) Map the selected number back to a prime.
Simple implementation:
a) my @primes = primes(low,high)
b) uniformly select one of the elements of @primes
c) return it
Alternate implementation:
a) set pilow = prime_count( next_prime( low-1 ) )
pihigh = prime_count( prev_prime( high+1 ) )
b) select a uniform random number n, pilow <= n <= pihigh.
c) return nth_prime(n).
Advantages of this method: we are clearly uniformly selecting a prime in
the range, there is no looping, it consumes fewer random bits than any
other method, and the operations are fast for small values. In my
testing this is faster than the next method until ~15-17 bits, though it
depends on the relative speed of the functions. The first
implementation shown is pretty easy, but it starts using impractical
amounts of memory before long. The alternate implementation gets around
the memory use by not actually generating all the primes in the range.
These functions, while low memory, become impractically slow for large
values.
Given that Crypt::Primes has a pre-built array of primes < 16-bit, it
wouldn't be a big stretch to use this method. However 20 bits worth of
primes would be 82025 instead of 6542, which isn't a good use of space
(if using the big array of ints).
2) Select a random number in the range. Check primality.
Return it if prime, otherwise repeat the process.
Maurer uses this method in his paper. Fouque and Tibouchi (2011) call
this the "TRIVIAL" method, giving a uniform distribution at the expense
of a fair amount of random bits consumed and more primality tests than
ideal. There are plenty of optimizations possible, such as looking at
only odd values (mapping 1 to 2 so it is included). There are also
various ways to speed this up (and importantly, use fewer random bits)
for giant ranges, at the expense of a small loss of uniformity (see
Fouque and Tibouchi 2011 for a couple ways, though be careful of Joye
and Paillier's methods as they are patented). Math::Prime::Util uses
something not too dissimilar from "Algorithm 1" that works with
arbitrary ranges.
For the simple up-to-20-bit case needed here, none of these
complications and optimizations matter, and just selecting a random odd
number in the range and checking primality works fine.
On slightly related note, I looked at the number of distinct primes
output. I changed both Crypt::Primes and Math::Prime::Util to use p0 =
12 and m = 20, then streamed out 20-bit primes. Of the 38635 primes,
Maurer indicates we should expect about 10% of them. I then looked at
the number of distinct primes output after various output points.
(1) I removed non-primes from Crypt::Primes output.
(2) Since this is random, reproducing will not give the exact
numbers, but should be similar.
(3) the second column is using random_nbit_prime, which uses a
uniform selection method ('TRIVIAL') hence should output all
possible primes. The primality test used is deterministic for
all bit sizes <= 64.
N # prime nbit # primes MPU # primes CP
----- ------------ ------------ ------------
10k 8859 3320 2662
20k 15580 3567 3028
30k 20811 3594 3145
40k 24876 3597 3189
50k 28014 3597 3215
100k 35657 3597 3236
200k 38403 3597 3237
400k 38635 3597 3237
800k 38635 3597 3237
What I see is that even after 1.8 million numbers generated, large
chunks of possible primes are just not being output. The largest one is
1042631, for instance, while at least 36 of the 423 20-bit primes larger
than that should be output.
Now on the the distribution of these primes, which are constructed from
the skewed base distribution. This uses the same data as above, looking
at the frequency of 1M 20-bit primes output using maurer with p0=12, so
we go through at least one iteration of Maurer's algorithm.
Math::Prime::Util's random_maurer_prime, 3597 total numbers, we expect
the probability for any given one to be 0.0278%. Max: 0.0645%, Min:
0.0145%. The number 790817 turned up 4.5 times more often than the
number 817679. Total numbers that vary by more than 2x the average: 16
(0.44%).
Crypt::Primes' maurer, 3432 unique numbers (195 composites), we expect
the probability for any given one to be 0.0291%. Max: 0.1826%, Min:
0.0001%. The number 615449 turned up 1826 times more often than the
number 1000567. Total numbers that vary by more than 2x the average:
1491 (43%).