Skip Menu |

This queue is for tickets about the Math-Pari CPAN distribution.

Report information
The Basics
Id: 93652
Status: open
Priority: 0/
Queue: Math-Pari

People
Owner: Nobody in particular
Requestors: cpan [...] zoffix.com
Cc: DANAJ [...] cpan.org
AdminCc:

Bug Information
Severity: (no value)
Broken in: 2.01080605
Fixed in: (no value)



Subject: isprime lies
Hey, Not sure if this was reported already, but I just saw in the logs someone show this bug on IRC. I was able to reproduce it on my box with Math::Pari version 2.01080605 and perl version 5.14.2 (i686-linux-gnu-thread-multi-64int) <danaj> This amuses me, especially with all the security bugs elsewhere. perl -MMath::Pari=isprime -E 'while (1) { die "22027*44053 is prime? Cool story, bro!" if isprime(22027*44053); }' <danaj> Doesn't work if you have compiled Math::Pari to use Pari 2.3.5, but that's not the default. -- Cheers, ZZ [ https://metacpan.org/author/ZOFFIX ]
Summary: Caused by Pari 2.1.7 using a weak probable prime test. Later versions of Pari use a much stronger test. See http://oeis.org/A141768 for more examples of "easy" numbers. The example I gave is one of the easiest with Pari 2.1.7's RNG, as it fails after only 799 attempts if run first thing. "9" is another funny example. Note that any number could be wrong, it's just the ones from the list will fail more often. This is a consequence of Pari 2.1.7 giving only two options for isprime: (1) run 10 M-R tests with randomly chosen bases (using its own RNG that is seeded identically each startup), or (2) run a BLS5 proof, which works great for relatively small numbers, but isn't workable for 100+ digit numbers (it requires factoring n-1). Pari 2.3 fixed this by following the lead of every other math package and using the BPSW test, which is not only faster but has no known counterexample after 34 years of searching. They do exist, but we just haven't found one (there are good reasons why we expect them to be very rare). This is the same problem I've harped on Perl 6, though there is a solution there since we control the source code so can fairly easily add a Lucas test and then change the isprime function to do what we'd like without any user visible change (other than fixing the known examples it gets wrong). GMP's mpz_probab_prime_p has similar issues, with known examples that fail for 16 tests. Like Perl 6 there is no re-randomization, so the known examples will fail every call. Why is this important? Because currently there are CPAN Crypto modules using Math::Pari, and isprime is expected to work well. There are replacements for most (all?) of the modules, but they're not known or used in most cases (other than Strawberry Perl which can't use Math::Pari). They may also have replaced the existing defects with new unknown defects. How do we fix? Have Math::Pari use a newer version of Pari. This is not easy. Getting Pari 2.3.5 to work on some machines is easy, and on many others it requires a *lot* of work.
Subject: Re: [rt.cpan.org #93652] isprime lies
Date: Sat, 8 Mar 2014 15:41:50 -0800
To: Dana Jacobsen via RT <bug-Math-Pari [...] rt.cpan.org>
From: Ilya Zakharevich <nospam-abuse [...] ilyaz.org>
On Sat, Mar 08, 2014 at 05:07:55PM -0500, Dana Jacobsen via RT wrote: Show quoted text
> Queue: Math-Pari > Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=93652 > > > Summary: Caused by Pari 2.1.7 using a weak probable prime test. Later versions of Pari use a much stronger test. > > > See http://oeis.org/A141768 for more examples of "easy" numbers. The example I gave is one of the easiest with Pari 2.1.7's RNG, as it fails after only 799 attempts if run first thing. "9" is another funny example. Note that any number could be wrong, it's just the ones from the list will fail more often. > > This is a consequence of Pari 2.1.7 giving only two options for isprime: (1) run 10 M-R tests with randomly chosen bases (using its own RNG that is seeded identically each startup), or (2) run a BLS5 proof, which works great for relatively small numbers, but isn't workable for 100+ digit numbers (it requires factoring n-1).
What is the expected number of rounds to make the probability below 10^-20? And what is “startup”, “PARI startup”? Then Math::Pari/the-user can call is_prime() several times to circumvent this? Show quoted text
> How do we fix? Have Math::Pari use a newer version of Pari. This is not easy. Getting Pari 2.3.5 to work on some machines is easy, and on many others it requires a *lot* of work.
Did not hear about this. Oups, I see now: http://www.cpantesters.org/cpan/report/8590d4ea-89ed-11e3-a768-862a330ec7e2 Do not know when I will be able to work around this… Thanks, Ilya
On Sat Mar 08 18:42:03 2014, nospam-abuse@ilyaz.org wrote: Show quoted text
> What is the expected number of rounds to make the probability below > 10^-20?
Worst case numbers have a probability of almost 25% of passing a given test (they are at *most* 25% via Monier 1980, Rabin 1980). So 0.25^10 is a bit less than 10^-7. You'd need 34 tests to stay under 10^-20. It's a lot more complicated if trying to get the probability of random inputs rather than worst case (see Damgard,Landrock,Pomerance 1993 for example). NIST uses the random input probability in FIPS 186-4, as the p and q are not generated by an adversary. They end up with 40 tests for 2^-80 at a 1024-bit size (they give many values based on the input sizes and desired error level). Show quoted text
> And what is “startup”, “PARI startup”? Then Math::Pari/the-user can > call is_prime() several times to circumvent this?
Good point that a workaround would be to call isprime more than once, which will (unlike GMP or libtommath) use different bases each time. You get 10 new tests each time, so can run multiple times. By startup I meant that Pari 2.1.7's random number generator is seeded with the value "1" and then follows a simple LRG (see mymyrand in bibli2.c). Hence you get the same sequence each time you start Pari. These numbers will fail regardless if you call it enough times (it just needs to hit one of the 24.99% of possible bad bases for the number 10 times in a row). Because I know the sequence, I can find some that will fail after a small number of calls assuming it was the only action, e.g. the initial example that takes only 799 calls to see. 199*397 takes 1868 calls. If you ignore that failure and keep calling it, the next one happens after 239k more calls. Then another 450k calls. This is just demonstrating that we're only running 10 M-R tests. Show quoted text
> > How do we fix? Have Math::Pari use a newer version of Pari. This is > > not easy. Getting Pari 2.3.5 to work on some machines is easy, and > > on many others it requires a *lot* of work.
> > Did not hear about this. Oups, I see now: > http://www.cpantesters.org/cpan/report/8590d4ea-89ed-11e3-a768- > 862a330ec7e2 > Do not know when I will be able to work around this…
Thanks for the reply. It's been around for a long time, and most people don't really care. I think just having the note about it and indicating one can call isprime multiple times should cover it. Oh with Pari 2.3.5 the API changed: "ispseudoprime" does the BPSW test (better and faster probable prime test) while "isprime" turns into a primality proof (much slower, albeit they use APR-CL which is quite fast as proofs go).
Subject: Re: [rt.cpan.org #93652] isprime lies
Date: Fri, 25 Apr 2014 01:39:07 -0700
To: Dana Jacobsen via RT <bug-Math-Pari [...] rt.cpan.org>
From: Ilya Zakharevich <nospam-abuse [...] ilyaz.org>
On Sun, Mar 09, 2014 at 05:48:22AM -0400, Dana Jacobsen via RT wrote: Show quoted text
> > What is the expected number of rounds to make the probability below > > 10^-20?
Show quoted text
> Worst case numbers have a probability of almost 25% of passing a given test (they are at *most* 25% via Monier 1980, Rabin 1980). So 0.25^10 is a bit less than 10^-7. You'd need 34 tests to stay under 10^-20. It's a lot more complicated if trying to get the probability of random inputs rather than worst case (see Damgard,Landrock,Pomerance 1993 for example). NIST uses the random input probability in FIPS 186-4, as the p and q are not generated by an adversary. They end up with 40 tests for 2^-80 at a 1024-bit size (they give many values based on the input sizes and desired error level).
Actually, I made a quick search, and could not find what you said as stated explicitely anywhere I looked! Essentially, you say that there is an analogue of Carmichael numbers for the R-M algorithm, but they have not “exactly the worst possible case (25%) of residues being non-witnesses”, but “close to 25%”. They do not even write that the numbers in http://oeis.org/A141768 have close to 25% non-witnesses… Is it just my inability to Google? Do you know anything more specific about these “worst case numbers”? Thanks, Ilya P.S. I put some random musing into the BUGS part of the latest version (with 0.75 misprinted as 0.25 :-[). Page up from http://search.cpan.org/dist/Math-Pari/Pari.pm#INITIALIZATION
Apologies for this long, disorganized answer. I do not have an unconditional answer, and I'm not sure if I'm answering the right question. http://oeis.org/A090659 may be of more interest. Note CRG4's comments about not knowing whether this sequence is infinite or not (which is the cause for a lot of my wandering discussion below). Monier (1980) and Rabin (1980) both show the upper bound of 1/4. E.g. Monier 1980: "Proposition 2. The failure probability alpha_n of Miller-Rabin's test is less than 1/4 for any composite n. One remarks that we cannot presently give an upper bound lower than 1/4 since we do not know whether the set of numbers approaching this value is finite or not." AGP 1994 notes "A slightly stronger version of the result of Monier [M] and Rabin [R] mentioned in the introduction is that if n > 9 is an odd composite, then at least 3/4 of the integers in [1,n] are witnesses for n." Also see the last paragraph of the accuracy section on Wikipedia: http://en.wikipedia.org/wiki/Miller-Rabin_primality_test#Accuracy_of_the_test. While not a primary reference, this does not show any new results contrary to the 1/4 limit, and also indicates the general point that on average the false acceptance probability goes down rapidly with the size of the input, but adversarial situations should assume the inputs may be near worst case numbers. More discussion: I do not have a proof that there are infinitely many numbers within a given epsilon of 25%. However, some anecdotal ways of seeing these. First, from the A141768 sequence: 1. wget http://oeis.org/A141768/b141768.txt 2. perl -MMath::Prime::Util=:all -nE '($n) = /^\d+ (\d+)/; ($lb,$t)=($n-1,0); for (2..$lb) { $t += is_strong_pseudoprime($n,$_); } say "$n ",100*$t/($n-1);' b141768.txt A similar method can be done with Math::Pari, using something like Hasler's is_A001262() function in A001262. This is brute forcing the search -- the method in A141768 may be faster. This may be easier to see by using A090659 instead, as it by definition will give a monotonically increasing sequence. If the sequence is infinite (conjectured but not proven), then I think we could make a precise statement such as "For any given epsilon > 0, there exists a c_0 such that there are infinitely many composites n > c_0 having at least (n-1) * (1/4 - epsilon) non-witnesses." Monier (1980) breaks the composite n's into three categories. The last are non-Carmichael numbers, with the maximum P(n) occurring with two factors of form (2p+1)(4p+1) for odd p. We can do a similar experiment: perl -MMath::Prime::Util=:all -E 'forprimes { $p=($_-1)>>1; ($p1,$p2) = ($_,2*($_-1)+1); if ($p & 1 && is_prime($p2)) { $n = $_*$p2; ($lb,$t)=($n-1,0); for (2..$lb) { $t += is_strong_pseudoprime($n,$_); } say "$n ",100*$t/($n-1); } } 9,10**12;' and see a similar pattern, with monotonically increasing percentages (note: Higgens conjecture 2 shows it is monotonic, though just a conjecture). Monier notes it is an open question as to whether this sequence is infinite or not. For one of the other cases, Carmichael numbers, there are infinitely many Carmichael numbers (AGP 1994), but I am not sure that this includes the subset we care about, with exactly three prime factors each congruent to 3 mod 4. See section 3 of AGP 1994. Rabin (1980) discusses these numbers on page 133, but mostly noting that they exist hence reducing the constant 3/4 seems not to be possible. I did not see discussion of frequencies or distributions of these numbers. The papers by Kim and Pomerance (1989) and Damgård, Landrock, and Pomerance (1993) may have more information. They are looking at the average probability of accepting a composite number, which is what we care about in a non-adversarial situation, and why for most cases (i.e. where the user isn't actively trying to find faults) the isprime function works pretty well. According to the text in A141768, the sequence is infinite, but that would be true with the much weaker proposition that the absolute number of bases increases, allowing a decreasing percentage. I am unable to parse the AGP 1994 paper well enough to see if there is more information. Note that this follows from the comments in the sequences A141768 and A090659 -- the former is known to be infinite, while the latter is implied by Dickson's conjecture, but not proven. Maybe I'm not understanding the question. As far as I know, the limit of 3/4 is the best limit known. Are you thinking that numbers requiring near that limit do not exist? [Rabin 1980 says they do, as do the programs above] That they are not infinite? [still an open question]. That "close to" is not precise enough? [granted, though we can conjecture a precise statement] Thanks, Dana
Subject: Re: [rt.cpan.org #93652] isprime lies
Date: Fri, 25 Apr 2014 16:45:28 -0700
To: Dana Jacobsen via RT <bug-Math-Pari [...] rt.cpan.org>
From: Ilya Zakharevich <nospam-abuse [...] ilyaz.org>
On Fri, Apr 25, 2014 at 06:03:03PM -0400, Dana Jacobsen via RT wrote: Show quoted text
> Maybe I'm not understanding the question. As far as I know, the limit of 3/4 is the best limit known. Are you thinking that numbers requiring near that limit do not exist? [Rabin 1980 says they do, as do the programs above] That they are not infinite? [still an open question]. That "close to" is not precise enough? [granted, though we can conjecture a precise statement]
Thank you a lot, you completely filled the holes in what I knew! Can you vet the current variant for obvious errors? Thanks, Ilya ======================================================= In older versions of PARI, the one-argument variant of the function isprime() is actually checking for probable primes. Moreover, it has certain problems. B<POSSIBLE WORKAROUND (not needed for newer PARI):> before version 2.3 of PARI, to get probability of misdetecting a prime below 1e-12, call isprime() twice; below 1e-18, call it 3 times; etc. (The algorithm is probabilistic, and the implementation is such that the result depends on which calls to isprime() were performed ealier.) The problems: first, while the default algorithm (before version 2.3) gives practically acceptable results in non-adversarial situations, the worst-case behaviour is significantly worse than the average behaviour. The algorithm is looking for so-called "witnesses" (with up to 10 tries) among random integers; usually, witnesses are abundant. However, there are non-prime numbers for which the fraction of witnesses is close to the theoretical mininum, 0.75; with 10 random tries, the probability of missing a witness for such numbers is close to 1e-6. (The known worst-case numbers M have phi(M)/4 non-witnesses, with M=P(2P-1), prime P, 2P-1 and 4|P+1; the proportion of such numbers near K is expected to be const/sqrt(K)log(K)^2. Note that numbers which have more than about 5% non-witnesses may also be candidates for false positives. Conjecturally, they are of the form (aD+1)(bD+1) with a<b, ab <= const, prime aD+1, and bD+1, and D not divisible by high power of 2 (above a=1, b=2 and D is odd); the proportion of such numbers may have a similar asymptotic const/sqrt(K)log(K)^2.) Second, the random number generator is "reset to known state" when PARI library is initialized. That means that the behaviour is actually predictable if one knows which calls to isprime() are performed; an adversary can find non-primes M which will trigger a false positive exactly on the Nth call to isprime(M) (for particular values of N). With enough computing resources, one can find non-primes M for which N is relatively small (with M about 1e9, one can achieve N as low as 1000). Compare with similar examples for simpler algorithm, L<Carmichael numbers|http://primes.utm.edu/glossary/xpage/CarmichaelNumber.html>; see also L<numbers with big proportion of non-witnesses|http://oeis.org/A090659> and L<numbers with many non-witnesses|http://oeis.org/A141768>, and L<the conjecture about proportion|http://web.archive.org/web/*/http://www.ma.iup.edu/MAA/proceedings/vol1/higgins.pdf>.