CC: hv [...] crypt.org, sisyphus [...] cpan.org
Subject: Math::GMPz / Math::GMP / Math::BigInt output
Currently Math::Prime::Util::GMP does all I/O via strings. While this is easy and very portable, it is quite time-consuming for large inputs, and can add significant overhead for some functions even with small inputs.
For example, this currently takes about 6 minutes:
$ perl -E 'use ntheory ":all"; my $n = primorial(230077)/2229464046810-3131794; @f = Math::Prime::Util::GMP::sieve_primes($n+1, $n+4680156-1, 1e9); say $_-$n for @f;'
but the actual *work* is about 1 minute 45 seconds; the rest is string conversion. Changing sieve_primes to return the offsets directly brings the run time down to that figure, as does a change to return the full numbers as Math::GMPz objects (assuming $n is a Math::GMPz object as well).
I have code to do this, but it still needs to be turned on in all functions and tested. Principles (please add input):
(1) If the input is a Math::GMPz or Math::GMP object, the mpz_t is immediately available and is used. Otherwise it is read as a UV/IV if it fits in both a UV and a long, and as a string if not. I briefly tried accepting Math::BigInt::GMP input, but it's complicated.
(2) The return type is always identical to the input type if the input was a Math::GMPz or Math::GMP object (multi-argument functions will just pick one of the arguments to be the deciding one).
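To make principles (1) and (2) concrete, here is a rough pure-Perl sketch of the dispatch. It is only an illustration: the real logic would live in the XS code, the helper names input_kind and result_class are hypothetical, and two choices are my assumptions rather than anything decided above: the first GMP-type argument is the deciding one, and the fallback output is a string. The range check also only covers non-negative values against the UV limit; it does not model the "fits in a long" test.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);
use Math::BigInt;

# Hypothetical helper: report how an argument would be read.
# Returns the class name for Math::GMPz / Math::GMP objects,
# 'native' for a non-negative integer that fits in a UV,
# and 'string' for everything else (including other objects).
sub input_kind {
    my ($n) = @_;
    my $class = blessed($n) // '';
    return $class if $class eq 'Math::GMPz' || $class eq 'Math::GMP';

    my $str = "$n";
    if ($class eq '' && $str =~ /^[0-9]+$/) {
        # ~0 is the largest UV on this perl.
        my $uv_max = Math::BigInt->new(~0);
        return 'native' if Math::BigInt->new($str)->bcmp($uv_max) <= 0;
    }
    return 'string';
}

# Hypothetical helper: choose the output type for a call.
# Assumption: the first Math::GMPz / Math::GMP argument decides;
# if there is none, fall back to plain strings.
sub result_class {
    for my $arg (@_) {
        my $kind = input_kind($arg);
        return $kind if $kind eq 'Math::GMPz' || $kind eq 'Math::GMP';
    }
    return 'string';
}
```

With this, result_class($lo, $hi) would give 'Math::GMPz' whenever either bound is a Math::GMPz object, and 'string' for plain scalars or Math::BigInt inputs.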
Now we have debatable points:
(a) If the input(s) were some other object (e.g. Math::Pari or Math::BigInt), should we insist that the outputs be that same object type? This seems nice for the user. Making Math::GMP and Math::GMPz objects for output is very simple and fast, but other types not so much (we are forced to call their new() methods for each return value).
(b) Since we obviously have GMP, I'm thinking I could add a dependency on Math::GMPz and return all results as that type unless the input was a Math::GMP. This could cause havoc if someone was using Math::Pari or Math::BigInt though, as they don't always play well together.
(c) I could add a config option 'class' that takes the values 'string', 'Math::GMP', 'Math::GMPz', and maybe others. Setting it would force all output to that type. 'string' can be useful if the caller is always doing string manipulation on the output (lots of OEIS sequences do this sort of thing).
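For (c), a minimal sketch of what a 'class' setting could look like from the Perl side. Everything here is hypothetical (the %config hash and the set_output_class and wrap_result helpers are not existing API), and it uses the slow generic path of calling new() on each value, which is exactly the cost mentioned in (a). Math::BigInt stands in for the target class only because it ships with Perl.

```perl
use strict;
use warnings;
use Math::BigInt;

# Hypothetical module-level setting; 'string' is the default.
my %config = (class => 'string');

# Hypothetical setter: 'string' needs no module, any other value
# must name a loadable class that provides new().
sub set_output_class {
    my ($class) = @_;
    unless ($class eq 'string') {
        (my $file = "$class.pm") =~ s{::}{/}g;
        require $file;
        die "$class has no new() method\n" unless $class->can('new');
    }
    $config{class} = $class;
    return;
}

# Hypothetical wrapper applied to each decimal string result.
sub wrap_result {
    my ($digits) = @_;
    return $digits if $config{class} eq 'string';
    return $config{class}->new($digits);
}

set_output_class('Math::BigInt');
print ref(wrap_result('340282366920938463463374607431768211456')), "\n";
```

A caller doing digit manipulation would leave the setting at 'string' and get plain scalars back, while a caller doing arithmetic would set the class once and receive objects from every call.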