(info on the sieve itself at the end, which ends up being a far bigger
improvement)
The best estimate would be using the Riemann R function, but that takes
some effort to calculate. The LogarithmicIntegral gives a pretty good
approximation, and the convergent series isn't too bad for the values
we're looking at. Here are some numbers I got:
# Method 10^10 %error 10^19 %error
# ----------------- ------------ ------------
# average bounds .01% .0002%
# li(n) .0007% .00000004%
# li(n)-li(n^.5)/2 .0004% .00000001%
# R(n) .0004% .00000001%
All of these are a fair bit of effort just to save the stack expansion.
You probably want an upper bound, e.g. (assuming n = number and logn =
log(n)):
Chebyshev, x >= 17:
upper = 1.25506 * n / logn;
Rosser and Schoenfeld, x >= 67:
upper = n / (logn - 1.5);
Dusart 1999, x >= 355991
upper = n/logn * (1 + 1/logn + 2.510/(logn*logn))
Dusart 2010, x >= 2_953_652_287
upper = n/logn * (1 + 1/logn + 2.334/(logn*logn))
The Dusart bounds are far, far better than the others. You can also
tweak the numbers to get tighter bounds for certain ranges.
I tried this, and it does seem to save a little time (6.4s to 6.3s for
200M primes), assuming you change to PUSH from XPUSH -- that is, we are
using a _real_ upper bound, which means no stack checking. If you're
going to check the stack every time, then (1) use a lower bound, and (2)
I doubt it will save any time.
You'd get a bigger savings in time and space by returning an array
reference, but it's too late to change the API, and I think it looks
weird to use array references all the time.
All of this pales in comparison to rewriting the actual sieve (the two
are independent of course). For example:
unsigned char *composite = NULL;
Newxz (composite, number/16 + 1, unsigned char);
if (base > number) XSRETURN_EMPTY;
for (n = 3; n <= square_root; n += 2)
if ((composite[n/16] & (1 << ((n/2)%8))) == 0)
for (i = n*n; i <= number; i += 2*n)
composite[i/16] |= 1 << ((i/2)%8);
if (base <= 2 && number >= 2)
XPUSHs (sv_2mortal(newSVuv(2)));
base |= 1; /* make odd */
for (n = base; n <= number; n += 2)
if ((composite[n/16] & (1 << ((n/2)%8))) == 0)
XPUSHs (sv_2mortal(newSVuv(n)));
Safefree (composite);
as xs_sieve_primes runs 2-3x faster than the current code. That's
basically the code Math::Prime::FastSieve uses, which is the canonical
SoE. Using a mod-30 wheel would make it use even less memory and run 2x
faster again. Segmenting would help control memory use for big ranges.
Times for perl -MMath::Prime::XS=primes -E 'say join ",",
primes(10**9,10**9+200)'
36.6s Math::Prime::XS 0.26
10.0s Math::Prime::XS with above code
4.9s Math::Prime::Util::erat_primes (non-segmented)
0.05s Math::Prime::Util::segment_primes (segmented)
Less dramatic, primes(10**8 , 10**8 + 10**7) | md5sum
2.7s Math::Prime::XS 0.26
0.9s Math::Prime::XS with above code
0.4s Math::Prime::Util::erat_primes (non-segmented)
0.2s Math::Prime::Util::segment_primes (segmented)