Subject: | The primes function is *very* slow and uses too much memory (patch included) |
The primes function is far slower than it should be. The attached patch
makes it run much faster and use much less memory, while also fixing
incorrect prime count returns for N <= 2, and primes for N <= 1.
I realize that this no longer really tests the functionality of
Math::BigInt, and hence may not be acceptable as submitted. I have it
returning all results as BigInts at the end, so the result is the same
as it was before, even though there is really no reason for them to be
BigInts (and the conversion takes more time than the sieving for most
inputs). But for almost all cases, this is really not necessary. The
case I can think of would be for a 32-bit machine with an argument
larger than 2**32. The old code used more than 26GB of RAM getting to
800M, so I don't think anyone ever hit that limit. The new code would
be able to get there, so maybe it deserves more thought.
For BigInts, a segmented sieve might also be worthwhile, though the
current interface (a single integer with an implied lower limit of 2)
doesn't really make it useful.
The patch uses a string sieve which is faster than array versions. It
would be an easy change to instead use a more conventional vec() version
that runs 2-4x slower than this patch but uses 1/8th the sieve memory.
The memory savings when run in array context is dwarfed by the BigInt
return values, so not really a factor there. Either one would run quite
a bit faster than the existing code.
time perl -MMath::Big=primes -E 'say scalar primes(10_000_000);'
664579
377.58s 1,510,336k used
time perl -Iblib/lib -Iblib/arch -MMath::Big=primes -E 'say scalar
primes(10_000_000);'
664579
0.81s 98,176k used
time perl -MMath::Big=primes -E 'my @p = primes(700000); say $p[50000];'
611957
26.21s 244,528k used
time perl -Iblib/lib -Iblib/arch -MMath::Big=primes -E 'my @p =
primes(700000); say $p[50000];'
0.58s 150,096k used
Memory use is from /usr/bin/time, so includes everything. I verified
the prime count for small values using:
perl -Iblib/lib -Iblib/arch -MMath::Big=primes -MMath::Prime::FastSieve
-E 'my $sieve = Math::Prime::FastSieve::Sieve->new( 1_000_000 ); foreach
my $n (1..100000) { die "$n" unless $sieve->count_le($n) == scalar
primes($n); say "good through $n" if $n % 5000 == 0; }'
and the primes array using:
perl -Iblib/lib -Iblib/arch -MMath::Big=primes -MMath::Prime::FastSieve
-E 'my $sieve = Math::Prime::FastSieve::Sieve->new( 1_000_000 ); foreach
my $n (1..10000) { my @p1 = @{$sieve->primes($n)}; my @p2 = primes($n);
die "$n" unless scalar @p1 == scalar @p2; foreach my $i (0..$#p1) { die
"$n" unless $p1[$i] == $p2[$i]; } say "good through $n" if $n % 500 == 0; }'
Subject: | Math-Big.patch |
--- lib/site_perl/5.17.6/Math/Big.pm 2007-04-16 08:27:54.000000000 -0700
+++ lib/Math/Big.pm 2012-12-14 13:34:05.429974839 -0800
@@ -29,64 +29,41 @@
my $five = Math::BigFloat->new(5); # for pi
my $twothreenine = Math::BigFloat->new(239); # for pi
-sub primes
- {
- my $amount = shift; $amount = 1000 if !defined $amount;
- $amount = Math::BigInt->new($amount) unless ref $amount;
-
- return (Math::BigInt->new(2)) if $amount < $three;
-
- $amount++;
-
- # any not defined number is prime, 0,1 are not, but 2 is
- my @primes = (1,1,0);
- my $prime = $three->copy(); # start
- my $r = 0; my $a = $amount->numify();
- for (my $i = 3; $i < $a; $i++) # int version
- {
- $primes[$i] = $r; $r = 1-$r;
- }
- my ($cur,$add);
- # find primes
- OUTER:
- while ($prime <= $amount)
- {
- # find first unmarked, it is the next prime
- $cur = $prime;
- while ($primes[$cur])
- {
- $cur += $two; last OUTER if $cur >= $amount; # no more to do
- }
- # $cur is now new prime
- # now strike out all multiples of $cur
- $add = $cur * $two;
- $prime = $cur + $two; # next round start two higher
- $cur += $add;
- while ($cur <= $amount)
- {
- $primes[$cur] = 1; $cur += $add;
- }
- }
-
- if (!wantarray)
- {
- my $n = 0;
- for my $p (@primes)
- {
- $n++ if $p == 0;
- }
- return Math::BigInt->new($n);
- }
-
- my @real_primes; my $i = 0;
- while ($i < scalar @primes)
- {
- push @real_primes, Math::BigInt->new($i) if $primes[$i] == 0;
- $i ++;
- }
-
- @real_primes;
+# In scalar context this returns the prime count (# of primes <= N).
+# In array context it returns a list of primes from 2 to N.
+sub primes {
+ my $end = shift;
+ return unless defined $end;
+ $end = $end->numify() if ref($end) =~ /^Math::Big/;
+ if ($end < 2) { return !wantarray ? Math::BigInt->bzero() : (); }
+ if ($end < 3) { return !wantarray ? $one->copy : ($two->copy); }
+ if ($end < 5) { return !wantarray ? $two->copy : ($two->copy, $three->copy); }
+
+ $end-- unless ($end & 1);
+ my $s_end = $end >> 1;
+ my $whole = int( ($end>>1) / 15);
+ # Be conservative. This would result in terabytes of array output.
+ die "Cannot return $end primes!" if $whole > 1_145_324_612; # ~32 GB string
+ my $sieve = "100010010010110" . "011010010010110" x $whole;
+ substr($sieve, $s_end+1) = ''; # Clip to the right number of entries
+ my ($n, $limit) = ( 7, int(sqrt($end)) );
+ while ( $n <= $limit ) {
+ for (my $s = ($n*$n) >> 1; $s <= $s_end; $s += $n) {
+ substr($sieve, $s, 1) = '1';
+ }
+ do { $n += 2 } while substr($sieve, $n>>1, 1);
+ }
+
+ return Math::BigInt->new(1 + $sieve =~ tr/0//) if !wantarray;
+
+ my @primes = (2, 3, 5);
+ $n = 7-2;
+ foreach my $s (split("0", substr($sieve, 3), -1)) {
+ $n += 2 + 2 * length($s);
+ push @primes, $n if $n <= $end;
}
+ return map { Math::BigInt->new($_) } @primes;
+}
sub fibonacci
{
@@ -880,7 +857,7 @@
cos sin tan euler bernoulli arctan arcsin pi/;
@primes = primes(100); # first 100 primes
- $prime = primes(100); # 100th prime
+ $count = primes(100); # number of primes <= 100
@fib = fibonacci (100); # first 100 fibonacci numbers
$fib_1000 = fibonacci (1000); # 1000th fibonacci number
$hailstone = hailstone (1000); # length of sequence
@@ -930,11 +907,10 @@
$primes = primes($n);
Calculates all the primes below N and returns them as array. In scalar context
-returns the number of primes below N.
+returns the prime count of N (the number of primes less than or equal to N).
This uses an optimized version of the B<Sieve of Eratosthenes>, which takes
-half of the time and half of the space, but is still O(N). Or in other words,
-quite slow.
+half of the time and half of the space, but is still O(N).
=head2 B<fibonacci()>