Bug #85536 for Math-Prime-TiedArray: Efficiency (with patches)

Subject:

Efficiency (with patches)

Math::Prime::TiedArray is very slow and uses huge quantities of memory for anything beyond tiny inputs. Attached are two suggested changes. Sum the first 1M primes: /usr/bin/time perl -MMath::Prime::TiedArray -E 'tie my @primes, "Math::Prime::TiedArray"; $sum += $primes[$_] for 1..1_000_000; say $sum;' Takes 83 seconds and over 1GB of memory with v0.04. With the patches it takes 5.2 seconds and 180MB. That's still far more than a module like this needs, but it's a help. The space is all going into holding onto all the values in a big array, and changing that would mean fundamentally changing the internals. The main change is replacing _atkin with a Sieve of Erostosthenes. It's not only smaller, but it's over 10x faster. The Sieve of Atkin may win on paper, but in practice it doesn't (even if the SoA is written correctly). I called it _soe, and both calls to _atkin should be changed. This is a pretty simple sieve -- I have a segmented version that would be much nicer for this application. It's another 50 or so lines and for best use would store the base sieve string. At that point looking into using a window around their request rather than storing the entire range might be nice. The second change is a much smaller effect, and just tries to give EXTEND better behavior. It could be tuned as desired (the multiplier can be 1.0, but I think it works better with something higher).

Subject:

EXTEND.pm

# need to calculate more primes sub EXTEND { my ( $self, $n ) = @_; warn "EXTEND(@_)\n" if $self->{_options}{debug} > 2; return if $n <= $self->{_cache_size}; # Find an upper bound that we know contains the nth prime my $logn = log($n); my $log2n = log($logn); my $nth_estimate = # Dusart 2010 for 179k and 688k, custom lower ($n >= 688383) ? int(0.5+$n*($logn+$log2n-1.0+(($log2n-2.00)/$logn))) : ($n >= 178974) ? int(0.5+$n*($logn+$log2n-1.0+(($log2n-1.95)/$logn))) : ($n >= 18) ? int(0.5+$n*($logn+$log2n-1.0+(($log2n+0.30)/$logn))) : 59; $nth_estimate = int(1.5 * $nth_estimate) + $self->{_options}{extend_step}; $self->_soe($nth_estimate); }

Subject:

_soe.pm

sub _soe { my ( $self, $limit ) = @_; warn "DEBUG2: Calculating primes up to $limit\n" if $self->{_options}{debug} > 1; return if $limit <= $self->{_limit}; croak "Cannot extend beyond $self->{_options}{extend_ceiling}!" if $self->{_options}{extend_ceiling} and $limit > $self->{_options}{extend_ceiling}; croak "Invalid limit" if $limit <= 2; # We should use a segmented sieve that fills in just the new section. $limit-- if ($limit & 1) == 0; my $s_end = $limit >> 1; my $whole = int($s_end / 15); # prefill with 3 and 5 marked my $sieve = '100010010010110' . '011010010010110' x $whole; substr($sieve, $s_end+1) = ''; my ($n, $slimit) = ( 7, int(sqrt($limit)) ); while ( $n <= $slimit ) { for (my $s = ($n*$n) >> 1; $s <= $s_end; $s += $n) { substr($sieve, $s, 1) = '1'; } do { $n += 2 } while substr($sieve, $n>>1, 1); } # If we wanted the count: my $count = 1 + $sieve =~ tr/0//; $n = 7-2; my $cachep = $self->{_cache}; my $last_cp = $cachep->[-1]; push @$cachep, 3 if $last_cp < 3; push @$cachep, 5 if $last_cp < 5; foreach my $s (split("0", substr($sieve, 3), -1)) { $n += 2 + 2 * length($s); push @$cachep, $n if $n > $last_cp && $n <= $limit; } $self->{_max_prime} = $self->{_cache}[-1]; $self->{_cache}[0] = $self->{_limit} = $limit; $self->{_cache_size} = $#{ $self->{_cache} }; }