Skip Menu |

This queue is for tickets about the Algorithm-Diff CPAN distribution.

Report information
The Basics
Id: 101105
Status: open
Priority: 0/
Queue: Algorithm-Diff

People
Owner: Nobody in particular
Requestors: XENU [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: [PATCH] 2x faster LCS
Hello, I have optimized LCS function, so now it runs 96% faster when $keyGen is undef, the only change is removal of unnecessary calls of keygen functions. Looks like subroutine calls are really expensive in Perl. Unfortunately my patch has one downside, when $keyGen is actually defined, LCS is 4% slower than it was without my changes. I don't think it's a problem, because typically LCS is used without $keyGen. Patch is attached. Cheers, Tomasz Test script: use strict; use warnings; use Algorithm::Diff; use Algorithm::OldDiff; use Benchmark qw/:all/; use Digest::MD5 qw/md5_hex/; my @aa = (('z') x 100, 'a', 'b', 'c', ('dupa')x70000, 'd', 'kupa'); my @bb = (('z') x 100, 'a', 'b', 'c', ('dupz')x70000, 'f', 'kupa'); my $hash = sub { md5_hex(shift) }; cmpthese(50, { 'my patch' => sub { Algorithm::Diff::LCS(\@aa, \@bb) }, 'orginal' => sub { Algorithm::OldDiff::LCS(\@aa, \@bb) }, }); cmpthese(50, { 'my patch' => sub { Algorithm::Diff::LCS(\@aa, \@bb, $hash) }, 'original' => sub { Algorithm::OldDiff::LCS(\@aa, \@bb, $hash) }, }); Results: PS E:\Dokumenty\difftest> perl .\lcs.pl Rate orginal my patch orginal 2.41/s -- -49% my patch 4.72/s 96% -- Rate my patch original my patch 1.39/s -- -3% original 1.44/s 4% --
Subject: faster_lcs.patch
ÿþ--- Diff.pm.old 2014-11-26 06:41:54.000000000 +0100 +++ Diff.pm 2014-12-23 21:03:51.000000000 +0100 @@ -42,7 +42,7 @@ for ( $index = $start ; $index <= $end ; $index++ ) { my $element = $aCollection->[$index]; - my $key = &$keyGen( $element, @_ ); + my $key = $keyGen ? &$keyGen( $element, @_ ) : $element; if ( exists( $d{$key} ) ) { unshift ( @{ $d{$key} }, $index ); @@ -147,12 +147,7 @@ # set up code refs # Note that these are optimized. - if ( !defined($keyGen) ) # optimize for strings - { - $keyGen = sub { $_[0] }; - $compare = sub { my ( $a, $b ) = @_; $a eq $b }; - } - else + if ( $keyGen ) # optimize for strings { $compare = sub { my $a = shift; @@ -175,7 +170,8 @@ # First we prune off any common elements at the beginning while ( $aStart <= $aFinish and $bStart <= $bFinish - and &$compare( $a->[$aStart], $b->[$bStart], @_ ) ) + and ( $keyGen ? &$compare( $a->[$aStart], $b->[$bStart], @_ ) + : ( $a->[$aStart] eq $b->[$bStart] ) ) ) { $matchVector->[ $aStart++ ] = $bStart++; $prunedCount++; @@ -184,7 +180,8 @@ # now the end while ( $aStart <= $aFinish and $bStart <= $bFinish - and &$compare( $a->[$aFinish], $b->[$bFinish], @_ ) ) + and ( $keyGen ? &$compare( $a->[$aFinish], $b->[$bFinish], @_ ) + : ( $a->[$aFinish] eq $b->[$bFinish] ) ) ) { $matchVector->[ $aFinish-- ] = $bFinish--; $prunedCount++; @@ -200,7 +197,7 @@ my ( $i, $ai, $j, $k ); for ( $i = $aStart ; $i <= $aFinish ; $i++ ) { - $ai = &$keyGen( $a->[$i], @_ ); + $ai = $keyGen ? &$keyGen( $a->[$i], @_ ) : $a->[$i]; if ( exists( $bMatches->{$ai} ) ) { $k = 0;
Oops, just noticed that patch file contains BOM, sorry for that, I'm attaching corrected version.
Subject: faster_lcs.patch
--- Diff.pm.old 2014-11-26 06:41:54.000000000 +0100 +++ Diff.pm 2014-12-23 21:03:51.000000000 +0100 @@ -42,7 +42,7 @@ for ( $index = $start ; $index <= $end ; $index++ ) { my $element = $aCollection->[$index]; - my $key = &$keyGen( $element, @_ ); + my $key = $keyGen ? &$keyGen( $element, @_ ) : $element; if ( exists( $d{$key} ) ) { unshift ( @{ $d{$key} }, $index ); @@ -147,12 +147,7 @@ # set up code refs # Note that these are optimized. - if ( !defined($keyGen) ) # optimize for strings - { - $keyGen = sub { $_[0] }; - $compare = sub { my ( $a, $b ) = @_; $a eq $b }; - } - else + if ( $keyGen ) # optimize for strings { $compare = sub { my $a = shift; @@ -175,7 +170,8 @@ # First we prune off any common elements at the beginning while ( $aStart <= $aFinish and $bStart <= $bFinish - and &$compare( $a->[$aStart], $b->[$bStart], @_ ) ) + and ( $keyGen ? &$compare( $a->[$aStart], $b->[$bStart], @_ ) + : ( $a->[$aStart] eq $b->[$bStart] ) ) ) { $matchVector->[ $aStart++ ] = $bStart++; $prunedCount++; @@ -184,7 +180,8 @@ # now the end while ( $aStart <= $aFinish and $bStart <= $bFinish - and &$compare( $a->[$aFinish], $b->[$bFinish], @_ ) ) + and ( $keyGen ? &$compare( $a->[$aFinish], $b->[$bFinish], @_ ) + : ( $a->[$aFinish] eq $b->[$bFinish] ) ) ) { $matchVector->[ $aFinish-- ] = $bFinish--; $prunedCount++; @@ -200,7 +197,7 @@ my ( $i, $ai, $j, $k ); for ( $i = $aStart ; $i <= $aFinish ; $i++ ) { - $ai = &$keyGen( $a->[$i], @_ ); + $ai = $keyGen ? &$keyGen( $a->[$i], @_ ) : $a->[$i]; if ( exists( $bMatches->{$ai} ) ) { $k = 0;
Subject: Re: [rt.cpan.org #101105] [PATCH] 2x faster LCS
Date: Thu, 25 Dec 2014 20:03:28 -0800
To: bug-Algorithm-Diff [...] rt.cpan.org
From: Tye McQueen <tyemq [...] cpan.org>
Thanks! I'll apply the patch and let you know. Tye On Tue, Dec 23, 2014 at 12:41 PM, Tomasz Konojacki via RT < bug-Algorithm-Diff@rt.cpan.org> wrote: Show quoted text
> Tue Dec 23 15:41:34 2014: Request 101105 was acted upon. > Transaction: Ticket created by XENU > Queue: Algorithm-Diff > Subject: [PATCH] 2x faster LCS > Broken in: (no value) > Severity: (no value) > Owner: Nobody > Requestors: XENU@cpan.org > Status: new > Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=101105 > > > > Hello, > > I have optimized LCS function, so now it runs 96% faster when $keyGen is > undef, the only change is removal of unnecessary calls of keygen functions. > Looks like subroutine calls are really expensive in Perl. > > Unfortunately my patch has one downside, when $keyGen is actually defined, > LCS is 4% slower than it was without my changes. I don't think it's a > problem, because typically LCS is used without $keyGen. > > Patch is attached. > > Cheers, > Tomasz > > Test script: > > use strict; > use warnings; > > use Algorithm::Diff; > use Algorithm::OldDiff; > use Benchmark qw/:all/; > use Digest::MD5 qw/md5_hex/; > > my @aa = (('z') x 100, 'a', 'b', 'c', ('dupa')x70000, 'd', 'kupa'); > my @bb = (('z') x 100, 'a', 'b', 'c', ('dupz')x70000, 'f', 'kupa'); > > my $hash = sub { md5_hex(shift) }; > > cmpthese(50, { > 'my patch' => sub { Algorithm::Diff::LCS(\@aa, \@bb) }, > 'orginal' => sub { Algorithm::OldDiff::LCS(\@aa, \@bb) }, > }); > > cmpthese(50, { > 'my patch' => sub { Algorithm::Diff::LCS(\@aa, \@bb, $hash) }, > 'original' => sub { Algorithm::OldDiff::LCS(\@aa, \@bb, $hash) }, > }); > > Results: > > PS E:\Dokumenty\difftest> perl .\lcs.pl > Rate orginal my patch > orginal 2.41/s -- -49% > my patch 4.72/s 96% -- > Rate my patch original > my patch 1.39/s -- -3% > original 1.44/s 4% -- >
On Czw 25 Gru 2014, 23:03:39, TYEMQ wrote: Show quoted text
> Thanks! > > I'll apply the patch and let you know. > > Tye
Any news? Tomasz