Skip Menu |

This queue is for tickets about the PathTools CPAN distribution.

Report information
The Basics
Id: 21796
Status: resolved
Priority: 0/
Queue: PathTools

People
Owner: Nobody in particular
Requestors: RUZ [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: 3.19
Fixed in: (no value)



Subject: [PATCH] canonpath performance improvements
Hi. As the canonpath method is used in many methods of the distribution and many perl modules use PathTools so I tried to improve performance of the method. Changes: * got rid of 'isa' call As we have a class for the Cygwin platform, I've moved support for network paths to the cygwin's package. * changed regexp /^(?:qnx|nto)$/ to two 'eq' operations with OR aggregator. ** Also, after above changes we could get rid of the $double_slashes_special variable without loosing code readabilty * changed s|/+|/|g to s|/{2,}|/|g This is fastest variant I've found, but if you think that it could be slower on some systems you may use s|//+|/|g which as I think faster than the current in any perl version as it's very close to the curent RE, but avoids useless matches in '...x/x...'. * all groups "(...)" in regexps that are not used through $0-$9 have been replaced with (?:...) * the latest change is replace of \Z(?!\n) with \z. I don't understand why the former is used (may be some older perl versions have no support for \z), but the latter is already used in several places and sure faster than the former. -- Best regards, Ruslan.
Subject: PathTools-canonpath-performance.patch
diff -ru PathTools-3.19/lib/File/Spec/Cygwin.pm PathTools-3.19-my/lib/File/Spec/Cygwin.pm --- PathTools-3.19/lib/File/Spec/Cygwin.pm 2006-07-12 07:41:44.000000000 +0400 +++ PathTools-3.19-my/lib/File/Spec/Cygwin.pm 2006-10-01 08:02:33.000000000 +0400 @@ -40,7 +40,13 @@ sub canonpath { my($self,$path) = @_; $path =~ s|\\|/|g; - return $self->SUPER::canonpath($path); + + # Handle network path names beginning with double slash + my $node = ''; + if ( $path =~ s@^(//[^/]+)(?:/|\z)@/@s ) { + $node = $1; + } + return $node . $self->SUPER::canonpath($path); } sub catdir { diff -ru PathTools-3.19/lib/File/Spec/Unix.pm PathTools-3.19-my/lib/File/Spec/Unix.pm --- PathTools-3.19/lib/File/Spec/Unix.pm 2006-07-12 07:41:44.000000000 +0400 +++ PathTools-3.19-my/lib/File/Spec/Unix.pm 2006-10-01 08:02:33.000000000 +0400 @@ -43,13 +43,12 @@ my ($self,$path) = @_; # Handle POSIX-style node names beginning with double slash (qnx, nto) - # Handle network path names beginning with double slash (cygwin) # (POSIX says: "a pathname that begins with two successive slashes # may be interpreted in an implementation-defined manner, although # more than two leading slashes shall be treated as a single slash.") my $node = ''; - my $double_slashes_special = $self->isa("File::Spec::Cygwin") || $^O =~ m/^(?:qnx|nto)$/; + my $double_slashes_special = $^O eq 'qnx' || $^O eq 'nto'; - if ( $double_slashes_special && $path =~ s:^(//[^/]+)(/|\z):/:s ) { + if ( $double_slashes_special && $path =~ s@^(//[^/]+)(?:/|\z)@/@s ) { $node = $1; } # This used to be @@ -57,12 +56,12 @@ # but that made tests 29, 30, 35, 46, and 213 (as of #13272) to fail # (Mainly because trailing "" directories didn't get stripped). # Why would cygwin avoid collapsing multiple slashes into one? --jhi - $path =~ s|/+|/|g; # xx////xx -> xx/xx + $path =~ s|/{2,}|/|g; # xx////xx -> xx/xx - $path =~ s@(/\.)+(/|\Z(?!\n))@/@g; # xx/././xx -> xx/xx + $path =~ s@(?:/\.)+(?:/|\z)@/@g; # xx/././xx -> xx/xx - $path =~ s|^(\./)+||s unless $path eq "./"; # ./xx -> xx + $path =~ s|^(?:\./)+||s unless $path eq "./"; # ./xx -> xx - $path =~ s|^/(\.\./)+|/|; # /../../xx -> xx + $path =~ s|^/(?:\.\./)+|/|; # /../../xx -> xx $path =~ s|^/\.\.$|/|; # /.. -> / - $path =~ s|/\Z(?!\n)|| unless $path eq "/"; # xx/ -> xx + $path =~ s|/\z|| unless $path eq "/"; # xx/ -> xx return "$node$path"; }
\z is available since perl 5.005, but is not available in 5.004. If the latter version is not supported then it qould be quite useful to replace all \Z(?!\n) with \z.
Hello, Ken. I've writen benchmark to prove that changes make sense. Script is attached. -- Best regards, Ruslan.
#!/usr/bin/perl -w use strict; use warnings; use Benchmark qw(cmpthese); my $time = -5; my @a = ( qw( ///../../..//./././a//b/.././c/././ a/../../b/c /. /./ /a/./ /a/. /../../ /../.. ), ''); cmpthese( $time, { std => sub { return map Current->canonpath($_), @a }, new => sub { return map My->canonpath($_), @a }, }); exit 0; package Current; sub canonpath { my ($self,$path) = @_; # Handle POSIX-style node names beginning with double slash (qnx, nto) # Handle network path names beginning with double slash (cygwin) # (POSIX says: "a pathname that begins with two successive slashes # may be interpreted in an implementation-defined manner, although # more than two leading slashes shall be treated as a single slash.") my $node = ''; my $double_slashes_special = $self->isa("File::Spec::Cygwin") || $^O =~ m/^(?:qnx|nto)$/; if ( $double_slashes_special && $path =~ s:^(//[^/]+)(/|\z):/:s ) { $node = $1; } # This used to be # $path =~ s|/+|/|g unless ($^O eq 'cygwin'); # but that made tests 29, 30, 35, 46, and 213 (as of #13272) to fail # (Mainly because trailing "" directories didn't get stripped). # Why would cygwin avoid collapsing multiple slashes into one? --jhi $path =~ s|/+|/|g; # xx////xx -> xx/xx $path =~ s@(/\.)+(/|\Z(?!\n))@/@g; # xx/././xx -> xx/xx $path =~ s|^(\./)+||s unless $path eq "./"; # ./xx -> xx $path =~ s|^/(\.\./)+|/|; # /../../xx -> xx $path =~ s|^/\.\.$|/|; # /.. -> / $path =~ s|/\Z(?!\n)|| unless $path eq "/"; # xx/ -> xx return "$node$path"; } 1; package My; sub canonpath { my ($self,$path) = @_; # Handle POSIX-style node names beginning with double slash (qnx, nto) # (POSIX says: "a pathname that begins with two successive slashes # may be interpreted in an implementation-defined manner, although # more than two leading slashes shall be treated as a single slash.") my $node = ''; my $double_slashes_special = $^O eq 'qnx' || $^O eq 'nto'; if ( $double_slashes_special && $path =~ s@^(//[^/]+)(?:/|\z)@/@s ) { $node = $1; } # This used to be # $path =~ s|/+|/|g unless ($^O eq 'cygwin'); # but that made tests 29, 30, 35, 46, and 213 (as of #13272) to fail # (Mainly because trailing "" directories didn't get stripped). # Why would cygwin avoid collapsing multiple slashes into one? --jhi $path =~ s|/{2,}|/|g; # xx////xx -> xx/xx $path =~ s@(?:/\.)+(?:/|\z)@/@g; # xx/././xx -> xx/xx $path =~ s|^(?:\./)+||s unless $path eq "./"; # ./xx -> xx $path =~ s|^/(?:\.\./)+|/|; # /../../xx -> xx $path = '/' if $path eq '/..'; # /.. -> / $path =~ s|/\z|| unless $path eq "/"; # xx/ -> xx return "$node$path"; } 1;
Hi Ruslan, Looks great, thanks for the patch. There was already one instance of \z in the code - I'm not sure whether we really target 5.004 or not, but we're already failing, so I took your advice and changed the rest of the \Z's to \z's too. =) -Ken