Subject: | [PATCH] canonpath performance improvements |
Hi.
As the canonpath method is used in many methods of the distribution and
many perl modules use PathTools so I tried to improve performance of the
method.
Changes:
* got rid of 'isa' call
As we have a class for the Cygwin platform, I've moved support for
network paths to the cygwin's package.
* changed regexp /^(?:qnx|nto)$/ to two 'eq' operations with OR aggregator.
** Also, after above changes we could get rid of the
$double_slashes_special variable without loosing code readabilty
* changed s|/+|/|g to s|/{2,}|/|g
This is fastest variant I've found, but if you think that it could be
slower on some systems you may use s|//+|/|g which as I think faster
than the current in any perl version as it's very close to the curent
RE, but avoids useless matches in '...x/x...'.
* all groups "(...)" in regexps that are not used through $0-$9 have
been replaced with (?:...)
* the latest change is replace of \Z(?!\n) with \z. I don't understand
why the former is used (may be some older perl versions have no support
for \z), but the latter is already used in several places and sure
faster than the former.
--
Best regards, Ruslan.
Subject: | PathTools-canonpath-performance.patch |
diff -ru PathTools-3.19/lib/File/Spec/Cygwin.pm PathTools-3.19-my/lib/File/Spec/Cygwin.pm
--- PathTools-3.19/lib/File/Spec/Cygwin.pm 2006-07-12 07:41:44.000000000 +0400
+++ PathTools-3.19-my/lib/File/Spec/Cygwin.pm 2006-10-01 08:02:33.000000000 +0400
@@ -40,7 +40,13 @@
sub canonpath {
my($self,$path) = @_;
$path =~ s|\\|/|g;
- return $self->SUPER::canonpath($path);
+
+ # Handle network path names beginning with double slash
+ my $node = '';
+ if ( $path =~ s@^(//[^/]+)(?:/|\z)@/@s ) {
+ $node = $1;
+ }
+ return $node . $self->SUPER::canonpath($path);
}
sub catdir {
diff -ru PathTools-3.19/lib/File/Spec/Unix.pm PathTools-3.19-my/lib/File/Spec/Unix.pm
--- PathTools-3.19/lib/File/Spec/Unix.pm 2006-07-12 07:41:44.000000000 +0400
+++ PathTools-3.19-my/lib/File/Spec/Unix.pm 2006-10-01 08:02:33.000000000 +0400
@@ -43,13 +43,12 @@
my ($self,$path) = @_;
# Handle POSIX-style node names beginning with double slash (qnx, nto)
- # Handle network path names beginning with double slash (cygwin)
# (POSIX says: "a pathname that begins with two successive slashes
# may be interpreted in an implementation-defined manner, although
# more than two leading slashes shall be treated as a single slash.")
my $node = '';
- my $double_slashes_special = $self->isa("File::Spec::Cygwin") || $^O =~ m/^(?:qnx|nto)$/;
+ my $double_slashes_special = $^O eq 'qnx' || $^O eq 'nto';
- if ( $double_slashes_special && $path =~ s:^(//[^/]+)(/|\z):/:s ) {
+ if ( $double_slashes_special && $path =~ s@^(//[^/]+)(?:/|\z)@/@s ) {
$node = $1;
}
# This used to be
@@ -57,12 +56,12 @@
# but that made tests 29, 30, 35, 46, and 213 (as of #13272) to fail
# (Mainly because trailing "" directories didn't get stripped).
# Why would cygwin avoid collapsing multiple slashes into one? --jhi
- $path =~ s|/+|/|g; # xx////xx -> xx/xx
+ $path =~ s|/{2,}|/|g; # xx////xx -> xx/xx
- $path =~ s@(/\.)+(/|\Z(?!\n))@/@g; # xx/././xx -> xx/xx
+ $path =~ s@(?:/\.)+(?:/|\z)@/@g; # xx/././xx -> xx/xx
- $path =~ s|^(\./)+||s unless $path eq "./"; # ./xx -> xx
+ $path =~ s|^(?:\./)+||s unless $path eq "./"; # ./xx -> xx
- $path =~ s|^/(\.\./)+|/|; # /../../xx -> xx
+ $path =~ s|^/(?:\.\./)+|/|; # /../../xx -> xx
$path =~ s|^/\.\.$|/|; # /.. -> /
- $path =~ s|/\Z(?!\n)|| unless $path eq "/"; # xx/ -> xx
+ $path =~ s|/\z|| unless $path eq "/"; # xx/ -> xx
return "$node$path";
}