Subject: | Reduce calls to cwd() even further |
We found a performance problem that occurs when extracting a tarball
with a lot of (relative) file entries. The release notes already
mention eliminating excessive cwd() calls, but I think this can be
improved even further (patch attached):
C<Archive::Tar> needs to know the current directory, and it will run
C<Cwd::cwd()> I<every> time it extracts a I<relative> entry from the
tarfile and saves it in the file system.
C<Cwd::cwd()> is pretty expensive: It forks a new process and runs
the external C<pwd> command. In a tarball with tens of thousands of
relative entries, this can be a huge performance hit.
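
To get a rough feel for the cost, here is a minimal timing sketch. It assumes a platform where C<Cwd::cwd()> shells out to C<pwd>; the helper name C<time_calls> and the iteration count are made up for illustration, and actual numbers will vary by system.

```perl
use strict;
use warnings;
use Cwd qw(cwd);
use Time::HiRes qw(time);

# Hypothetical helper (not part of Archive::Tar): run $code $n times
# and return the elapsed wall-clock time in seconds.
sub time_calls {
    my ( $n, $code ) = @_;
    my $start = time;
    $code->() for 1 .. $n;
    return time - $start;
}

my $n      = 200;      # illustrative; a large tarball would mean far more calls
my $cached = cwd();    # looked up once, then reused

# One cwd() lookup per iteration, as in the unpatched extraction path.
my $uncached_secs = time_calls( $n, sub { my $dir = cwd() } );

# Reusing the stored value, as the patch does when a cwd has been set.
my $cached_secs = time_calls( $n, sub { my $dir = $cached } );

printf "cwd() x %d:  %.4fs\n", $n, $uncached_secs;
printf "cached x %d: %.4fs\n", $n, $cached_secs;
```
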
Since C<Archive::Tar> doesn't change the current directory internally
while it is extracting the items in a tarball, all calls to C<Cwd::cwd()>
can be avoided if we can guarantee that the current directory doesn't
get changed externally.
To use this performance boost, set the current directory via

    use Cwd;
    $tar->setcwd( cwd() );

once before calling a function like C<extract_file> and
C<Archive::Tar> will use the current directory setting from then on
and won't call C<Cwd::cwd()> internally.
To switch back to the default behaviour, use

    $tar->setcwd( undef );

and C<Archive::Tar> will call C<Cwd::cwd()> internally again.
If you're using C<Archive::Tar>'s C<extract()> method, C<setcwd()> will
be called for you.
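
For illustration, a complete round trip might look like the sketch below. It assumes an C<Archive::Tar> that includes the C<setcwd()> method from this patch; the archive name, entry names and temporary directories are invented for the example.

```perl
use strict;
use warnings;
use Archive::Tar;
use Cwd qw(cwd);
use File::Temp qw(tempdir);

# Build a small throwaway archive with relative entries
# (file names and contents are illustrative only).
my $workdir = tempdir( CLEANUP => 1 );
my $tarball = "$workdir/demo.tar";

my $out = Archive::Tar->new;
$out->add_data( "demo/file$_.txt", "payload $_\n" ) for 1 .. 3;
$out->write($tarball) or die $out->error;

# Extract into a scratch directory, looking up the cwd only once.
my $orig = cwd();
my $dest = tempdir( CLEANUP => 1 );
chdir $dest or die "chdir $dest: $!";

my $tar = Archive::Tar->new($tarball) or die Archive::Tar->error;
$tar->setcwd( cwd() );    # cache the current directory for later extractions

my @extracted;
for my $name ( $tar->list_files ) {
    $tar->extract_file($name) or die $tar->error;
    push @extracted, $name;
}

$tar->setcwd(undef);      # back to calling Cwd::cwd() for each entry
chdir $orig or die "chdir $orig: $!";
```

The important constraint is that nothing changes the working directory between the C<setcwd()> call and the last C<extract_file()> call; otherwise the cached value is stale and entries end up relative to the old directory.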
Would be great if you could apply the patch for the next release ...
thanks for taking care of Archive::Tar!
-- Mike Schilli
Subject: | patch.txt |
diff -Naur Archive-Tar-1.29/lib/Archive/Tar.pm Archive-Tar-1.29.patched/lib/Archive/Tar.pm
--- Archive-Tar-1.29/lib/Archive/Tar.pm Fri Mar 3 05:47:01 2006
+++ Archive-Tar-1.29.patched/lib/Archive/Tar.pm Fri Jul 7 16:49:51 2006
@@ -435,6 +435,9 @@
my @args = @_;
my @files;
+ # use the speed optimization for all extracted files
+ local($self->{cwd}) = cwd() unless $self->{cwd};
+
### you requested the extraction of only certian files
if( @args ) {
for my $file ( @args ) {
@@ -538,7 +541,7 @@
### it's a relative path ###
} else {
- my $cwd = cwd();
+ my $cwd = (defined $self->{cwd} ? $self->{cwd} : cwd());
my @dirs = File::Spec::Unix->splitdir( $dirs );
my @cwd = File::Spec->splitdir( $cwd );
$dir = File::Spec->catdir( @cwd, @dirs );
@@ -1389,6 +1392,50 @@
sub no_string_support {
croak("You have to install IO::String to support writing archives to strings");
+}
+
+=head2 $tar->setcwd( $cwd );
+
+C<Archive::Tar> needs to know the current directory, and it will run
+C<Cwd::cwd()> I<every> time it extracts a I<relative> entry from the
+tarfile and saves it in the file system. (As of version 1.30, however,
+C<Archive::Tar> will use the speed optimization described below
+automatically, so it's only relevant if you're using C<extract_file()>).
+
+C<Cwd::cwd()> is pretty expensive: It forks a new process and runs
+the external C<pwd> command. In a tarball with tens of thousands of
+relative entries, this can be a huge performance hit.
+
+Since C<Archive::Tar> doesn't change the current directory internally
+while it is extracting the items in a tarball, all calls to C<Cwd::cwd()>
+can be avoided if we can guarantee that the current directory doesn't
+get changed externally.
+
+To use this performance boost, set the current directory via
+
+ use Cwd;
+ $tar->setcwd( cwd() );
+
+once before calling a function like C<extract_file> and
+C<Archive::Tar> will use the current directory setting from then on
+and won't call C<Cwd::cwd()> internally.
+
+To switch back to the default behaviour, use
+
+ $tar->setcwd( undef );
+
+and C<Archive::Tar> will call C<Cwd::cwd()> internally again.
+
+If you're using C<Archive::Tar>'s C<extract()> method, C<setcwd()> will
+be called for you.
+
+=cut
+
+sub setcwd {
+ my $self = shift;
+ my $cwd = shift;
+
+ $self->{cwd} = $cwd;
}
1;