
This queue is for tickets about the Archive-Tar CPAN distribution.

Report information
The Basics
Id: 20399
Status: resolved
Priority: 0/
Queue: Archive-Tar

People
Owner: Nobody in particular
Requestors: MSCHILLI [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: 1.29
Fixed in: (no value)



Subject: Reduce calls to cwd() even further
We found a performance problem which occurs when you extract a tarball with a lot of (relative) file entries. Looks like there's already a note in the release notes about eliminating excessive cwd() calls, but I think it can be improved even further (patch attached):

C<Archive::Tar> needs to know the current directory, and it will run C<Cwd::cwd()> I<every> time it extracts a I<relative> entry from the tarfile and saves it in the file system.

C<Cwd::cwd()> is pretty expensive: it forks a new process and runs the external C<pwd> command. In a tarball with tens of thousands of relative entries, this can be a huge performance hit.

Since C<Archive::Tar> doesn't change the current directory internally while it is extracting the items in a tarball, all calls to C<Cwd::cwd()> can be avoided if we can guarantee that the current directory doesn't get changed externally.

To use this performance boost, set the current directory via

    use Cwd;
    $tar->setcwd( cwd() );

once before calling a function like C<extract_file>, and C<Archive::Tar> will use the current directory setting from then on and won't call C<Cwd::cwd()> internally.

To switch back to the default behaviour, use

    $tar->setcwd( undef );

and C<Archive::Tar> will call C<Cwd::cwd()> internally again.

If you're using C<Archive::Tar>'s C<extract()> method, C<setcwd()> will be called for you.

Would be great if you could apply the patch for the next release ... thanks for taking care of Archive::Tar!

-- Mike Schilli
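The caching idea described above can be sketched in isolation. This is a minimal illustration, not Archive::Tar's actual code: the package My::Extractor, its target_path() method and the call counter are hypothetical names made up for the example. Only the setcwd() contract (a defined value is cached, undef restores the per-entry lookup) is taken from the ticket.

```perl
use strict;
use warnings;
use Cwd ();
use File::Spec;

# Hypothetical stand-in for the extractor; not Archive::Tar's real code.
package My::Extractor;

my $cwd_calls = 0;    # counts how often we fall back to Cwd::cwd()

sub new {
    my ($class) = @_;
    return bless { cwd => undef }, $class;
}

# Same contract as the setcwd() proposed in the ticket:
# a defined value is cached, undef restores the per-entry lookup.
sub setcwd {
    my ( $self, $cwd ) = @_;
    $self->{cwd} = $cwd;
    return;
}

# Resolve a relative entry name, using the cached directory if one is set.
sub target_path {
    my ( $self, $relative ) = @_;
    my $cwd = defined $self->{cwd}
        ? $self->{cwd}
        : do { $cwd_calls++; Cwd::cwd() };
    return File::Spec->catfile( $cwd, $relative );
}

sub cwd_calls { return $cwd_calls }

package main;

my $ex = My::Extractor->new;

# Without a cached directory, every entry triggers its own lookup.
$ex->target_path("file$_.txt") for 1 .. 100;
my $uncached = My::Extractor::cwd_calls();

# The caller looks the directory up once and hands it over.
$ex->setcwd( Cwd::cwd() );
$ex->target_path("file$_.txt") for 1 .. 100;
my $cached = My::Extractor::cwd_calls() - $uncached;

print "uncached: $uncached lookups, cached: $cached lookups\n";
```

The trade-off is the one stated in the ticket: the cached value is only valid as long as nothing changes the working directory while extraction is in progress.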
Subject: patch.txt
diff -Naur Archive-Tar-1.29/lib/Archive/Tar.pm Archive-Tar-1.29.patched/lib/Archive/Tar.pm
--- Archive-Tar-1.29/lib/Archive/Tar.pm	Fri Mar 3 05:47:01 2006
+++ Archive-Tar-1.29.patched/lib/Archive/Tar.pm	Fri Jul 7 16:49:51 2006
@@ -435,6 +435,9 @@
     my @args = @_;
     my @files;
 
+    # use the speed optimization for all extracted files
+    local($self->{cwd}) = cwd() unless $self->{cwd};
+
     ### you requested the extraction of only certian files
     if( @args ) {
         for my $file ( @args ) {
@@ -538,7 +541,7 @@
 
         ### it's a relative path ###
         } else {
-            my $cwd = cwd();
+            my $cwd = (defined $self->{cwd} ? $self->{cwd} : cwd());
             my @dirs = File::Spec::Unix->splitdir( $dirs );
             my @cwd = File::Spec->splitdir( $cwd );
             $dir = File::Spec->catdir( @cwd, @dirs );
@@ -1389,6 +1392,50 @@
 
 sub no_string_support {
     croak("You have to install IO::String to support writing archives to strings");
+}
+
+=head2 $tar->setcwd( $cwd );
+
+C<Archive::Tar> needs to know the current directory, and it will run
+C<Cwd::cwd()> I<every> time it extracts a I<relative> entry from the
+tarfile and saves it in the file system. (As of version 1.30, however,
+C<Archive::Tar> will use the speed optimization described below
+automatically, so it's only relevant if you're using C<extract_file()>).
+
+C<Cwd::cwd()> is pretty expensive: It forks a new process and runs
+the external C<pwd> command. In a tarball with tens of thousands of
+relative entries, this can be a huge performance hit.
+
+Since C<Archive::Tar> doesn't change the current directory internally
+while it is extracting the items in a tarball, all calls to C<Cwd::cwd()>
+can be avoided if we can guarantee that the current directory doesn't
+get changed externally.
+
+To use this performance boost, set the current directory via
+
+    use Cwd;
+    $tar->setcwd( cwd() );
+
+once before calling a function like C<extract_file> and
+C<Archive::Tar> will use the current directory setting from then on
+and won't call C<Cwd::cwd()> internally.
+
+To switch back to the default behaviour, use
+
+    $tar->setcwd( undef );
+
+and C<Archive::Tar> will call C<Cwd::cwd()> internally again.
+
+If you're using C<Archive::Tar>'s C<exract()> method, C<setcwd()> will
+be called for you.
+
+=cut
+
+sub setcwd {
+    my $self = shift;
+    my $cwd = shift;
+
+    $self->{cwd} = $cwd;
 }
 
 1;
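For reference, an end-to-end run of the documented usage might look like the sketch below. It assumes an Archive::Tar release that ships setcwd() as proposed in this patch; the temporary directory, the tarball name demo.tar and the entry names are made up for the example.

```perl
use strict;
use warnings;
use Archive::Tar;
use Cwd ();
use File::Temp ();

# Build a small tarball with relative entries (names are illustrative only).
my $dir     = File::Temp::tempdir( CLEANUP => 1 );
my $tarball = "$dir/demo.tar";

my $out = Archive::Tar->new;
$out->add_data( "entry$_.txt", "payload $_\n" ) for 1 .. 3;
$out->write($tarball) or die $out->error;

# Extract into the temporary directory.
my $orig = Cwd::cwd();
chdir $dir or die "chdir $dir: $!";

my $tar = Archive::Tar->new($tarball) or die Archive::Tar->error;

# One lookup up front; extract_file() then reuses the stored directory.
$tar->setcwd( Cwd::cwd() );
my @names = sort $tar->list_files;
$tar->extract_file($_) or die $tar->error for @names;

# Restore the default per-entry lookup.
$tar->setcwd(undef);

my $extracted = grep { -f $_ } @names;
print "extracted $extracted of ", scalar(@names), " entries\n";

# Leave the temporary directory so it can be cleaned up at exit.
chdir $orig or die "chdir $orig: $!";
```

The explicit setcwd() call matters only for extract_file(); per the patch's own documentation, extract() sets the directory itself.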
On Mon Jul 10 16:26:24 2006, MSCHILLI wrote:
> We found a performance problem which occurs when you extract a tarball
> with a lot of (relative) file entries. Looks like there's
> already a note in the release notes of eliminating excessive cwd()
> calls, but I think it can be improved even further (Patch attached):
Thanks for the patch, it's applied with minor tweaks as change 12588.

I'll include it in the next release of A::T

-- Jos
CC: MSCHILLI [...] cpan.org
Subject: Re: [rt.cpan.org #20399] Reduce calls to cwd() even further
Date: Tue, 1 Aug 2006 18:28:18 -0700 (PDT)
To: via RT <bug-Archive-Tar [...] rt.cpan.org>
From: Mike Schilli <m [...] perlmeister.com>
On Tue, 1 Aug 2006, via RT wrote:
> Thanks for the patch, it's applied with minor tweaks as change 12588.
>
> I'll include it in the next release of A::T
Wonderful, thanks!

-- Mike

Mike Schilli
m@perlmeister.com