This queue is for tickets about the Archive-Tar CPAN distribution.

Report information
The Basics
Id: 17395
Status: resolved
Priority: 0/
Queue: Archive-Tar

People
Owner: Nobody in particular
Requestors: bronto [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Wishlist
Broken in: 1.28
Fixed in: (no value)



Subject: [PATCH] allow extract() to select files by regexes or substrings
My name is Marco Marongiu (CPAN ID: BRONTO), and I am an Archive::Tar user. I use Archive::Tar in an application I wrote for my employer. I needed it to extract some files whose full names I don't know, although I do know their extension. Unfortunately, extract() can't handle patterns. I could work around this with list_files() or something similar, but I thought that would be quite inefficient. So I wrote a patch against Archive::Tar 1.28 that makes the extract() method support pattern matching and substring search as well. The patch is attached. It passes make test just as 1.28 does, and I also added POD documentation for it. I hope you find it useful and decide to include it in Archive::Tar.
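For comparison, here is a rough sketch of what the list_files() workaround might look like. It is only an illustration and rests on a few assumptions. It builds a throwaway in-memory archive with add_data() instead of reading a real tarball, and the member names (data/a.dat, data/b.txt) are made up. It also assumes a current Archive::Tar, where extract() writes relative to the current directory, so the example switches into a temporary directory first.

```perl
use strict;
use warnings;
use Archive::Tar;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

# Stand-in archive with hypothetical members; a real program would
# use Archive::Tar->new($tarball) instead.
my $tar = Archive::Tar->new;
$tar->add_data('data/a.dat', "alpha\n");
$tar->add_data('data/b.txt', "beta\n");

# list_files() returns member names; keep only those with the wanted extension.
my @wanted = grep { /\.dat\z/ } $tar->list_files;

# extract() writes relative to the current directory, so work in a scratch dir.
my $dir  = tempdir(CLEANUP => 1);
my $orig = getcwd();
chdir $dir or die "chdir $dir: $!";
my @extracted = $tar->extract(@wanted);
chdir $orig or die "chdir $orig: $!";
die 'extract failed: ' . $tar->error unless @extracted;
```

The names are matched twice here: once by the grep over list_files(), and once more when extract() looks each name up among the members. That repeated lookup is the overhead the patch was meant to avoid.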
Subject: Archive-Tar-1.28.patch.bronto.2.patch
*** lib/Archive/Tar.pm.dist	2006-01-30 15:09:46.000000000 +0100
--- lib/Archive/Tar.pm	2006-01-30 16:58:43.000000000 +0100
***************
*** 14,24 ****
  $DEBUG = 0;
  $WARN = 1;
  $FOLLOW_SYMLINK = 0;
! $VERSION = "1.28";
  $CHOWN = 1;
  $CHMOD = 1;
  $DO_NOT_USE_PREFIX = 0;
  
  BEGIN {
      use Config;
      $HAS_PERLIO = $Config::Config{useperlio};
--- 14,26 ----
  $DEBUG = 0;
  $WARN = 1;
  $FOLLOW_SYMLINK = 0;
! $VERSION = "1.28.patch.bronto.2";
  $CHOWN = 1;
  $CHMOD = 1;
  $DO_NOT_USE_PREFIX = 0;
  
+ my @_ALLOWED_MATCHES = qw(exact pattern substring) ;
+ 
  BEGIN {
      use Config;
      $HAS_PERLIO = $Config::Config{useperlio};
***************
*** 400,406 ****
      return;
  }
  
! =head2 $tar->extract( [@filenames] )
  
  Write files whose names are equivalent to any of the names in
  C<@filenames> to disk, creating subdirectories as necessary. This
--- 402,408 ----
      return;
  }
  
! =head2 $tar->extract( [@filenames[,{type => matching_type}]] )
  
  Write files whose names are equivalent to any of the names in
  C<@filenames> to disk, creating subdirectories as necessary. This
***************
*** 414,419 ****
--- 416,454 ----
  If C<extract> is called without a list of file names, the entire
  contents of the archive are extracted.
  
+ You can pass the method an hash of options. By now, only the 'type' options
+ is defined and it affects the way C<extract> matches C<@filenames> against
+ the file names in the archive.
+ 
+ =over 4
+ 
+ =item exact
+ 
+ Only files that match exactly file names in C<@filenames> are extracted.
+ This is the default (i.e.: is what happens if you don't pass
+ C<extract()> any option.
+ 
+ =item pattern
+ 
+ This extract only files that match the patterns given in C<@filenames>.
+ You better pass your patterns through C<qr> before handing them to C<extract>
+ for performance reasons.
+ 
+ 
+ Example:
+ 
+   my @list = $tar->extract(qw('.*\.$dat$'),{ type => 'pattern' }) ;
+ 
+ extracts only filenames ending in '.dat'.
+ 
+ =item substring
+ 
+ This extracts only files whose name matches the strings given in C<@filenames>.
+ 
+ =back
+ 
+ Any unknown C<type> is forced to C<exact>.
+ 
  Returns a list of filenames extracted.
  
  =cut
***************
*** 421,433 ****
  sub extract {
      my $self = shift;
      my @files;
  
      ### you requested the extraction of only certian files
      if( @_ ) {
          for my $file (@_) {
              my $found;
              for my $entry ( @{$self->_data} ) {
!                 next unless $file eq $entry->full_path;
  
                  ### we found the file you're looking for
                  push @files, $entry;
--- 456,483 ----
  sub extract {
      my $self = shift;
      my @files;
+     my $opts ;
  
      ### you requested the extraction of only certian files
      if( @_ ) {
+         my $type = $_ALLOWED_MATCHES[0]; # giv this variable a suitable default
+         if (ref $_[$#_]) {
+             # pop away the last element of @_ in case it is a reference
+             $opts = pop @_ ;
+ 
+             # get a value for $type, to be tested later
+             $type = exists $opts->{type}? $opts->{type}: $_ALLOWED_MATCHES[0] ;
+         }
+ 
+         # Reset $type's value if someone tried to give it an invalid value
+         $type = $_ALLOWED_MATCHES[0] unless grep { $type eq $_ } @_ALLOWED_MATCHES ;
+ 
          for my $file (@_) {
              my $found;
              for my $entry ( @{$self->_data} ) {
!                 $type eq 'exact' and next unless $entry->full_path eq $file ;
!                 $type eq 'pattern' and next unless $entry->full_path =~ /$file/ ;
!                 $type eq 'substring' and next if index($entry->full_path,$file) == -1 ;
  
                  ### we found the file you're looking for
                  push @files, $entry;
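The core of the patch is the per-entry test inside the loop in extract(). The standalone sketch below reproduces that logic so the three match types and the silent fallback to exact are easier to see. The helper name entry_matches() is hypothetical and is not part of Archive::Tar.

```perl
use strict;
use warnings;

# Match types accepted by the proposed patch; the first one is the default.
my @ALLOWED_MATCHES = qw(exact pattern substring);

# Hypothetical helper (not part of Archive::Tar) mirroring the patch's
# per-entry test: does archive member $path satisfy $want under $type?
sub entry_matches {
    my ($type, $path, $want) = @_;

    # As in the patch, a missing or unknown type silently becomes 'exact'.
    $type = $ALLOWED_MATCHES[0]
        unless defined $type && grep { $type eq $_ } @ALLOWED_MATCHES;

    return $path eq $want ? 1 : 0             if $type eq 'exact';
    return $path =~ $want ? 1 : 0             if $type eq 'pattern';   # plain string or qr//
    return index($path, $want) != -1 ? 1 : 0;                          # 'substring'
}
```

For example, entry_matches('pattern', 'logs/run.dat', qr/\.dat\z/) returns 1. Passing a precompiled qr// object like this is what the patch's POD recommends for the pattern type.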
Hi,

first of all, thanks for your patch to Archive::Tar, it's much appreciated.

However, I've decided not to apply the patch in this form, for the following reasons:

* the type of filtering you can do is constrained by what is coded inside extract(), and therefore not generic; the next wishes will be to filter on mtime, a custom sub, etc.
* the patch changes the way extract() works, with an optional hashref as option. Although that's the only way to make it behave like this, it convolutes the syntax.

I've instead made sure that both extract() and extract_file() know exactly how to deal with Archive::Tar::File objects, and added this FAQ:

==== //member/kane/archive-tar-new/lib/Archive/Tar.pm#98 - /Users/kane/sources/p4/other/archive-tar-new/lib/Archive/Tar.pm ====
1527a1528,1544
> =item How do I extract only files that have property X from an archive?
>
> Sometimes, you might not wish to extract a complete archive, just
> the files that are relevant to you, based on some criteria.
>
> You can do this by filtering a list of C<Archive::Tar::File> objects
> based on your criteria. For example, to extract only files that have
> the string C<foo> in their title, you would use:
>
>     $tar->extract(
>         grep { $_->full_path =~ /foo/ } $tar->get_files
>     );
>
> This way, you can filter on any attribute of the files in the archive.
> Consult the C<Archive::Tar::File> documentation on how to use these
> objects.
This should provide a complete, generic way of filtering files for extraction, at not much more cost than coding it inside extract() with its known limitations.

This will be added to Archive::Tar 1.29, which will be released shortly.

Thanks again,
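Below is a runnable sketch of the FAQ approach. It assumes Archive::Tar 1.29 or later, so that extract() accepts Archive::Tar::File objects. The archive contents and the predicate (non-empty members ending in .dat) are invented for illustration. Any test built on the objects' accessors, such as size() or mtime(), could take the predicate's place.

```perl
use strict;
use warnings;
use Archive::Tar;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

# Illustrative in-memory archive; the member names are hypothetical.
my $tar = Archive::Tar->new;
$tar->add_data('logs/run1.dat',  'x' x 10);
$tar->add_data('logs/empty.dat', '');
$tar->add_data('docs/README',    'hello');

# Any predicate over Archive::Tar::File objects will do; this one keeps
# non-empty members whose path ends in ".dat".
my $wanted = sub {
    my ($file) = @_;
    return $file->full_path =~ /\.dat\z/ && $file->size > 0;
};

my @objects = grep { $wanted->($_) } $tar->get_files;

# extract() writes relative to the current directory, so use a scratch dir.
my $dir  = tempdir(CLEANUP => 1);
my $orig = getcwd();
chdir $dir or die "chdir $dir: $!";
my @extracted = $tar->extract(@objects);
chdir $orig or die "chdir $orig: $!";
die 'extract failed: ' . $tar->error unless @extracted;
```

The predicate is an ordinary code reference. Filtering on a different property therefore means changing only $wanted, with no change to extract() itself.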
Subject: Re: [rt.cpan.org #17395] [PATCH] allow extract() to select files by regexes or substrings
Date: Fri, 03 Mar 2006 17:18:38 +0100
To: bug-Archive-Tar [...] rt.cpan.org
From: Marco Marongiu <mmarongiu [...] tiscali.it>
Hello via RT wrote:
> Hi,
>
> first of all, thanks for your patch to Archive::Tar, it's much appreciated.
Thanks. It's easy to provide a patch when the code is well written and clean.
> However, I've decided not to apply the patch in this form, for the following reasons:
>
> * the type of filtering you can do is constrained by what is coded inside extract(),
>   and therefore not generic; the next wishes will be to filter on mtime, a custom sub,
>   etc.
> * the patch changes the way extract() works, with an optional hashref as option.
>   Although that's the only way to make it behave like this, it convolutes the syntax.
Ok
> I've instead made sure that both extract() and extract_file() know exactly how to deal
> with Archive::Tar::File objects, and added this FAQ:
Ok, fine!
> ==== //member/kane/archive-tar-new/lib/Archive/Tar.pm#98 - /Users/kane/sources/p4/other/archive-tar-new/lib/Archive/Tar.pm ====
> 1527a1528,1544
>
>> =item How do I extract only files that have property X from an archive?
>>
>> Sometimes, you might not wish to extract a complete archive, just
>> the files that are relevant to you, based on some criteria.
>>
>> You can do this by filtering a list of C<Archive::Tar::File> objects
>> based on your criteria. For example, to extract only files that have
>> the string C<foo> in their title, you would use:
>>
>>     $tar->extract(
>>         grep { $_->full_path =~ /foo/ } $tar->get_files
>>     );
>>
>> This way, you can filter on any attribute of the files in the archive.
>> Consult the C<Archive::Tar::File> documentation on how to use these
>> objects.
Ok
> This should provide a complete, generic way of filtering files for extraction,
> at not much more cost than coding it inside extract() with its known
> limitations.
Well, actually I think the computational cost of this method is much higher, since the full_path() method gets called over and over. But it's OK (I hope :). I'll test it that way.
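If the repeated full_path() calls are a concern, one option is to compute each member's path once and then test every pattern against that cached string. This is a sketch and not something Archive::Tar provides. The patterns and member names below are hypothetical.

```perl
use strict;
use warnings;
use Archive::Tar;

# Illustrative archive with hypothetical member names.
my $tar = Archive::Tar->new;
$tar->add_data($_, 'x') for qw(logs/a.dat logs/b.log docs/README);

# Hypothetical patterns supplied by the caller.
my @patterns = (qr/\.dat\z/, qr/\.log\z/);

# Call full_path() exactly once per archive member; keep [path, object] pairs.
my @pairs = map { [ $_->full_path, $_ ] } $tar->get_files;

# Test every pattern against the cached path string only.
my @selected = grep {
    my $path = $_->[0];
    grep { $path =~ $_ } @patterns;
} @pairs;

my @selected_paths = map { $_->[0] } @selected;
my @objects        = map { $_->[1] } @selected;
# $tar->extract(@objects) would now extract just these members.
```

Pairs are used instead of a hash keyed on the path because a tar archive may contain the same path more than once.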
> This will be added to Archive::Tar 1.29, which will be released shortly.
>
> Thanks again,
You're welcome!

Ciao and thanks for working on Archive::Tar!
--Marco

-- 
Marco Marongiu                        Tiscali Services s.r.l.
System Administrator                  S.S. 195, km 2,300
IT Systems Management Dept.           Loc. "Sa Illetta"
Phone: +39 070 460 1684               09122 Cagliari (CA)
Fax:   +39 070 460 9684               Sardegna - Italia