
This queue is for tickets about the Archive-Tar-Streamed CPAN distribution.

Report information
The Basics
Id: 64712
Status: open
Priority: 0/
Queue: Archive-Tar-Streamed

People
Owner: Nobody in particular
Requestors: framstag [...] rus.uni-stuttgart.de
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: (no value)
Fixed in: (no value)



Subject: archive is memory resident and not disk based
I want to create huge tar archives, bigger than system memory. The man-page says:

  NAME
      Archive::Tar::Streamed - Tar archives, non memory resident
  (...)
      Archive::Tar::Streamed provides a wrapper, which allows working with tar
      archives on disk, with no need for the archive to be memory resident.

But my test program proves the opposite:

  framstag@diaspora:/tmp: ll Idiocracy.mpg
  -rw-r--r-- framstag users 838303984 2010-12-13 16:16:35 Idiocracy.mpg

  framstag@diaspora:/tmp: ./tar.pl tmp.tar Idiocracy.mpg
  Out of memory!

  framstag@diaspora:/tmp: cat tar.pl
  #!/usr/bin/perl -w

  use Archive::Tar;
  use Archive::Tar::Streamed;

  $usage = "$0 archive-name files...\n";

  $name = shift or die $usage;
  @files = @ARGV or die $usage;

  open $tf,'>',$name or die $!;
  $tar = Archive::Tar::Streamed->new($tf);
  $tar->add(@files);
On Wed Jan 12 08:40:12 2011, FRAMSTAG wrote:

> I want to create huge tar archives, bigger than system memory.
> [...]
> $tar = Archive::Tar::Streamed->new($tf);
> $tar->add(@files);
If you read the source, it uses Archive::Tar's add_files to handle multiple files supplied to ->add. This means you need to break the files up into separate calls to ->add to control how many of them are resident in memory at any one time. Try altering the test and running it again. I haven't tried it myself, so there may be other issues, but I think this should avoid the one you mention.
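For illustration, a minimal sketch of that suggestion applied to the test script above (same file-name handling as in the report). Each call to ->add then hands Archive::Tar's add_files only one file, so at most one file's contents are in memory at a time; as the follow-up below notes, this still does not help when a single file is itself larger than memory.

  #!/usr/bin/perl -w
  # Sketch only: one file per ->add call instead of $tar->add(@files),
  # so Archive::Tar never holds more than one file in memory at once.
  use strict;
  use Archive::Tar::Streamed;

  my $usage = "$0 archive-name files...\n";
  my $name  = shift or die $usage;
  my @files = @ARGV or die $usage;

  open my $tf, '>', $name or die $!;
  my $tar = Archive::Tar::Streamed->new($tf);

  $tar->add($_) for @files;   # one call per file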
Subject: Re: [rt.cpan.org #64712] archive is memory resident and not disk based
Date: Sun, 19 Aug 2012 21:51:47 +0200
To: Anthony J Lucas via RT <bug-Archive-Tar-Streamed [...] rt.cpan.org>
From: Ulli Horlacher <framstag [...] rus.uni-stuttgart.de>
On Mon 2012-07-30 (13:37), Anthony J Lucas via RT wrote:
> <URL: https://rt.cpan.org/Ticket/Display.html?id=64712 >
>
> This means you need to break the files up into separate calls to ->add
> to control how many of them are resident in memory at any one time.
In my example above I have only one file.
How can I split it into separate calls to ->add?
On Sun Aug 19 15:52:01 2012, framstag@rus.uni-stuttgart.de wrote:
> In my example above I have only one file.
> How can I split it into separate calls to ->add?
You would have to manually read chunks of the file into Archive::Tar::File objects using the data constructor, and then pass them one at a time to ->add. But you'd need to repeat the operation on extraction, and I think there's an overhead of about 500 bytes (one block) per entry.

I agree it would be nice if this module could write the files in chunks and then rewrite the headers afterwards (while keeping Archive::Tar responsible for the writing). Knowing you split the file into N entries, you could perhaps walk the archive backwards, remove the extra headers, and update the first header with the sum of their sizes.

At the moment the only options are: use Archive::Tar::File and chunk the data yourself, split the data into manageable sizes outside of perl, or use the tar command directly (to use it from perl, there's a nice wrapper on CPAN: Archive::Tar::Wrapper).

I don't think we'll ever have a truly non-memory-resident tar implementation in perl based on Archive::Tar. To be honest, Archive::Tar itself needs to be rewritten or succeeded by something else.
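For illustration, a rough sketch of the chunking approach described above. The chunk size and the entry names ("$file.0", "$file.1", ...) are made up for this example; you would have to reassemble the chunks yourself on extraction, and it assumes an Archive::Tar version whose add_files accepts Archive::Tar::File objects.

  #!/usr/bin/perl -w
  # Sketch only: split one huge file into fixed-size chunk entries, each
  # built via Archive::Tar::File's data constructor and streamed out
  # immediately, so only one chunk is resident in memory at a time.
  use strict;
  use Archive::Tar::File;
  use Archive::Tar::Streamed;

  my $usage   = "$0 archive-name file\n";
  my $archive = shift or die $usage;
  my $file    = shift or die $usage;

  my $chunk_size = 64 * 1024 * 1024;   # 64 MB per entry; pick what fits

  open my $tf, '>', $archive or die $!;
  open my $in, '<', $file    or die $!;
  binmode $in;

  my $tar = Archive::Tar::Streamed->new($tf);

  my ($data, $i) = ('', 0);
  while (read($in, $data, $chunk_size)) {
      my $entry = Archive::Tar::File->new(data => "$file." . $i++, $data);
      $tar->add($entry);               # one chunk resident at a time
  }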
Subject: Re: [rt.cpan.org #64712] archive is memory resident and not disk based
Date: Wed, 29 Aug 2012 02:25:09 +0200
To: Anthony J Lucas via RT <bug-Archive-Tar-Streamed [...] rt.cpan.org>
From: Ulli Horlacher <framstag [...] rus.uni-stuttgart.de>
On Tue 2012-08-28 (19:34), Anthony J Lucas via RT wrote:
> or use the tar command directly (to use it from perl, there's a nice
> wrapper on CPAN: Archive::Tar::Wrapper).
Thanks! That's an alternative!
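For reference, a minimal sketch of the Archive::Tar::Wrapper alternative mentioned above. It drives the external tar binary and keeps the archive contents on disk rather than in memory; the file names are taken from the original report.

  #!/usr/bin/perl -w
  # Sketch only: build the archive with Archive::Tar::Wrapper, which
  # shells out to the system tar, so large files are not slurped into
  # memory by perl.
  use strict;
  use Archive::Tar::Wrapper;

  my $arch = Archive::Tar::Wrapper->new();

  # add(logical path inside the archive, physical path on disk)
  $arch->add('Idiocracy.mpg', '/tmp/Idiocracy.mpg');

  # write(tarfile, compress flag); 0 = no compression
  $arch->write('tmp.tar', 0);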