
This queue is for tickets about the Archive-Tar CPAN distribution.

Report information
The Basics
Id: 75474
Status: resolved
Worked: 30 min
Priority: 0/
Queue: Archive-Tar

People
Owner: BINGOS [...] cpan.org
Requestors: HMBRAND [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: 1.82
Fixed in: (no value)



Subject: Problems with UTF8 in folder- and file-names
Test scripts attached:

$ perl test.pl
script wanted to create existing folder tmp!
$ rm -rf tmp
$ perl test.pl
   1 tmp/Fügen/00-a.txt
   1 tmp/Fügen/01-e.txt
   2 tmp/Fügen/02-é.txt
   2 tmp/Fügen/03-z.txt
   1 tmp/Fügen/04-ż.txt
Wide character in IO::Compress::Gzip::write: at /pro/lib/perl5/5.14.1/Compress/Zlib.pm line 205.

The documentation mentions UTF-8 in *data*, but not in file or folder names, and the script adds files as-is.
Subject: test.pl
#!/pro/bin/perl

use strict;
use warnings;

-d "tmp" and die "script wanted to create existing folder tmp!\n";

use Encode qw( decode encode );
use File::Find;
use Archive::Tar;

binmode STDOUT, ":encoding(utf-8)";
binmode STDERR, ":encoding(utf-8)";

my $dn = encode ("utf8", "tmp/F\x{00fc}gen");
mkdir $_, 0777 for "tmp", $dn;
for ( [ "00-a.txt",         "a"         ],
      [ "01-e.txt",         "e"         ],
      [ "02-\x{00e9}.txt",  "\x{00e9}"  ],
      [ "03-z.txt",         "\x{017c}"  ],
      [ "04-\x{017c}.txt",  "z"         ],
      ) {
    my ($fn, $data) = @$_;
    $fn = encode ("utf8", $fn);
    open my $fh, ">:encoding(utf-8)", "$dn/$fn" or die "Cannot opn $fn: $!\n";
    print $fh $data;
    close $fh;
    }

my @files;
find (sub {
    -f and push @files, decode ("utf8", $File::Find::name);
    }, "tmp");

my $tar = Archive::Tar->new ();
foreach my $f (sort @files) {
    printf "%4d %s\n", -s $f, $f;
    $tar->add_files ($f);
    }
$tar->write ("test.tgz", 9);
On Fri Mar 02 09:30:07 2012, HMBRAND wrote:
> Test scripts attached:
>
> $ perl test.pl
> script wanted to create existing folder tmp!
> $ rm -rf tmp
> $ perl test.pl
>    1 tmp/Fügen/00-a.txt
>    1 tmp/Fügen/01-e.txt
>    2 tmp/Fügen/02-é.txt
>    2 tmp/Fügen/03-z.txt
>    1 tmp/Fügen/04-ż.txt
> Wide character in IO::Compress::Gzip::write: at /pro/lib/perl5/5.14.1/
> Compress/Zlib.pm line 205.
The problem as described was deduced from a bigger project where the file names came from a UTF-8 encoded database. Further experiments led to a solution:

    $tar->add_files (encode ("utf-8", $f));

fixed everything. This could be explained in the documentation (under a dedicated heading that covers file and folder names), or it could even be done automatically in the add_files () method when the target file name is marked as UTF-8.
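For illustration, the workaround above can be sketched as a small standalone script. This is only a sketch under assumptions: the scratch directory from File::Temp and the single file name are made up for the example, and the name deliberately contains a character above U+00FF so that Perl stores it as a character (UTF-8 flagged) string.

```perl
use strict;
use warnings;

use Encode     qw( decode encode );
use File::Temp qw( tempdir );
use Archive::Tar;

binmode STDOUT, ":encoding(utf-8)";

# Scratch directory (assumption for this example only)
my $dir = tempdir (CLEANUP => 1);

# A character string, as it might arrive decoded from a database
my $name  = "04-\x{017c}.txt";
# The byte form used on a UTF-8 filesystem
my $bytes = encode ("utf-8", $name);

open my $fh, ">", "$dir/$bytes" or die "Cannot open $name: $!\n";
print $fh "z";
close $fh;

my $tar = Archive::Tar->new ();

# Show the decoded name to the user, but hand the encoded bytes to add_files
printf "%4d %s\n", -s "$dir/$bytes", decode ("utf-8", $bytes);
my @added = $tar->add_files ("$dir/$bytes");
```

The point is only the split between the two forms: characters for display, bytes for anything that touches the filesystem or the archive.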
> The documentation mentions UTF-8 in *data*, but not in file or folder
> names, and the script adds files as-is.
This also proves to be an easy fix. The patch is valid for 5.8.1 and up; an extra test might make it safe for all versions below (but would not fix it for those).

--8<---
--- Archive/Tar.pm.org 2012-03-02 16:59:23.284609675 +0100
+++ Archive/Tar.pm     2012-03-02 16:59:26.546609520 +0100
@@ -1451,6 +1451,9 @@ sub add_files {
             next;
         }
 
+        if( utf8::is_utf8( $file )) {
+            utf8::encode( $file );
+        }
         unless( -e $file || -l $file ) {
             $self->_error( qq[No such file: '$file'] );
             next;
-->8---
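The "extra test" for older perls mentioned above might look roughly like the following. This is a hedged sketch, not the code that was released: `fs_name` is a hypothetical helper that does not exist in Archive::Tar, and the version guard is an assumption based on the 5.8.1 remark.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of Archive::Tar: return the byte form
# of a file name so it can be used with -e, -l, open and friends.
sub fs_name {
    my ($file) = @_;

    # Only touch the utf8:: functions on 5.8.1 and up; on older perls
    # the name is passed through unchanged (safe, but not fixed).
    if ($] >= 5.008001 && utf8::is_utf8 ($file)) {
        utf8::encode ($file);   # in place, on our private copy
    }
    return $file;
}

my $flagged = fs_name ("tmp/04-\x{017c}.txt");   # character string in
my $plain   = fs_name ("tmp/00-a.txt");          # plain name in
```

Because the helper works on a copy, the caller's original string keeps its character semantics and can still be printed through an encoding layer.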
Applied and new version released. Thanks.