
This queue is for tickets about the PathTools CPAN distribution.

Report information
The Basics
Id: 107856
Status: open
Priority: 0/
Queue: PathTools

People
Owner: Nobody in particular
Requestors: HAKONH [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: abs2rel problem with unicode paths
I am using Perl 5.20.1 on a Linux laptop. When running the following script:

    use feature qw(say);
    use strict;
    use utf8;
    use warnings;
    use Env qw(HOME);
    use File::Spec::Functions qw(abs2rel);

    my $tdir = 'ø';
    my $path = "$HOME/$tdir/b/æ";
    my $base = "$HOME/$tdir";
    chdir $base;
    binmode STDOUT, ":utf8";
    say abs2rel( $path, $base );
    say abs2rel( $path );

I get this output:

    b/æ
    ../ø/b/æ

Expected output:

    b/æ
    ../ø/b/æ

Assumed problem: line 409 in Unix.pm (https://metacpan.org/source/SMUELLER/PathTools-3.47/lib/File/Spec/Unix.pm)

    $base = $self->_cwd() unless defined $base and length $base;

calls Cwd::getcwd(), which returns bytes. This causes $base not to be recognized as a prefix of $path.

Fix: _cwd() should return unicode in this case.
> Expected output:
>
> b/æ
> ../ø/b/æ

Sorry, that was a typo. The expected output should be:

    b/æ
    b/æ
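Until abs2rel() itself changes, one way to get the corrected expected output for the one-argument case is to supply the base yourself, decoded the same way as $path. The sketch below assumes file names on the system are UTF-8 encoded; the helper name abs2rel_from_cwd is hypothetical and not part of PathTools.

```perl
use strict;
use warnings;
use utf8;
use Cwd qw(getcwd);
use Encode ();
use File::Spec::Functions qw(abs2rel);

# Hypothetical helper, not part of PathTools.
# Assumption: file names on this system are stored as UTF-8 bytes.
# getcwd() returns those raw bytes, so decode them into a character
# string before abs2rel() compares the base with a decoded $path.
sub abs2rel_from_cwd {
    my ($path) = @_;    # a decoded character string, as in the report
    my $flags = Encode::FB_CROAK | Encode::LEAVE_SRC;
    my $base  = Encode::decode( 'UTF-8', getcwd(), $flags );
    return abs2rel( $path, $base );
}
```

In the report's script, `say abs2rel_from_cwd($path);` should then print b/æ, provided the UTF-8 assumption holds. On a system whose file names use another encoding, decoding as UTF-8 would croak or produce the wrong characters.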
On 2015-10-19 08:13:57, HAKONH wrote:
> $base = $self->_cwd() unless defined $base and length $base;
>
> calls Cwd::getcwd() which returns bytes, this causes $base not to be
> recognized as a prefix for $path..
>
> Fix: _cwd() should return unicode in this case.
I'm not sure that the code should do any UTF-8 decoding of filenames, at least not without being requested to: there is no standard requiring filesystems to use a specific encoding (some use UTF-16, some use Latin-1, some use UTF-8), and there is no way for us to tell which one is in use.
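The point about encodings can be made concrete. The sketch below (an illustration only; the choice of encodings is an assumption) encodes the directory name from the report three ways and shows that the resulting bytes differ, and that the UTF-8 bytes are also valid Latin-1, so the bytes alone do not identify the intended name.

```perl
use strict;
use warnings;
use utf8;
use Encode qw(encode decode);

# The directory name from the report, as a decoded character string.
my $name = 'ø';

# Writing the same name with different encodings yields different bytes.
my %bytes = map { ( $_ => encode( $_, $name ) ) } qw(UTF-8 ISO-8859-1 UTF-16LE);

for my $enc ( sort keys %bytes ) {
    my $hex = join ' ', map { sprintf '%02X', ord } split //, $bytes{$enc};
    printf "%-10s %s\n", $enc, $hex;
}
# ISO-8859-1 F8
# UTF-16LE   F8 00
# UTF-8      C3 B8

# Going the other way, the two UTF-8 bytes C3 B8 are also valid Latin-1,
# so the bytes alone cannot say which name was meant.
printf "as UTF-8: %d char(s); as Latin-1: %d char(s)\n",
    length( decode( 'UTF-8', $bytes{'UTF-8'} ) ),
    length( decode( 'ISO-8859-1', $bytes{'UTF-8'} ) );
# as UTF-8: 1 char(s); as Latin-1: 2 char(s)
```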
Subject: Re: [rt.cpan.org #107856] abs2rel problem with unicode paths
Date: Mon, 19 Oct 2015 14:04:06 -0500
To: "bug-PathTools [...] rt.cpan.org" <bug-PathTools [...] rt.cpan.org>
From: Ken Williams <kwilliams [...] cpan.org>
Filesystems use encodings at all? I thought they just used byte sequences.

On Mon, Oct 19, 2015 at 11:13 AM, Karen Etheridge via RT <bug-PathTools@rt.cpan.org> wrote:
> Queue: PathTools
> Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=107856 >
>
> On 2015-10-19 08:13:57, HAKONH wrote:
> > $base = $self->_cwd() unless defined $base and length $base;
> >
> > calls Cwd::getcwd() which returns bytes, this causes $base not to be
> > recognized as a prefix for $path..
> >
> > Fix: _cwd() should return unicode in this case.
>
> I'm not sure that the code should do any utf8 decoding of filenames, at
> least not without being requested too -- there is no standardization for
> filesystems to use a specific encoding (some use UTF-16, some use latin1,
> some use utf-8..) and there is no way for us to tell which one is in use.
Maybe the function should then croak if the user makes the one-argument call and $path has the UTF-8 flag set, since in that case unexpected results may occur, as shown in the original report.

Accordingly, a workaround seems to be to encode $path before passing it on:

    use Encode;    # needed for the Encode:: calls below
    my $encode_flags = Encode::FB_CROAK | Encode::LEAVE_SRC;
    $path = Encode::encode( 'UTF-8', $path, $encode_flags );
    say Encode::decode( 'UTF-8', abs2rel( $path ), $encode_flags );

Output:

    b/æ
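The croak proposal can be sketched as a wrapper around abs2rel(). This is only an illustration: strict_abs2rel is a hypothetical name, nothing like it exists in PathTools, and it additionally requires a non-ASCII character before croaking (an assumption, so that pure-ASCII paths are not rejected). The UTF-8 flag is an internal representation detail, so the check is a heuristic: a decoded string holding only Latin-1-range characters may not carry the flag and would slip through.

```perl
use strict;
use warnings;
use Carp qw(croak);
use File::Spec::Functions qw(abs2rel);

# Hypothetical wrapper sketching the proposal above; it is not part of
# PathTools. Without a base, abs2rel() falls back to Cwd, which returns
# bytes, so a decoded path containing non-ASCII characters may silently
# fail to match that base. Refuse that combination instead.
sub strict_abs2rel {
    my ( $path, $base ) = @_;
    my $no_base = !( defined $base && length $base );
    if ( $no_base && utf8::is_utf8($path) && $path =~ /[^\x00-\x7F]/ ) {
        croak 'strict_abs2rel: decoded path with no base; '
            . 'encode $path first or pass an explicit $base';
    }
    # abs2rel() itself treats an undefined or empty base as "use the cwd".
    return abs2rel( $path, $base );
}
```

With the workaround above, $path is encoded before the call, so utf8::is_utf8 returns false and the wrapper passes it straight through to abs2rel().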