This queue is for tickets about the PathTools CPAN distribution.

Report information
The Basics
Id: 83130
Status: new
Priority: 0/
Queue: PathTools

People
Owner: Nobody in particular
Requestors: NERDVANA [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Normal
Broken in: (no value)
Fixed in: (no value)



Subject: catpath/catfile don't handle unicode concatenation
I'll write a patch if you like, but I wanted to run the idea past you first. The following is broken:

    use Cwd;
    use File::Spec;

    mkdir("\x{100}");
    chdir("\x{100}");
    my $dir = cwd();
    my $absfile = File::Spec->catfile($dir, "\x{100}.txt");
    open my $handle, "<", $absfile; # fails because path string is corrupt

The path string is corrupt because the bytes of the directory name were promoted to codepoints when the string was concatenated with the unicode string, i.e. Perl tried to open the file "..../\xC3\x84\xC2\x80/\xC4\x80.txt".

This is not File::Spec's fault, but it is something File::Spec is in a perfect position to help fix (and it would give the community yet another reason to "always use File::Spec on filenames").

It is *never* correct to call utf8::upgrade() on a file name, yet this is what happens during that call to catfile(). Perl will call utf8::encode() and not utf8::downgrade() when you use that unicode string in a filename context, so either the non-unicode arguments need to be utf8::decode'd, or the unicode arguments need to be utf8::encode'd.

Since not all UNIX filenames are valid unicode, I think the correct thing to do is to utf8::encode all unicode arguments.

Or, if you wanted to match Perl's natural behavior a little more closely, you could check for any unicode arguments and, if any are found, call "decode" on every non-unicode argument; if any decode fails, call "encode" on all the unicode arguments instead. The return value would then be either a valid byte string, or a valid unicode string which would become a valid byte string if used in a filename context.

I'm only suggesting this change for File::Spec::Unix, since I know less about Perl's unicode handling on other platforms.
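
To make the two options concrete, here is a rough sketch of both as wrappers around File::Spec::Unix->catfile. The names catfile_bytes and catfile_text_or_bytes are made up for illustration and are not part of PathTools; an actual patch would presumably live inside catfile()/catdir() themselves. It assumes utf8::is_utf8() is an acceptable test for "unicode argument".

```perl
use strict;
use warnings;
use File::Spec::Unix;

# Hypothetical helper (option 1): encode every character-string argument
# to UTF-8 bytes, so the join only ever concatenates byte strings.
sub catfile_bytes {
    my @parts = @_;                 # copy, so the caller's scalars are untouched
    for my $part (@parts) {
        # utf8::encode() rewrites the scalar in place as UTF-8 octets.
        utf8::encode($part) if utf8::is_utf8($part);
    }
    return File::Spec::Unix->catfile(@parts);
}

# Hypothetical helper (option 2): if any argument is a character string,
# try to decode the byte-string arguments; if one of them is not valid
# UTF-8, fall back to option 1 and return bytes.
sub catfile_text_or_bytes {
    my @parts = @_;
    return File::Spec::Unix->catfile(@parts)
        unless grep { utf8::is_utf8($_) } @parts;

    my @decoded   = @parts;
    my $all_valid = 1;
    for my $part (@decoded) {
        next if utf8::is_utf8($part);
        # utf8::decode() returns false when the octets are not well-formed UTF-8.
        unless (utf8::decode($part)) {
            $all_valid = 0;
            last;
        }
    }
    return File::Spec::Unix->catfile(@decoded) if $all_valid;
    return catfile_bytes(@parts);
}

# Usage: a byte-string directory (as cwd() would return it) joined with a
# character-string file name.
my $dir   = "/tmp/\xC4\x80";        # UTF-8 octets of U+0100
my $bytes = catfile_bytes($dir, "\x{100}.txt");
my $text  = catfile_text_or_bytes($dir, "\x{100}.txt");
```

With option 1 the result is always a byte string that can be passed straight to open(). With option 2 the result is a character string whenever every byte-string argument decodes cleanly, and a byte string otherwise, so callers cannot assume one or the other.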