Subject: catpath/catfile don't handle unicode concatenation
I'll write a patch if you like, but I wanted to run the idea past you
first.
The following is broken:
use Cwd;
use File::Spec;

mkdir("\x{100}");
chdir("\x{100}");
my $dir = cwd();
my $absfile = File::Spec->catfile($dir, "\x{100}.txt");
open my $handle, "<", $absfile;  # fails because the path string is corrupt
The path string is corrupt because the bytes of the directory name were
promoted to codepoints when the string was concatenated with the unicode
string. i.e. Perl tried to open the file
"..../\xC3\x84\xC2\x80/\xC4\x80.txt"
This is not File::Spec's fault, but it is something File::Spec is in a
perfect position to help fix (and fixing it would give the community yet
another reason to "always use File::Spec on filenames").
It is *never* correct to call utf8::upgrade() on a file name, yet this
is exactly what happens during that call to catfile(). Perl will call
utf8::encode(), not utf8::downgrade(), when you use that unicode
string in a filename context, so either the non-unicode arguments need
to be utf8::decode'd, or the unicode arguments need to be
utf8::encode'd. Since not all UNIX filenames are valid unicode, I think
the correct thing to do is to utf8::encode all unicode arguments.
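Roughly, I mean something like the following; it is only a sketch, and the helper name downgrade_unicode_args is my own invention, not anything in File::Spec ($dir is the directory from the example above):

use File::Spec;

sub downgrade_unicode_args {
    my @args = @_;                                # work on copies
    for (@args) {
        utf8::encode($_) if utf8::is_utf8($_);    # char string -> UTF-8 bytes, in place
    }
    return @args;
}

my $absfile = File::Spec->catfile(downgrade_unicode_args($dir, "\x{100}.txt"));
# $absfile is now a plain byte string that open() will pass through unmangled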
Or, if you wanted to match Perl's natural behavior a little more
closely, you could check for any unicode arguments and, if any are
found, call "decode" on the non-unicode arguments; if any decode fails,
call "encode" on all the unicode arguments instead. The return value
would be either a valid byte string or a valid unicode string that
becomes a valid byte string when used in a filename context.
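In code, that fallback logic would look roughly like this (again just a sketch, with names of my own choosing):

sub normalize_filename_args {
    my @args = @_;
    return @args unless grep { utf8::is_utf8($_) } @args;  # all bytes already: nothing to do

    # First try: decode the byte arguments so everything is a character string.
    my @decoded = @args;
    my $ok = 1;
    for (@decoded) {
        next if utf8::is_utf8($_);
        unless (utf8::decode($_)) { $ok = 0; last; }        # fails on non-UTF-8 bytes
    }
    return @decoded if $ok;

    # Fallback: encode the character arguments so everything is bytes.
    my @encoded = @args;
    for (@encoded) {
        utf8::encode($_) if utf8::is_utf8($_);
    }
    return @encoded;
}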
I'm only suggesting this change for File::Spec::Unix, since I know less
about Perl's unicode handling on other platforms.