Subject: Unescaping UTF-8 file URLs doesn't work with "use encoding"
The script:
| use encoding 'utf8';
| use strict;
| use warnings;
| use Data::Dumper;
| use URI::file;
| my $u = URI->new ('file:///home/tim/Videos/Nerdist%20Podcast/Nerdist%20Podcast:%20Live%20@%20Largo%20w%20%E2%80%93%20%20Adam%20Savage!%20(%2310).mp3');
| print Dumper $u->file ();
yields with perl 5.14.3 under Linux (Fedora 16):
| $VAR1 = "/home/tim/Videos/Nerdist Podcast/Nerdist Podcast: Live \@ Largo w \x{fffd}\x{fffd}\x{fffd} Adam Savage! (#10).mp3";
(Note the three "\x{fffd}".)
I believe this is due to URI::Escape::uri_unescape()'s:
| [...]
| s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;
| [...]
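where (my assumption) the unescaped bytes are not first concatenated into
one byte string that is then decoded as a whole; instead, each byte is
converted individually, so the three bytes of the UTF-8 sequence
%E2%80%93 each turn into a replacement character.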
Unfortunately, thinking about Perl's UTF-8 magic makes my brain hurt, so
I couldn't confirm this :-).
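
To illustrate the assumption, here is a minimal sketch of the "concatenate
the bytes first, then decode once" approach. It deliberately does not load
"use encoding", it reuses the substitution quoted above on a shortened
stand-in string ($escaped is made up for this example), and it is not
meant as the actual fix for URI::Escape:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical, shortened stand-in for the path from the report.
my $escaped = 'w%20%E2%80%93%20Adam';

# Same substitution as quoted from uri_unescape(): each %XX becomes one
# character with that ordinal. Without "use encoding" this yields a plain
# string of octets.
(my $bytes = $escaped) =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;

# Decode the complete octet sequence in one go, so that E2 80 93 is seen
# as a single UTF-8 sequence and not as three separate bytes.
my $chars = decode('UTF-8', $bytes);

printf "%d octets, %d characters\n", length($bytes), length($chars);
printf "third character: U+%04X\n", ord(substr($chars, 2, 1));
```

Without the pragma this prints "10 octets, 8 characters" and
"third character: U+2013" (the en dash). If the assumption holds, the
three "\x{fffd}" in the output above come from chr() handling 0xE2, 0x80
and 0x93 one at a time under "use encoding".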