Subject: | uri filter discards multi-byte characters
I have a web application that generates URIs containing Unicode
characters. Unfortunately, I can't get Template's built-in "uri" filter
to escape them properly. My code looks like this:

    [% FOREACH tag = article.tags %]
      <a href="/tags/[% tag | uri %]">[% tag %]</a>
    [% END %]

If "tag" happens to be "日本語", I want this to be escaped to
"%E6%97%A5%E6%9C%AC%E8%AA%9E". Unfortunately Template just discards the
characters and produces no output.
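
To make the expected result concrete, here is a minimal sketch of the escaping I have in mind, using only core modules. The helper name "escape_utf8" is hypothetical and not part of Template, and leaving only the RFC 3986 unreserved characters unescaped is my own assumption about what the filter should do:

```perl
use strict;
use warnings;
use utf8;                      # the string literal below is UTF-8 source text
use Encode qw(encode_utf8);

# Hypothetical helper (not part of Template): percent-encode a character
# string as UTF-8 octets.
sub escape_utf8 {
    my ($text) = @_;

    # Turn Perl characters into UTF-8 octets first.
    my $bytes = encode_utf8($text);

    # Assumption: escape every octet outside the RFC 3986 unreserved set.
    $bytes =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord($1))/ge;

    return $bytes;
}

print escape_utf8("日本語"), "\n";   # prints %E6%97%A5%E6%9C%AC%E8%AA%9E
```

The important step is encoding to UTF-8 octets before escaping, so that each character becomes three "%XX" sequences rather than being dropped.
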
I'm using Perl 5.8.8 on Debian GNU/Linux (unstable) in the en_US.UTF8
locale. Passing the Unicode (or rather, UTF-8-encoded Unicode)
characters through as raw bytes works fine.

BTW, the standard URI.pm module does do the escaping right:

    $ perl -MURI -e 'print URI->new("http://foo/日本語/")->as_string;'
    http://foo/%E6%97%A5%E6%9C%AC%E8%AA%9E/

--
Jonathan Rockway <jrockway@cpan.org>