Subject: | CGI::Util::escape vs. %uXXXX |
I've already sent you this bug report and patch as
Message-Id: <C53A4BC1-7AA4-46BF-A8B8-D5EC7AD011D0@dan.co.jp>
but the patch got somewhat garbled, so I am resending it via RT.
LDS,
Hi. This is Dan. Long time no see. I am now in the middle of the post-YAPC::Asia::2007
hackathon.
I found that the way CGI.pm handles %uXXXX notation is wrong in two ways.
1. UTF-8 flag. It should be off (as with %XX), but the current implementation leaves it on.
2. Surrogate pairs. Perl 5.8.x now officially outlaws chr($ord) where $ord is either half of
a surrogate pair. Because of that, %u simply fails when the source string contains an
escaped surrogate pair.
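To make the two points concrete, here is a minimal sketch of the behaviour I would expect. It is not the attached patch and not CGI.pm's actual code; the names utf8_bytes_sketch and unescape_sketch are made up for illustration, and replacing a lone surrogate half with U+FFFD is my own assumption. It joins an escaped surrogate pair into one code point before calling chr(), and returns UTF-8 octets with the UTF-8 flag off.

```perl
use strict;
use warnings;

# Hypothetical helper (not from CGI.pm): UTF-8 octets for one code point.
sub utf8_bytes_sketch {
    my ($cp) = @_;
    # Assumption: a lone surrogate half becomes U+FFFD instead of failing.
    $cp = 0xFFFD if $cp >= 0xD800 && $cp <= 0xDFFF;
    my $c = chr($cp);
    utf8::encode($c);    # characters -> UTF-8 octets, UTF-8 flag off
    return $c;
}

# Hypothetical helper (not from CGI.pm): decode %XX and %uXXXX in a single
# pass, so that decoded text is never scanned for escapes a second time.
sub unescape_sketch {
    my ($s) = @_;
    $s =~ s{
          %u([Dd][89ABab][0-9A-Fa-f]{2})%u([Dd][C-Fc-f][0-9A-Fa-f]{2})
        | %u([0-9A-Fa-f]{4})
        | %([0-9A-Fa-f]{2})
    }{
        defined($1) ? utf8_bytes_sketch(
                          0x10000 + ((hex($1) - 0xD800) << 10) + (hex($2) - 0xDC00))
      : defined($3) ? utf8_bytes_sketch(hex($3))
      :               chr(hex($4))
    }gex;
    return $s;
}

# High half + low half are combined before chr() ever sees them.
print unescape_sketch('%u0061%u5F3E%uD869%uDEB2'), "\n";
```

Since the result is a plain byte string, printing it should not trigger the "Wide character" warning.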
% perl -MCGI -le 'print CGI->new->param("q")' q=%u0061%u5F3E
Wide character in print at -e line 1.
a弾
% perl -MCGI -le 'print CGI->new->param("q")' q=%u0061%u5F3E%uD869%uDEB2
My patch below fixes that.
% perl -Mblib -MCGI -le 'print CGI->new->param("q")' q=%u0061%u5F3E%uD869%uDEB2
a弾𪚲
Yours,
Dan the Faithful User of CGI.pm
Subject: | cgi-util.pat |
Message body not shown because it is not plain text.