Bug #14260 for HTML-Tree: HTML::Element's _xml_escape should be left to a filter that knows that the encodings involved are

Mon Aug 22 10:54:56 2005 NUFFIN [...] cpan.org - Ticket created

Subject:

HTML::Element's _xml_escape should be left to a filter that knows that the encodings involved are

_xml_escape, as applied by as_XML (called by Class::DBI::AsForm), was causing data corruption during round-tripping when Unicode was involved.

My workaround was to assign an empty sub to _xml_escape.

My guess is that the data was decoded as Latin-1 or something similar by the browser (despite the meta http-equiv specifying UTF-8, and the server agreeing with it in the Content-Type header).

This data was then sent back to the server, but it was Unicode reinterpreted as Latin-1 and then converted into Unicode again, so wide characters were turned into accented narrow ones from the Latin-1 range.

Anyway, my point is that since HTML::Element has no control over where its output data will eventually be fed, this should be an optional feature that can be easily disabled or replaced. Another filter to replace unprintable characters can then be applied to the string resulting from as_XML by the output handler (for example a Catalyst plugin that hooks on output, or a special Perl IO mode).

Ciao, and thanks!
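The kind of corruption guessed at in this report (UTF-8 bytes being read back as Latin-1) can be reproduced in isolation. The following is only a minimal sketch using the core Encode module. It does not involve HTML::Element, and it does not confirm that this is what the browser actually did; the variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# One character: U+00E9 (e with acute accent).
my $original = "\x{e9}";

# Serialise it as UTF-8, which yields the two bytes 0xC3 0xA9.
my $bytes = encode('UTF-8', $original);

# Wrongly interpret those bytes as Latin-1: every byte becomes one character.
my $misread = decode('ISO-8859-1', $bytes);

printf "%d character(s) before, %d after\n",
    length($original), length($misread);
printf "U+%04X\n", ord($_) for split //, $misread;
```

One accented character turns into two narrow Latin-1 characters, which matches the symptom described above.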

Sat Aug 05 15:39:54 2006 somwhere [...] confuzzled.lu - Correspondence added

From:

somewhere [...] confuzzled.lu

Hi,

The same function seems to be used in HTML::Widget when filling fields with values. If a value contains non-standard letters (like éààé), however, all of those letters get converted to their HTML entity counterparts. (I haven't looked at both modules, so I don't know whether the HTML::Widget author is using your module correctly.)

As mentioned on http://lists.rawmode.org/pipermail/catalyst/2006-May/007646.html, the following _xml_escape function would solve the problem:

    sub _xml_escape {    # DESTRUCTIVE (a.k.a. "in-place")
        foreach my $x (@_) {
            # Replace only <, & and > with numeric character references.
            $x =~ s~([<&>])~'&#'.(ord($1)).';'~seg;
        }
        return;
    }

Thibaut

On Mon Aug 22 10:54:56 2005, NUFFIN wrote:

> _xml_escape as applied by as_XML, called by Class::DBI::AsForm was
> causing data corruption during round tripping when unicode was
> involved.
>
> My workaround was to assign an empty sub to _xml_escape.
>
> My guess is that data was decoded as latin 1 or something by the
> browser (Despite meta http-equiv specifying utf-8, as well as the
> server agreeing with it WRT to the Content-Type header).
>
> This data was then sent back to the server, but it was unicode
> reinterpreted as latin 1, converted into unicode, so wide
> characters were made into accented narrow ones from the latin 1
> space.
>
> Anyway, my point is that since HTML::Element has no control over where
> it's output data will be fed to eventually this should be an
> optional feature, that can be easily disabled or replaced, where
> another filter to replace unprintable characters can be applied to
> the string resulting from 'as_XML' by the output handler (for
> example a catalyst plugin, that hooks on output, or a special perl
> io mode).
>
> Ciao, and thanks!
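
The substitution proposed in this message can be tried on its own, without loading HTML::Element. This sketch wraps it in a function with a hypothetical name, escape_markup_only, and uses a made-up sample string. It only illustrates that non-ASCII letters are left alone while <, & and > become numeric character references.

```perl
use strict;
use warnings;

# Hypothetical name; the body is the substitution proposed in the ticket.
sub escape_markup_only {
    foreach my $x (@_) {    # @_ aliases the caller's variables, so this edits in place
        $x =~ s~([<&>])~'&#'.(ord($1)).';'~seg;
    }
    return;
}

# Sample value with one accented letter (U+00E9) and some markup characters.
my $value = "caf\x{e9} <b> & co";
escape_markup_only($value);

binmode STDOUT, ':encoding(UTF-8)';
print "$value\n";
```

Because the function modifies its arguments in place, it has to be called with a variable rather than a string literal.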

Sat Aug 05 15:39:55 2006 The RT System itself - Status changed from 'new' to 'open'

Sat Nov 11 14:51:52 2006 PETEK [...] cpan.org - Correspondence added

_xml_escape now only escapes five characters. Four are <, >, ' and ", which are always escaped. The fifth is &, which is escaped only if it is not part of an already existing escape. The escapes recognized are &[a-z0-9]+; (e.g. &lt;) and &#\d+; (e.g. &#62;). This allows, for example, &nbsp; to pass through unharmed, so that an intended non-breaking space doesn't get double-escaped to &amp;nbsp; and produce unexpected behavior. Added test t/escape.t to prove this behavior.
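
The behavior described above can be approximated with a negative lookahead on the ampersand. This is a sketch under stated assumptions, not the code shipped in 3.22: the function name escape_five is hypothetical, it returns a copy instead of editing in place, and the replacement strings (in particular &apos; for the apostrophe) are assumed, since the ticket does not name them.

```perl
use strict;
use warnings;

# Hypothetical helper; returns an escaped copy of its argument.
sub escape_five {
    my ($text) = @_;

    # Escape & first, and only when it does not already start one of the
    # two escape forms named in the ticket: &[a-z0-9]+; or &#\d+;
    $text =~ s/&(?!(?:[a-z0-9]+|#\d+);)/&amp;/g;

    # The other four characters are always escaped.
    # Replacement strings are assumptions, not taken from the ticket.
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/'/&apos;/g;
    $text =~ s/"/&quot;/g;

    return $text;
}

print escape_five(q{a & b &nbsp; &#160; <i>"x"</i> it's}), "\n";
```

Handling & before the other characters matters here: the later substitutions introduce new ampersands, and those must not be escaped a second time.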

Sat Nov 11 14:51:53 2006 PETEK [...] cpan.org - Status changed from 'open' to 'resolved'

Sat Nov 11 14:53:29 2006 PETEK [...] cpan.org - Correspondence added

This fix will be released to CPAN this weekend as part of the Chicago Hackathon.

Sat Nov 11 14:53:30 2006 The RT System itself - Status changed from 'resolved' to 'open'

Sat Nov 11 18:12:03 2006 PETEK [...] cpan.org - Status changed from 'open' to 'stalled'

Sun Nov 12 12:24:38 2006 PETEK [...] cpan.org - Status changed from 'stalled' to 'resolved'

Sun Nov 12 12:24:39 2006 PETEK [...] cpan.org - Fixed in 3.22 added