
This queue is for tickets about the XML-RSS CPAN distribution.

Report information
The Basics
Id: 22467
Status: resolved
Priority: 0/
Queue: XML-RSS

People
Owner: Nobody in particular
Requestors: stephen.hall [...] predix.com
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: 1.11
Fixed in: (no value)



Subject: Incorrect entity encoding in XML output
HTML::Entities::encode_entities_numeric should be used for entity encoding, instead of HTML::Entities::encode_entities. The reason is that XML allows only 5 named entities - &amp; &lt; &gt; &quot; &apos; - all other entities must be encoded numerically. This means XML::RSS v1.11 produces illegal XML when encoding entities other than the 5 above.
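The difference between the two functions can be seen with a minimal sketch. It assumes HTML::Entities is installed; the helper name show_encodings and the sample string are made up for illustration and are not part of XML::RSS.

```perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities encode_entities_numeric);

# Hypothetical helper (not part of XML::RSS): returns both encodings
# of the same string so they can be compared side by side.
sub show_encodings {
    my ($text) = @_;
    return (
        named   => encode_entities($text),
        numeric => encode_entities_numeric($text),
    );
}

# "\x{E9}" is e-acute: it has a named HTML entity (&eacute;) but is
# not one of the five entities predefined by XML.
my %out = show_encodings("caf\x{E9} & <b>");
print "named:   $out{named}\n";    # caf&eacute; &amp; &lt;b&gt;
print "numeric: $out{numeric}\n";  # caf&#xE9; &#x26; &#x3C;b&#x3E;
```

An XML parser without a DTD that declares eacute would likely reject the first form, while the numeric character references in the second form need no declaration.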
Subject: Re: [rt.cpan.org #22467] Incorrect entity encoding in XML output
Date: Tue, 24 Oct 2006 11:12:21 -0700
To: bug-XML-RSS [...] rt.cpan.org
From: Ask Bjørn Hansen <ask [...] perl.org>
On Oct 23, 2006, at 2:14, Stephen Hall via RT wrote:
> HTML::Entities::encode_entities_numeric should be used for entity
> encoding, instead of HTML::Entities::encode_entities. The reason is
> that XML allows only 5 named entities - &amp; &lt; &gt; &quot;
> &apos; -
> all other entities must be encoded numerically. This means XML::RSS
> v1.11 produces illegal XML when encoding entities other than the 5
> above.

Doh! I'll make a 1.12 with that fixed.

It should be okay to use named entities in CDATA fields, right?

 - ask

Index: lib/XML/RSS.pm
===================================================================
--- lib/XML/RSS.pm (revision 7967)
+++ lib/XML/RSS.pm (working copy)
@@ -2,7 +2,7 @@
 use strict;
 use Carp;
 use XML::Parser;
-use HTML::Entities qw(encode_entities);
+use HTML::Entities qw(encode_entities_numeric encode_entities);
 use vars qw($VERSION $AUTOLOAD $modules $AUTO_ADD);
 use base qw(XML::Parser);
@@ -1684,9 +1684,11 @@
     my $encoded_text = '';
     while ( $text =~ s/(.*?)(\<\!\[CDATA\[.*?\]\]\>)//s ) {
+        # we use &named; entities here because it's HTML
         $encoded_text .= encode_entities($1) . $2;
     }
-    $encoded_text .= encode_entities($text);
+    # we use numeric entities here because it's XML
+    $encoded_text .= encode_entities_numeric($text);
     return $encoded_text;
 }
From: ABH [...] cpan.org
On Tue Oct 24 14:13:08 2006, ask@perl.org wrote:
> > On Oct 23, 2006, at 2:14, Stephen Hall via RT wrote:
> It should be okay to use named entities in CDATA fields, right?
I committed that patch to r7969.
Subject: Re: [rt.cpan.org #22467] Incorrect entity encoding in XML output
Date: Wed, 25 Oct 2006 16:30:27 +0200
To: <bug-XML-RSS [...] rt.cpan.org>
From: "Stephen Hall" <stephen.hall [...] predix.com>
Yes, named entities inside CDATA are OK.

Thanks for the quick fix.

- Stephen