
This queue is for tickets about the XML-RSS CPAN distribution.

Report information
The Basics
Id: 1867
Status: resolved
Priority: 0/
Queue: XML-RSS

People
Owner: Nobody in particular
Requestors: miyagawa [...] edge.co.jp
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: 1.01



Subject: parse() and as_string() XML escaping issue
XML::RSS's parse() (or parsefile()) decodes XML entities (like &amp;), but if you then add another item and call as_string(), the decoded XML entities are not encoded again in the output XML.

Try this and you get broken XML. Is there any known workaround for it? (I use RSS version 0.91 for simplicity; it still happens when I use 1.0.)

use strict;
use XML::RSS;

my $rss = XML::RSS->new(encoding => 'UTF-8', version => 0.91);
$rss->parse(<<RSS);
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN"
  "http://my.netscape.com/publish/formats/rss-0.91.dtd">
<rss version="0.91">
<channel>
<title>example.com</title>
<link>http://example.com/</link>
<description>bla &amp; blah</description>
<language>ja</language>
<item>
<title>foo &amp; bar</title>
<link>http://example.com/</link>
</item>
</channel>
</rss>
RSS
;

print $rss->as_string;
[MIYAGAWA - Wed Dec 11 01:26:04 2002]:
> XML::RSS's parse() (or parsefile()) decodes XML entities (like &amp;),
> but if you then add another item and call as_string(), the decoded
> XML entities are not encoded again in the output XML.
This might be a hard issue to solve completely. Say you encode every plain ampersand as ampersand-amp;. That's all well and good, but do you stop there? What if an ampersand is part of an entity, like ampersand-eacute;? Should that be escaped? It is not a valid XML entity by default, but it could be one for a given XML application.

I guess if you limit it to ampersands that are NOT part of an entity, it would be reasonable to escape those. It's a fairly simple regex, one used in a few places in Slash.
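
For illustration, the "escape only ampersands that are not part of an entity" idea could be sketched roughly as below. This is not the regex from Slash and not anything XML::RSS does; the helper name escape_bare_ampersands is hypothetical, and the pattern assumes that "part of an entity" means the ampersand is followed by a simple ASCII name, a decimal reference, or a hex reference, then a semicolon.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of XML::RSS or Slash.
# Escapes every "&" that does not already start something that looks
# like an entity or character reference.
sub escape_bare_ampersands {
    my ($text) = @_;

    # Negative lookahead leaves "&name;", "&#123;" and "&#x1F;" alone.
    # Assumption: entity names are plain ASCII letters and digits.
    $text =~ s/&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)/&amp;/g;

    return $text;
}

print escape_bare_ampersands('foo & bar'), "\n";
print escape_bare_ampersands('caf&eacute; & bar'), "\n";
print escape_bare_ampersands('AT&T &amp; co'), "\n";
```

Note that this keeps the ambiguity described above: text that already contains something shaped like an entity is passed through unchanged, whether or not that entity is declared for the document, and the sketch does nothing about "<" or ">".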
On Wed. 11 Dec. 2002 01:26:04, MIYAGAWA wrote:
> XML::RSS's parse() (or parsefile()) decodes XML entities (like &amp;),
> but if you then add another item and call as_string(), the decoded
> XML entities are not encoded again in the output XML.
>
> Try this and you get broken XML. Is there any known workaround for it?
> (I use RSS version 0.91 for simplicity; it still happens when I use
> 1.0.)
>
> use strict;
> use XML::RSS;
>
> my $rss = XML::RSS->new(encoding => 'UTF-8', version => 0.91);
> $rss->parse(<<RSS);
> <?xml version="1.0" encoding="UTF-8"?>
> <!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN"
>   "http://my.netscape.com/publish/formats/rss-0.91.dtd">
> <rss version="0.91">
> <channel>
> <title>example.com</title>
> <link>http://example.com/</link>
> <description>bla &amp; blah</description>
> <language>ja</language>
> <item>
> <title>foo &amp; bar</title>
> <link>http://example.com/</link>
> </item>
> </channel>
> </rss>
> RSS
> ;
>
> print $rss->as_string;
>