Bug #2472 for XML-RSS: encoding support is broken

Thu May 01 17:21:16 2003 Guest - Ticket created

Subject: encoding support is broken

I'm running version 1.02 of the module and am having trouble with the encoding support. The encode_text function seems to convert an ampersand only if it is not at the start of an entity (e.g. it won't convert &amp;). The correct (and much simpler) thing to do is always to encode the ampersand. The current behavior will, for instance, break HTML character data by not encoding entities, because the XML parser on the other end will decode those entities.

For example, suppose I have an RSS item with a description of:

The ampersand entity is '&amp;'.

When I encode that description using the current code, I get:

<description>The ampersand entity is '&amp;'.</description>

When I parse that XML with a correct parser, I get the following as the description character data:

The ampersand entity is '&'.

By not encoding the entity, you've broken the string.

More importantly, you have a large list of entities that you will not replace, but the only standard XML entities are:

amp, lt, gt, apos, quot

Every other entity must be declared before it can be used. So, when I pass a character data value with, for example, an &nbsp; in it and you do not encode the &nbsp; into &amp;nbsp; but instead just leave it as is, the parser that tries to read your output sees the undeclared (and therefore illegal) entity &nbsp; and throws a fatal error. This is in fact the behavior that caused me to look at the encoding function to see what it was doing, since expat correctly refused to parse an '&nbsp;' in the output from the module.

There is a clear statement of the proper way to encode character data at:

http://www.w3.org/TR/REC-xml#dt-chardata

The short of it is that you must always encode '&' and '<', and you must always encode '>' when it appears in the string ']]>' but does not mark the end of a CDATA section.
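The rule in this report can be illustrated with a minimal Python sketch. This is not the module's Perl code: the helper names encode_text, round_trip and parses are hypothetical and chosen only for this example. It assumes Python's standard xml.sax.saxutils.escape for unconditional escaping and the expat-backed xml.etree.ElementTree parser for reading the result back.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def encode_text(text):
    """Escape character data unconditionally (hypothetical helper)."""
    # escape() always rewrites '&', '<' and '>', so an entity-like
    # sequence such as '&amp;' in the input becomes '&amp;amp;'.
    return escape(text)


def round_trip(text):
    """Wrap text in <description>, parse it back, return the character data."""
    xml = "<description>" + encode_text(text) + "</description>"
    return ET.fromstring(xml).text


def parses(xml):
    """Report whether the expat-based parser accepts the document."""
    try:
        ET.fromstring(xml)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    original = "The ampersand entity is '&amp;'."
    print(encode_text(original))
    print(round_trip(original) == original)
    # An undeclared entity left unescaped is rejected by the parser.
    print(parses("<description>a&nbsp;b</description>"))
    print(parses("<description>" + encode_text("a&nbsp;b") + "</description>"))
```

With unconditional escaping, the parsed character data should match the original string exactly, and a literal &nbsp; in the input should no longer reach the parser as an undeclared entity.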

Tue Apr 20 21:38:30 2004 KELLAN [...] cpan.org - Taken

Sat Nov 11 03:29:06 2006 ABH [...] cpan.org - Correspondence added

Hi,

This should be fixed since v1.12.

- ask

Sat Nov 11 03:29:08 2006 The RT System itself - Status changed from 'new' to 'open'

Sat Nov 11 03:29:09 2006 ABH [...] cpan.org - Status changed from 'open' to 'resolved'