
This queue is for tickets about the XML-Feed CPAN distribution.

Report information
The Basics
Id: 42554
Status: resolved
Priority: 0/
Queue: XML-Feed

People
Owner: Nobody in particular
Requestors: wchunhao [...] cs.nctu.edu.tw
Cc:
AdminCc:

Bug Information
Severity: Normal
Broken in: 0.41
Fixed in: (no value)



Subject: escape amps and add fault tolerance
I found that some RSS feeds use unescaped HTML special characters in XML fields; in particular, they may use a bare '&' (ampersand) in the title field. Such items cannot be parsed. Worse, a small error in one item causes parsing of the whole document to fail. I tried the same document in Google RSS reader and it worked fine. Since we cannot expect every RSS feed to be well-formed, I would suggest:

1. Treat unrecognized tokens that begin with '&' (amp), '<' (lt) or '>' (gt) as normal text.
2. If an item cannot be parsed, ignore it and continue from the next one (the corrupted item may be returned).

Thanks a lot!
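Suggestion 2 could be sketched roughly as follows. This is only an illustration in Python using the standard library, not how XML::Feed or XML::RSS work; the function name `parse_items_tolerantly` and the regex-based splitting are assumptions made for the example.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical helper: cut the feed into <item>...</item> chunks with a regex,
# then parse each chunk on its own so one bad item cannot break the others.
ITEM_RE = re.compile(r"<item\b.*?</item>", re.DOTALL)


def parse_items_tolerantly(rss_text):
    """Return (titles_of_parsed_items, raw_text_of_broken_items)."""
    good, bad = [], []
    for match in ITEM_RE.finditer(rss_text):
        chunk = match.group(0)
        try:
            item = ET.fromstring(chunk)
        except ET.ParseError:
            # Keep the corrupted item as raw text instead of aborting.
            bad.append(chunk)
            continue
        good.append(item.findtext("title"))
    return good, bad
```

Splitting by regex is itself fragile: items that rely on namespace prefixes declared on the root element will not parse in isolation, and an `</item>` inside a CDATA section would confuse the split.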
Subject: Re: [rt.cpan.org #42554] escape amps and add fault tolerance
Date: Tue, 20 Jan 2009 02:22:44 -0800
To: bug-XML-Feed [...] rt.cpan.org
From: "Tatsuhiko Miyagawa" <miyagawa [...] gmail.com>
On Tue, Jan 20, 2009 at 2:18 AM, Chin-Hao Wu via RT <bug-XML-Feed@rt.cpan.org> wrote:
> I found some RSS feeds may use HTML special chars in the XML fields, in
> particular, they may use '&'(amps) in the title field. Such items cannot
> be parsed; what's worse, a small error in one item will cause the whole
> document corrupted. I've tried the same document on Google RSS reader
> and it worked fine. Since we cannot expect every RSS to be well-formed,
I wouldn't call it "fault tolerant". If modules like XML::RSS or XML::Feed parse XML that is not well-formed without emitting an error, that's a bug.

Take a look at XML::Liberal on CPAN. You could use that module to preprocess broken XML into well-formed XML and then pass the result to modules like XML::RSS, XML::Atom or XML::Feed.

http://search.cpan.org/dist/XML-Liberal/

--
Tatsuhiko Miyagawa
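The preprocess-then-parse idea can be illustrated with a minimal sketch. This is Python with the standard library only, and it is not XML::Liberal's implementation; the names `escape_bare_ampersands` and `parse_leniently` are hypothetical. It repairs one kind of breakage (bare ampersands) and then hands the result to a strict parser.

```python
import re
import xml.etree.ElementTree as ET

# An '&' that does not start a named, decimal or hexadecimal reference.
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z_][\w.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)")


def escape_bare_ampersands(xml_text):
    """Rewrite stray '&' as '&amp;' and leave existing references alone."""
    return BARE_AMP.sub("&amp;", xml_text)


def parse_leniently(xml_text):
    """Try a strict parse first; repair and retry only if it fails."""
    try:
        return ET.fromstring(xml_text)
    except ET.ParseError:
        return ET.fromstring(escape_bare_ampersands(xml_text))
```

The strict parser stays strict; only the input is repaired, which matches the division of labour suggested above. The sketch does not handle undefined HTML entities such as `&nbsp;`, and it would also rewrite ampersands inside CDATA sections, so it is an illustration rather than a general fix.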
Also, I think this would be the role of the underlying XML::RSS library rather than XML::Feed.

I understand Postel's law ("Be liberal in what you accept and strict in what you send"), but I'm inclined to agree with Tatsuhiko and close this. If there's anything XML::Feed can do to help, such as passing options through to the underlying libraries so that they use more liberal XML parsers, then please feel free to reopen the ticket and add details.

Thanks, and sorry we weren't able to help.

Simon