
This queue is for tickets about the XML-SAX-PurePerl CPAN distribution.

Report information
The Basics
Id: 19411
Status: new
Priority: 0/
Queue: XML-SAX-PurePerl

People
Owner: Nobody in particular
Requestors: clinton [...] traveljury.com
Cc:
AdminCc:

Bug Information
Severity: Critical
Broken in: (no value)
Fixed in: (no value)



Subject: PurePerl not setting the utf8 flag
Two RSS feeds, both encoded in ISO-8859-1.

Feed One contains a literal British pound character:

<title>Extra £2.5m for July bomb victims</title>

Feed Two contains a character entity reference:

<title>Boots injects &#xA3;3.6m to help area where it closed down factory</title>

Feed Two, when parsed, returns a literal pound character (encode_entities --> &pound;).

Feed One, when parsed, returns a UTF-8 string which is not marked as such, so encode_entities --> &Acirc;&pound;

However, if you parse Feed One and then call Encode::decode('utf8', $item->title), the title is interpreted correctly.

Sorry if that is confusing. Essentially, it is returning UTF-8 encoded characters, but without the utf8 flag set. The libXML parser works fine.
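
The symptom can be reproduced without the parser. The following is only a minimal sketch: the variable $title is a hypothetical stand-in for what $item->title appears to return for Feed One (the two UTF-8 bytes of the pound sign, with the utf8 flag off), and it does not call XML::SAX::PurePerl itself. It uses only the core Encode module and the built-in utf8::is_utf8.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical stand-in for what the parser returns for Feed One's title:
# the two UTF-8 bytes of U+00A3 (POUND SIGN), with the utf8 flag off.
my $title = "Extra \x{C2}\x{A3}2.5m for July bomb victims";

# Perl treats the unflagged string as Latin-1, so it sees two characters,
# U+00C2 and U+00A3, where one pound sign was intended.
printf "before: flag=%d first=U+%04X second=U+%04X\n",
    (utf8::is_utf8($title) ? 1 : 0),
    ord(substr($title, 6, 1)), ord(substr($title, 7, 1));

# Decoding the bytes as UTF-8 yields a single U+00A3 character.
my $fixed = decode('utf8', $title);
printf "after:  flag=%d first=U+%04X\n",
    (utf8::is_utf8($fixed) ? 1 : 0), ord(substr($fixed, 6, 1));
```

Before decoding, the string holds U+00C2 followed by U+00A3, which is consistent with encode_entities producing &Acirc;&pound;. After decoding, it holds a single U+00A3.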