Subject: XML::Feed: Atom feeds come out as bytes, but RSS as Unicode
Date: Tue, 3 Feb 2009 19:31:00 +0000
To: bug-XML-Feed [...] rt.cpan.org
From: Simon McVittie <smcv [...] debian.org>
XML::Atom has a bizarre API where, by default, text is returned as a string of
UTF-8 bytes without Perl's UTF-8 flag set. XML::RSS::Feed doesn't do this.
To make the output of XML::Feed the same in both cases, XML::Feed should
probably wrap each call to the XML::Atom object's accessors in
"{ local $XML::Atom::ForceUnicode = 1; ... }", so that Atom text comes back
as Unicode character strings, matching XML::RSS::Feed.
This bug breaks IkiWiki <http://ikiwiki.info/> when aggregating Atom
feeds; it ends up double-encoding the entries as they're written into the
cache. For instance, U+2019 RIGHT SINGLE QUOTATION MARK (decimal 8217) goes
into the cache file as the 6-byte sequence "\xC3\xA2\xC2\x80\xC2\x99", rather
than the correct 3-byte sequence "\xE2\x80\x99"; the effect is as if the
string were encoded as UTF-8, decoded as Latin-1, then encoded as UTF-8 again.
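The byte sequences above can be reproduced with a minimal sketch. It is written in Python rather than Perl purely for illustration, and it does not touch XML::Atom, XML::Feed or IkiWiki; the function name `double_encode` is hypothetical. It only simulates the suspected encode, mis-decode, re-encode round trip.

```python
def double_encode(text: str) -> bytes:
    """Simulate the suspected bug: UTF-8 bytes are mistaken for
    Latin-1 characters and then encoded as UTF-8 a second time."""
    utf8_bytes = text.encode("utf-8")           # correct on-disk form
    misread = utf8_bytes.decode("latin-1")      # each byte becomes one character
    return misread.encode("utf-8")              # second, unwanted encoding


quote = "\u2019"  # U+2019 RIGHT SINGLE QUOTATION MARK (decimal 8217)

correct = quote.encode("utf-8")
broken = double_encode(quote)

print(correct)  # the 3-byte sequence expected in the cache
print(broken)   # the 6-byte sequence actually observed
```

If the diagnosis is right, the second printed value should match the bytes seen in the cache file, which suggests the Atom text reaches the cache writer as undecoded UTF-8 bytes.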
Simon