
This queue is for tickets about the XML-Feed CPAN distribution.

Report information
The Basics
Id: 43004
Status: resolved
Priority: 0/
Queue: XML-Feed

People
Owner: DAVECROSS [...] cpan.org
Requestors: smcv [...] debian.org
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: XML::Feed: Atom feeds come out as bytes, but RSS as Unicode
Date: Tue, 3 Feb 2009 19:31:00 +0000
To: bug-XML-Feed [...] rt.cpan.org
From: Simon McVittie <smcv [...] debian.org>
XML::Atom has a bizarre API where by default, text is returned as a string of UTF-8 bytes without the Unicode flag set. XML::RSS::Feed doesn't do this.

To make the output of XML::Feed the same in both cases, XML::Feed should probably use "{ local $XML::Atom::ForceUnicode = 1; ... }" around each read access to the XML::Atom object's accessor functions, resulting in a switch to Unicode output that matches XML::RSS::Feed.

This bug breaks IkiWiki <http://ikiwiki.info/> when aggregating Atom feeds; it ends up "double-escaping" the entries as they're written into the cache. For instance, U+2019 (decimal 8217) closing single quote goes into the cache file as the 6-byte sequence "\xC3\xA2\xC2\x80\xC2\x99", rather than the correct 3-byte sequence "\xE2\x80\x99"; the effect is as if the string was encoded as UTF-8, decoded as Latin-1, then encoded as UTF-8 again.

Simon
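The byte sequences quoted in the report can be reproduced with Perl's core Encode module. This is only a minimal sketch of the encode, misread, encode-again chain described above; the variable names are illustrative and nothing here touches XML::Atom or XML::Feed.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# U+2019 RIGHT SINGLE QUOTATION MARK (decimal 8217) as a character string.
my $quote = "\x{2019}";

# Correct serialisation: three UTF-8 bytes.
my $utf8_bytes = encode('UTF-8', $quote);

# The mistake: each UTF-8 byte is read as if it were one Latin-1 character...
my $misread = decode('ISO-8859-1', $utf8_bytes);

# ...and the result is encoded as UTF-8 a second time, giving six bytes.
my $double_encoded = encode('UTF-8', $misread);

print unpack('H*', $utf8_bytes), "\n";      # e28099
print unpack('H*', $double_encoded), "\n";  # c3a2c28099
```

The second line of output matches the 6-byte sequence seen in the cache file.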

I too have the same problem. And setting $XML::Atom::ForceUnicode = 1; fixes this for me. But I'm afraid that it's a global variable and I can't set it in my module AnyEvent::Feed which uses XML::Feed. Greetings, Robin
Hmm, I'm not entirely sure what the best way to handle this is - setting ForceUnicode is kind of a nuclear option which could screw up other modules in, say, a mod_perl environment. I'm talking to Tatsuhiko Miyagawa about it and I'll get back to you.
On Mon Nov 16 21:02:41 2009, SIMONW wrote:
> Hmm, I'm not entirely sure what the best way to handle this is - setting
> ForceUnicode is kind of a nuclear option which could screw up other
> modules in, say, a mod_perl environment.
>
> I'm talking to Tatsuhiko Miyagawa about it and I'll get back to you.
I discovered this solution myself. I'd love to see XML::Atom have an object attribute to force decoding to utf8. Frankly, it should be enabled by default. Best, David
Hi all, I've been bitten by this bug myself now when trying to combine my blogs.perl.org's blog feed, which is only provided in Atom (why??), into the rest of the feeds. The ForceUnicode setting workaround that is described in this thread works nicely, but there should be a more permanent solution. Regards, -- Shlomi Fish
On Tue Feb 03 14:32:07 2009, smcv@debian.org wrote:
> XML::Atom has a bizarre API where by default, text is returned as a
> string of UTF-8 bytes without the Unicode flag set. XML::RSS::Feed
> doesn't do this.
>
> To make the output of XML::Feed the same in both cases, XML::Feed
> should probably use "{ local $XML::Atom::ForceUnicode = 1; ... }"
> around each read access to the XML::Atom object's accessor functions,
> resulting in a switch to Unicode output that matches XML::RSS::Feed.
>
> This bug breaks IkiWiki <http://ikiwiki.info/> when aggregating Atom
> feeds; it ends up "double-escaping" the entries as they're written
> into the cache. For instance, U+2019 (decimal 8217) closing single
> quote goes into the cache file as the 6-byte sequence
> "\xC3\xA2\xC2\x80\xC2\x99", rather than the correct 3-byte sequence
> "\xE2\x80\x99"; the effect is as if the string was encoded as UTF-8,
> decoded as Latin-1, then encoded as UTF-8 again.
>
> Simon
Does it make sense to discuss this here? Isn't it a bug in XML::Atom? Or am I misunderstanding? Dave...
Subject: Re: [rt.cpan.org #43004] XML::Feed: Atom feeds come out as bytes, but RSS as Unicode
Date: Thu, 24 Nov 2011 12:01:09 +0000
To: Dave Cross via RT <bug-XML-Feed [...] rt.cpan.org>
From: Simon McVittie <smcv [...] debian.org>
On Thu, 24 Nov 2011 at 06:37:43 -0500, Dave Cross via RT wrote:
> Does it make sense to discuss this here? Isn't it a bug in XML::Atom?
>
> Or am I misunderstanding?
I agree that this needs discussion with the author of XML::Atom. I don't know how you Cc people "correctly" in RT, it's not a bug tracker I'm particularly familiar with.

As far as I'm concerned, the bug in X::F is that it doesn't produce the same data type for RSS and Atom feeds (breaking encapsulation), and the underlying bugs in X::A that make it hard for X::F to do the right thing are:

1) produces a byte-string of UTF-8, rather than a Unicode string, by default (might not be considered to be a bug, since it's documented in XML::Atom::Feed; or might be considered to be a bug but unfixable, since that would be an API break)

2) can only be directed to produce Unicode by setting a global variable (this is an API design problem, rather than not behaving as documented)

Three possible solutions:

* If (1) is considered to be a bug, make XML::Atom::ForceUnicode the default, and XML::Feed doesn't need any changes; requires changes to X::A only.

* If (1) is as designed or is unfixable, fix (2) instead (e.g. add $feed->unicode(1) setter) and then change XML::Feed to use it; requires changes to both X::A and X::F. I'd be inclined to say this one is the most correct.

* If (1) is as designed, postprocess the XML::Atom output through Encode::decode('utf-8', $bytes) in XML::Feed; requires changes to X::F only, but will break if (1) is changed in a later version of X::A.

Which one is correct is up to you and the author of XML::Atom.

For now, IkiWiki sets "local $XML::Atom::ForceUnicode = 1" around each invocation of XML::Feed, because we know that it's single-threaded, so the usual problems with global variables are less of a concern. I realise this would be unacceptable in a library, though.

S
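For readers unfamiliar with how the "local" workaround behaves, here is a small sketch using a stand-in package variable in place of $XML::Atom::ForceUnicode. The package My::Lib, its mode() function and the wrapper read_with_unicode() are all hypothetical and exist only for this illustration; XML::Atom itself is not loaded.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a library controlled by a global switch.
{
    package My::Lib;
    our $ForceUnicode = 0;
    sub mode { return $ForceUnicode ? 'characters' : 'bytes' }
}

# Hypothetical wrapper: the override lasts only for this dynamic scope
# and the previous value is restored when the sub returns.
sub read_with_unicode {
    local $My::Lib::ForceUnicode = 1;
    return My::Lib::mode();
}

my $before = My::Lib::mode();
my $inside = read_with_unicode();
my $after  = My::Lib::mode();

print "$before $inside $after\n";    # prints "bytes characters bytes"
```

The override is undone on scope exit, but any other code called from inside the wrapper also sees the changed value, which is the concern raised above for use in a library.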
Ticket migrated to github as https://github.com/davorg/xml-feed/issues/44