
This queue is for tickets about the XML-Feed CPAN distribution.

Report information
The Basics
Id: 44899
Status: resolved
Priority: 0/
Queue: XML-Feed

People
Owner: Nobody in particular
Requestors: dave [...] dave.org.uk
Cc: SHLOMIF [...] cpan.org
AdminCc:

Bug Information
Severity: Important
Broken in: 0.42
Fixed in: (no value)



Subject: Information lost in convert process
There seems to be a problem in the conversion from RSS to Atom. The attached test demonstrates it. In the original RSS feed, the entry contains HTML formatting. When I convert the feed to Atom, that formatting vanishes. Let me know if you need any more details.
Subject: rss.xml
<?xml version="1.0" encoding="ISO-8859-1"?>

<rdf:RDF
 xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 xmlns="http://purl.org/rss/1.0/"
 xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
 xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/"
 xmlns:dc="http://purl.org/dc/elements/1.1/"
 xmlns:syn="http://purl.org/rss/1.0/modules/syndication/"
 xmlns:admin="http://webns.net/mvcb/"
>

<channel rdf:about="http://use.perl.org/~davorg/journal/">
<title>davorg's Journal</title>
<link>http://use.perl.org/~davorg/journal/</link>
<description>davorg's use Perl Journal</description>
<dc:language>en-us</dc:language>
<dc:rights>use Perl; is Copyright 1998-2006, Chris Nandor. Stories, comments, journals, and other submissions posted on use Perl; are Copyright their respective owners.</dc:rights>
<dc:date>2009-04-09T12:53:36+00:00</dc:date>
<dc:publisher>pudge</dc:publisher>
<dc:creator>pudge@perl.org</dc:creator>
<dc:subject>Technology</dc:subject>
<syn:updatePeriod>hourly</syn:updatePeriod>
<syn:updateFrequency>1</syn:updateFrequency>
<syn:updateBase>1970-01-01T00:00+00:00</syn:updateBase>
<items>
 <rdf:Seq>
  <rdf:li rdf:resource="http://use.perl.org/~davorg/journal/38730?from=rss" />
 </rdf:Seq>
</items>
<image rdf:resource="http://use.perl.org/images/topics/useperl.gif" />
</channel>

<image rdf:about="http://use.perl.org/images/topics/useperl.gif">
<title>davorg's Journal</title>
<url>http://use.perl.org/images/topics/useperl.gif</url>
<link>http://use.perl.org/~davorg/journal/</link>
</image>

<item rdf:about="http://use.perl.org/~davorg/journal/38730?from=rss">
<title>Task::Kensho RPMs</title>
<link>http://use.perl.org/~davorg/journal/38730?from=rss</link>
<description>&lt;p&gt;One of the first concrete outputs from the &lt;a href="http://www.enlightenedperl.org/"&gt;Enlightened Perl Organisation&lt;/a&gt; has been &lt;a href="http://search.cpan.org/dist/Task-Kensho/"&gt;Task::Kensho&lt;/a&gt; - a CPAN module which exists to list a number of other CPAN modules that modern Perl programmers should consider using. if you install Task::Kensho then all of the included modules will automatically be pulled down from CPAN and installed.&lt;/p&gt;&lt;p&gt;I don't install my modules from CPAN. As I live in the Red Hat world, I like to install RPMs of modules. And I build RPMs for modules that aren't already available in that format (and then I &lt;a href="http://rpm.mag-sol.com/"&gt;make them available to everyone&lt;/a&gt;).&lt;/p&gt;&lt;p&gt;So last night I created an RPM for Task::Kensho. This also involved building RPMs for about half of the modules it include which didn't already exist as RPMs in the standard repostories. Those RPMs are now available from &lt;a href="http://rpm.mag-sol.com/"&gt;my repository&lt;/a&gt; so installing them all could be as simple as &lt;tt&gt;sudo yum install perl-Task-Kensho&lt;/tt&gt;. Of course, you can also install individual packages using the appropriate &lt;tt&gt;yum&lt;/tt&gt; command.&lt;/p&gt;&lt;p&gt;Currently the RPMs are only available for Fedora 10. I'll build versions for Centos 5 over the next couple of days.&lt;/p&gt;</description>
<dc:creator>davorg</dc:creator>
<dc:date>2009-03-31T08:02:06+00:00</dc:date>
<dc:subject>journal</dc:subject>
</item>
</rdf:RDF>
Subject: convert.t
use Test::More 'no_plan';
use XML::Feed;

my $rss = XML::Feed->parse('rss.xml');
isa_ok($rss, 'XML::Feed::Format::RSS');

my $rss_entry = ($rss->entries)[0];
isa_ok($rss_entry, 'XML::Feed::Entry::Format::RSS');

my $rss_content = $rss_entry->content;
isa_ok($rss_content, 'XML::Feed::Content');
is($rss_content->type, 'text/html', 'Correct content type');
like($rss_content->body, qr(<|&lt;), 'Contains HTML tags');

my $atom = $rss->convert('Atom');
isa_ok($atom, 'XML::Feed::Format::Atom');

my $atom_entry = ($atom->entries)[0];
isa_ok($atom_entry, 'XML::Feed::Entry::Format::Atom');

my $atom_content = $atom_entry->content;
isa_ok($atom_content, 'XML::Feed::Content');
is($atom_content->type, 'text/html', 'Correct content type');
like($atom_content->body, qr(<|&lt;), 'Contains HTML tags');
I got bit by this bug too. +1 on it being fixed.
On Thu Apr 09 10:09:51 2009, DAVECROSS wrote:
> There seems to be a problem in the conversion from RSS to Atom. The
> attached test demonstrates it. In the original RSS feed, the entry
> contains HTML formatting. When I convert the feed to Atom that
> formatting vanishes.
>
> Let me know if you need any more details.
Attached is a patch against the svn trunk that fixes the problem. Please apply it. I reworked Dave's test case into the patch and fixed the problem.

Regards,

-- Shlomi Fish
Index: t/samples/rss10-davorg-journal.xml
===================================================================
--- t/samples/rss10-davorg-journal.xml	(revision 0)
+++ t/samples/rss10-davorg-journal.xml	(revision 0)
@@ -0,0 +1,48 @@
+<?xml version="1.0" encoding="ISO-8859-1"?>
+
+<rdf:RDF
+ xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
+ xmlns="http://purl.org/rss/1.0/"
+ xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
+ xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/"
+ xmlns:dc="http://purl.org/dc/elements/1.1/"
+ xmlns:syn="http://purl.org/rss/1.0/modules/syndication/"
+ xmlns:admin="http://webns.net/mvcb/"
+>
+
+<channel rdf:about="http://use.perl.org/~davorg/journal/">
+<title>davorg's Journal</title>
+<link>http://use.perl.org/~davorg/journal/</link>
+<description>davorg's use Perl Journal</description>
+<dc:language>en-us</dc:language>
+<dc:rights>use Perl; is Copyright 1998-2006, Chris Nandor. Stories, comments, journals, and other submissions posted on use Perl; are Copyright their respective owners.</dc:rights>
+<dc:date>2009-04-09T12:53:36+00:00</dc:date>
+<dc:publisher>pudge</dc:publisher>
+<dc:creator>pudge@perl.org</dc:creator>
+<dc:subject>Technology</dc:subject>
+<syn:updatePeriod>hourly</syn:updatePeriod>
+<syn:updateFrequency>1</syn:updateFrequency>
+<syn:updateBase>1970-01-01T00:00+00:00</syn:updateBase>
+<items>
+ <rdf:Seq>
+  <rdf:li rdf:resource="http://use.perl.org/~davorg/journal/38730?from=rss" />
+ </rdf:Seq>
+</items>
+<image rdf:resource="http://use.perl.org/images/topics/useperl.gif" />
+</channel>
+
+<image rdf:about="http://use.perl.org/images/topics/useperl.gif">
+<title>davorg's Journal</title>
+<url>http://use.perl.org/images/topics/useperl.gif</url>
+<link>http://use.perl.org/~davorg/journal/</link>
+</image>
+
+<item rdf:about="http://use.perl.org/~davorg/journal/38730?from=rss">
+<title>Task::Kensho RPMs</title>
+<link>http://use.perl.org/~davorg/journal/38730?from=rss</link>
+<description>&lt;p&gt;One of the first concrete outputs from the &lt;a href="http://www.enlightenedperl.org/"&gt;Enlightened Perl Organisation&lt;/a&gt; has been &lt;a href="http://search.cpan.org/dist/Task-Kensho/"&gt;Task::Kensho&lt;/a&gt; - a CPAN module which exists to list a number of other CPAN modules that modern Perl programmers should consider using. if you install Task::Kensho then all of the included modules will automatically be pulled down from CPAN and installed.&lt;/p&gt;&lt;p&gt;I don't install my modules from CPAN. As I live in the Red Hat world, I like to install RPMs of modules. And I build RPMs for modules that aren't already available in that format (and then I &lt;a href="http://rpm.mag-sol.com/"&gt;make them available to everyone&lt;/a&gt;).&lt;/p&gt;&lt;p&gt;So last night I created an RPM for Task::Kensho. This also involved building RPMs for about half of the modules it include which didn't already exist as RPMs in the standard repostories. Those RPMs are now available from &lt;a href="http://rpm.mag-sol.com/"&gt;my repository&lt;/a&gt; so installing them all could be as simple as &lt;tt&gt;sudo yum install perl-Task-Kensho&lt;/tt&gt;. Of course, you can also install individual packages using the appropriate &lt;tt&gt;yum&lt;/tt&gt; command.&lt;/p&gt;&lt;p&gt;Currently the RPMs are only available for Fedora 10. I'll build versions for Centos 5 over the next couple of days.&lt;/p&gt;</description>
+<dc:creator>davorg</dc:creator>
+<dc:date>2009-03-31T08:02:06+00:00</dc:date>
+<dc:subject>journal</dc:subject>
+</item>
+</rdf:RDF>

Property changes on: t/samples/rss10-davorg-journal.xml
___________________________________________________________________
Added: svn:eol-style
   + native

Index: t/16-convert.t
===================================================================
--- t/16-convert.t	(revision 0)
+++ t/16-convert.t	(revision 0)
@@ -0,0 +1,67 @@
+use strict;
+use warnings;
+
+use Test::More tests => 12;
+
+use XML::Feed;
+use File::Spec;
+
+
+{
+    my $rss = XML::Feed->parse(
+        File::Spec->catfile(File::Spec->curdir(),
+            "t", "samples", "rss10-davorg-journal.xml"
+        )
+    );
+
+    # TEST
+    isa_ok($rss, 'XML::Feed::Format::RSS');
+    my $rss_entry = ($rss->entries)[0];
+
+    # TEST
+    isa_ok($rss_entry, 'XML::Feed::Entry::Format::RSS');
+
+
+    my $rss_content = $rss_entry->content;
+
+    # TEST
+    isa_ok($rss_content, 'XML::Feed::Content');
+
+    # TEST
+    is($rss_content->type, 'text/html', 'Correct content type');
+
+    # TEST
+    like($rss_content->body, qr(<|&lt;), 'Contains HTML tags');
+
+    # TEST
+    like($rss_content->body,
+        qr{\A\Q<p>One of the first concrete outputs from the <a href="http://www.enlightenedperl.org/">Enlightened Perl Organisation</a>\E},
+        'Contains HTML tags');
+
+
+    my $atom = $rss->convert('Atom');
+
+    # TEST
+    isa_ok($atom, 'XML::Feed::Format::Atom');
+
+    my $atom_entry = ($atom->entries)[0];
+
+    # TEST
+    isa_ok($atom_entry, 'XML::Feed::Entry::Format::Atom');
+
+    my $atom_content = $atom_entry->content;
+
+    # TEST
+    isa_ok($atom_content, 'XML::Feed::Content');
+
+    # TEST
+    is($atom_content->type, 'text/html', 'Correct content type');
+
+    # TEST
+    like($atom_content->body, qr(<|&lt;), 'Contains HTML tags');
+
+    # TEST
+    like($atom_content->body,
+        qr{\A\Q<p>One of the first concrete outputs from the <a href="http://www.enlightenedperl.org/">Enlightened Perl Organisation</a>\E},
+        'Contains HTML tags');
+}
Index: lib/XML/Feed/Format/Atom.pm
===================================================================
--- lib/XML/Feed/Format/Atom.pm	(revision 153)
+++ lib/XML/Feed/Format/Atom.pm	(working copy)
@@ -8,6 +8,7 @@
 use XML::Atom::Util qw( iso2dt );
 use List::Util qw( first );
 use DateTime::Format::W3CDTF;
+use HTML::Entities;
 
 use XML::Atom::Entry;
 XML::Atom::Entry->mk_elem_accessors(qw( lat long ), ['http://www.w3.org/2003/01/geo/wgs84_pos#']);
@@ -199,6 +200,10 @@
     if (ref($_[0]) eq 'XML::Feed::Content') {
         if (defined $_[0]->type && defined $types{$_[0]->type}) {
             %param = (Body => $_[0]->body, Type => $types{$_[0]->type});
+
+            if ($param{'Type'} eq "html") {
+                $param{'Body'} = HTML::Entities::encode_entities($param{'Body'});
+            }
         } else {
             %param = (Body => $_[0]->body);
         }
Index: MANIFEST
===================================================================
--- MANIFEST	(revision 153)
+++ MANIFEST	(working copy)
@@ -1,12 +1,12 @@
 Changes
-lib/XML/Feed.pm
 lib/XML/Feed/Content.pm
 lib/XML/Feed/Enclosure.pm
 lib/XML/Feed/Entry.pm
 lib/XML/Feed/Format/Atom.pm
 lib/XML/Feed/Format/RSS.pm
+lib/XML/Feed.pm
+MANIFEST.SKIP
 MANIFEST			This list of files
-MANIFEST.SKIP
 META.yml
 README
 t/00-compile.t
@@ -23,13 +23,14 @@
 t/11-xml-base-atom.t
 t/11-xml-base-rss.t
 t/12-multi-categories-atom.t
+t/12-multi-categories.base
 t/12-multi-categories-rss.t
-t/12-multi-categories.base
 t/12-multi-subjects-rss.t
 t/13-category-hash-bug.t
 t/14-enclosures.t
 t/14-multi-enclosures.t
 t/15-odd-date.t
+t/16-convert.t
 t/pod-coverage.t
 t/pod.t
 t/samples/atom-10-example.xml
@@ -41,8 +42,7 @@
 t/samples/base_atom.xml
 t/samples/base_rss.xml
 t/samples/category-bug.xml
-t/samples/rss-multiple-categories.xml
-t/samples/rss-multiple-subjects.xml
+t/samples/rss10-davorg-journal.xml
 t/samples/rss10-invalid-date.xml
 t/samples/rss10-odd-date.xml
 t/samples/rss10.xml
@@ -50,3 +50,5 @@
 t/samples/rss20-multi-enclosure.xml
 t/samples/rss20-no-summary.xml
 t/samples/rss20.xml
+t/samples/rss-multiple-categories.xml
+t/samples/rss-multiple-subjects.xml
On Sun Jun 21 14:40:34 2009, SHLOMIF wrote:
> Attached is a patch that fixes the problem against the svn trunk. Please
> apply it. I reworked Dave's testcase into a patch and fixed the problem.
I suspect that Shlomi's patch may not fix all issues. I've applied it to my local installation, and now in some cases I'm seeing double-encoded entities.

I'll try to wrap it up into a test file later today.

Dave...
On Thu Jun 25 03:56:11 2009, DAVECROSS wrote:
> On Sun Jun 21 14:40:34 2009, SHLOMIF wrote:
> > Attached is a patch that fixes the problem against the svn trunk. Please
> > apply it. I reworked Dave's testcase into a patch and fixed the problem.
>
> I suspect that Shlomi's patch may not fix all issues. I've applied it to
> my local installation and now in some cases I'm seeing double-encoded
> entities.
>
> I'll try to wrap it up into a test file later today.
>

Hi Dave!

Any news regarding it?

Regards,

Shlomi Fish

> Dave...
On Tue Jun 30 07:54:37 2009, SHLOMIF wrote:
> On Thu Jun 25 03:56:11 2009, DAVECROSS wrote:
> > On Sun Jun 21 14:40:34 2009, SHLOMIF wrote:
> > > Attached is a patch that fixes the problem against the svn trunk.
> > > Please apply it. I reworked Dave's testcase into a patch and
> > > fixed the problem.
> >
> > I suspect that Shlomi's patch may not fix all issues. I've applied
> > it to my local installation and now in some cases I'm seeing
> > double-encoded entities.
> >
> > I'll try to wrap it up into a test file later today.
>
> Hi Dave!
>
> Any news regarding it?
Not yet. Bit busy right now. Might find time this afternoon; otherwise it'll be tomorrow evening.

But, in summary, you can have valid feeds where the HTML is already entity-encoded. Your patch makes them double-encoded.

Dave...
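Dave's point can be illustrated with a small, self-contained Perl sketch. It does not use XML::Atom or HTML::Entities: `manual_encode` and `serializer_escape` are hypothetical stand-ins, and the assumption (not checked against XML::Atom's source) is that the XML writer escapes text content on output by itself. Under that assumption, running an entity encoder on the body before handing it over makes every `&` get escaped a second time.

```perl
use strict;
use warnings;

# Hypothetical stand-in for HTML::Entities::encode_entities, limited to
# the five characters this example needs (the real module escapes more).
my %entity = (
    '&' => '&amp;', '<' => '&lt;', '>' => '&gt;',
    '"' => '&quot;', "'" => '&#39;',
);

sub manual_encode {
    my ($text) = @_;
    $text =~ s/([&<>"'])/$entity{$1}/g;
    return $text;
}

# Hypothetical model of an XML writer escaping a text node on output.
# Assumption: the serializer escapes &, < and > by itself.
sub serializer_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # '&' first, so the escapes added below stay intact
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Start of the HTML body from the David Jones sample feed in this ticket.
my $body = q{<div class='snap_preview'><br />};

# Escaped exactly once, by the serializer alone.
my $once = serializer_escape($body);

# Escaped by hand first (like the extra encode_entities call), then again
# by the serializer.
my $twice = serializer_escape(manual_encode($body));

print "once:  $once\n";
print "twice: $twice\n";
print "double-encoded: ", ($twice =~ /&amp;lt;/ ? "yes" : "no"), "\n";
```

Under these assumptions `$twice` comes out as `&amp;lt;div class=&amp;#39;snap_preview&amp;#39;&amp;gt;...`, the same shape as the test failure quoted later in this ticket, which suggests the escaping belongs in exactly one place.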
On Tue Jun 30 08:34:49 2009, DAVECROSS wrote:
> Not yet. Bit busy right now. Might find time this afternoon, otherwise
> it'll be tomorrow evening.
>
> But, in summary, you can have valid feeds where the HTML is already
> entity-encoded. Your patch makes them double-encoded.

I'm on holiday at the moment (and will be for two more months), but if we get another rainstorm I'll have a look at this.
On Tue Jun 30 08:34:49 2009, DAVECROSS wrote:
> On Tue Jun 30 07:54:37 2009, SHLOMIF wrote:
> > On Thu Jun 25 03:56:11 2009, DAVECROSS wrote:
> > > On Sun Jun 21 14:40:34 2009, SHLOMIF wrote:
> > > > Attached is a patch that fixes the problem against the svn trunk.
> > > > Please apply it. I reworked Dave's testcase into a patch and
> > > > fixed the problem.
> > >
> > > I suspect that Shlomi's patch may not fix all issues. I've applied
> > > it to my local installation and now in some cases I'm seeing
> > > double-encoded entities.
> > >
> > > I'll try to wrap it up into a test file later today.
> >
> > Hi Dave!
> >
> > Any news regarding it?
>
> Not yet. Bit busy right now. Might find time this afternoon, otherwise
> it'll be tomorrow evening.
>

Hi Dave!

Please send me your test case.

Regards,

-- Shlomi Fish

> But, in summary, you can have valid feeds where the HTML is already
> entity-encoded. Your patch makes them double-encoded.
>
> Dave...
On Tue Aug 18 17:40:59 2009, SHLOMIF wrote:
> On Tue Jun 30 08:34:49 2009, DAVECROSS wrote:
> > On Tue Jun 30 07:54:37 2009, SHLOMIF wrote:
> > > On Thu Jun 25 03:56:11 2009, DAVECROSS wrote:
> > > > On Sun Jun 21 14:40:34 2009, SHLOMIF wrote:
> > > > > Attached is a patch that fixes the problem against the svn
> > > > > trunk. Please apply it. I reworked Dave's testcase into a
> > > > > patch and fixed the problem.
> > > >
> > > > I suspect that Shlomi's patch may not fix all issues. I've
> > > > applied it to my local installation and now in some cases I'm
> > > > seeing double-encoded entities.
> > > >
> > > > I'll try to wrap it up into a test file later today.
> > >
> > > Hi Dave!
> > >
> > > Any news regarding it?
> >
> > Not yet. Bit busy right now. Might find time this afternoon,
> > otherwise it'll be tomorrow evening.
>
> Hi Dave!
>
> Please send me your test case.
Sorry for the tardiness. August seems to have vanished in a haze of YAPC::EU followed by a two-week holiday with barely any net access.

I remember establishing that the problem wasn't what I thought it was, but I didn't get round to working out a decent test case. I hope to have time to get back to it over the weekend.

But if you want an example of the problem, try taking Dave Cantrell's RSS feed (http://www.cantrell.org.uk/david/journal/index.pl?format=rss) and converting it into an Atom feed.

Dave...
RT-Send-CC: simonw [...] cpan.org
Hi Simon!

Have you returned from your vacation yet? If so, can you please take a look? This bug is blocking a new release of XML-Grammar-Fortune-Synd.

Regards,

-- Shlomi Fish
On Tue Aug 18 17:40:59 2009, SHLOMIF wrote:
> > Hi Dave!
> >
> > Please send me your test case.
Right, here's a test case for the problem I discovered months ago. Sorry it's taken me so long to do this.

Basically, if you convert the attached RSS 2.0 feed to Atom, you end up with double-encoded HTML entities when you produce the XML output. This leads to much brokenness when the feed is displayed on a web page. See, for example, http://mps.theplanetarium.org/test/.

And I suspect it's Shlomi's previous patch being overzealous.

Dave...
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:media="http://search.yahoo.com/mrss/" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">
<channel>
  <title>David Jones, MP</title>
  <link>http://davidjonesblog.com</link>
  <description>Conservative Member of Parliament for Clwyd West; Shadow Minister for Wales</description>
  <lastBuildDate>Sat, 10 Oct 2009 14:44:32 +0000</lastBuildDate>
  <generator>http://wordpress.com/</generator>
  <language>en</language>
  <sy:updatePeriod>hourly</sy:updatePeriod>
  <sy:updateFrequency>1</sy:updateFrequency>
  <cloud domain="davidjonesblog.com" port="80" path="/?rsscloud=notify" registerProcedure="" protocol="http-post" />
  <image>
    <url>http://www.gravatar.com/blavatar/a828c934fa71b769302493ee214f134d?s=96&amp;d=http://s.wordpress.com/i/buttonw-com.png</url>
    <title>David Jones, MP</title>
    <link>http://davidjonesblog.com</link>
  </image>
  <atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" href="http://feeds.feedburner.com/wordpress/GWIK" type="application/rss+xml" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com" /><item>
    <title>Sunny Gordon</title>
    <link>http://feedproxy.google.com/~r/wordpress/GWIK/~3/YVku-jKELKs/</link>
    <comments>http://davidjonesblog.com/2009/10/10/sunny-gordon/#comments</comments>
    <pubDate>Sat, 10 Oct 2009 11:57:47 +0000</pubDate>
    <dc:creator>David Jones</dc:creator>
    <category><![CDATA[Conservative Party]]></category>
    <category><![CDATA[Gordon Brown]]></category>
    <category><![CDATA[Labour Party]]></category>
    <category><![CDATA[economy]]></category>
    <category><![CDATA[Politics]]></category>
    <guid isPermaLink="false">http://davidjonesblog.com/?p=4022</guid>
    <description><![CDATA[With Gordon at the helm, Labour are going to let the good times roll.<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=davidjonesblog.com&blog=5996455&post=4022&subd=davidjonesmp&ref=&feed=1" />]]></description>
    <content:encoded><![CDATA[<div class='snap_preview'><br /><p style="text-align:justify;"><span style="color:#000000;">In an interesting, not to say quixotic, attempt at repositioning, Gordon Brown, in an interview  in the <em><a href="http://www.telegraph.co.uk/news/newstopics/politics/gordon-brown/6286070/Britains-economy-ready-to-bounce-back-says-Gordon-Brown.html">Telegraph</a></em> this morning, seeks to portray himself as a sunny optimist, in contrast to the doom-and-gloom mongers of the Conservative party.</span></p>
<p style="text-align:justify;"><span style="color:#000000;">It is “simply not true”, says Mr Brown, that tough economic times lie ahead.  No, says the PM, his drive for economic growth will pull the country out of recession; with Gordon at the helm, Labour are going to let the good times roll.</span></p>
<p style="text-align:justify;"><span style="color:#000000;">Gratifying as it is to see this hitherto unsuspected Louis Armstrong side to the Prime Minister’s personality, it is unlikely that his new line will cut much ice with an informed electorate.  The Treasury’s own figures indicate that:</span></p>
<ul style="text-align:justify;">
<li><span style="color:#000000;">the social security bill will mount to almost £200 billion in four years’ time – almost twice the NHS budget;</span></li>
<li><span style="color:#000000;">debt interest will rise to £63 billion per annum;</span></li>
<li><span style="color:#000000;">the total cost of welfare and debt maintenance will amount to one-third of government expenditure.</span></li>
</ul>
<p style="text-align:justify;"><span style="color:#000000;">In the circumstances, it’s rather hard to see that the Tories are being anything other than totally realistic when they warn of hard years to come.   Giving a cheery whistle, as Gordon appears to be advising, isn’t really going to help an awful lot.</span></p>
Posted in Conservative Party, economy, Gordon Brown, Labour Party Tagged: Politics <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/davidjonesmp.wordpress.com/4022/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/davidjonesmp.wordpress.com/4022/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/godelicious/davidjonesmp.wordpress.com/4022/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/delicious/davidjonesmp.wordpress.com/4022/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gostumble/davidjonesmp.wordpress.com/4022/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/stumble/davidjonesmp.wordpress.com/4022/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/godigg/davidjonesmp.wordpress.com/4022/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/digg/davidjonesmp.wordpress.com/4022/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/goreddit/davidjonesmp.wordpress.com/4022/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/reddit/davidjonesmp.wordpress.com/4022/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=davidjonesblog.com&blog=5996455&post=4022&subd=davidjonesmp&ref=&feed=1" /></div>]]></content:encoded>
    <wfw:commentRss>http://davidjonesblog.com/2009/10/10/sunny-gordon/feed/</wfw:commentRss>
    <slash:comments>1</slash:comments>
    <media:content url="http://0.gravatar.com/avatar/aea1b4a642b604d87d34047223aab73b?s=96&amp;d=http%3A%2F%2F0.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&amp;r=G" medium="image">
      <media:title type="html">David Jones MP</media:title>
    </media:content>
    <feedburner:origLink>http://davidjonesblog.com/2009/10/10/sunny-gordon/</feedburner:origLink></item>
</channel>
</rss>
use strict;
use warnings;

use Test::More tests => 14;

use XML::Feed;
use File::Spec;

{
    my $rss = XML::Feed->parse(
        File::Spec->catfile(File::Spec->curdir(),
            "t", "samples", "rss20-david-jones.xml"
        )
    );

    # TEST
    isa_ok($rss, 'XML::Feed::Format::RSS');
    my $rss_entry = ($rss->entries)[0];

    # TEST
    isa_ok($rss_entry, 'XML::Feed::Entry::Format::RSS');

    my $rss_content = $rss_entry->content;

    # TEST
    isa_ok($rss_content, 'XML::Feed::Content');

    # TEST
    is($rss_content->type, 'text/html', 'Correct content type');

    # TEST
    like($rss_content->body, qr(<|&lt;), 'Contains HTML tags');

    # TEST
    like($rss_content->body,
        qr{\A\Q<div class='snap_preview'><br /><p style="text-align:justify;"><span style="color:#000000;">In an interesting, not to say quixotic},
        'Contains HTML tags');

    unlike($rss->as_xml, qr{&amp;lt;}, 'No double encoding');

    my $atom = $rss->convert('Atom');

    # TEST
    isa_ok($atom, 'XML::Feed::Format::Atom');

    my $atom_entry = ($atom->entries)[0];

    # TEST
    isa_ok($atom_entry, 'XML::Feed::Entry::Format::Atom');

    my $atom_content = $atom_entry->content;

    # TEST
    isa_ok($atom_content, 'XML::Feed::Content');

    # TEST
    is($atom_content->type, 'text/html', 'Correct content type');

    # TEST
    like($atom_content->body, qr(<|&lt;), 'Contains HTML tags');

    # TEST
    like($atom_content->body,
        qr{\A\Q&lt;div class=&#39;snap_preview&#39;&gt;&lt;br /&gt;&lt;p style=&quot;text-align:justify;&quot;&gt;&lt;span style=&quot;color:#000000;&quot;&gt;In an interesting, not to say quixotic},
        'Contains HTML tags');

    unlike($atom->as_xml, qr{&amp;lt;}, 'No double encoding');
}
RT-Send-CC: simonw [...] cpan.org
Hi Dave!

Sorry it took me so long to respond, but I wasn't notified of your comment (god damn rt.cpan.org).

On Sat Oct 10 13:15:19 2009, DAVECROSS wrote:
> On Tue Aug 18 17:40:59 2009, SHLOMIF wrote:
> >
> > Hi Dave!
> >
> > Please send me your test case.
>
> Right, here's a test case for the problem I discovered months ago. Sorry
> it's take me so long to do this.
>
> Basically, if you convert the attached RSS 2.0 feed to Atom, then you
> end up with double-encoded HTML entities when you produce the XML output.
>

Thanks! I see it now.

The attached patch fixes this problem (along with an extra test). It was caused by a strange XML::Atom behaviour. Please let me know if it works for you.

Regards,

-- Shlomi Fish

> This leads to much brokenness when displaying it on a web page. See, for
> example, http://mps.theplanetarium.org/test/.
>
> And I suspect it's Shlomi's previous patch being overzealous.
>
> Dave...


Subject: XML-Feed-svn-3.diff

Message body is not shown because it is too large.

Just to let both of you know: neither patch is working for me. I'm digging into why as we speak.

On Thu Jan 07 19:48:21 2010, SIMONW wrote:
> Just to let the both of you know - neither patch is working for me.
> I'm digging into why as we speak.
How does the latest patch not work? What happens exactly? Where did you try it, and how did you apply it?

Regards,

-- Shlomi Fish
On Sun Jan 24 06:36:21 2010, SHLOMIF wrote:
> On Thu Jan 07 19:48:21 2010, SIMONW wrote:
> > Just to let the both of you know - neither patch is working for me.
> > I'm digging into why as we speak.
>
> How does the latest patch not work? What happens? Where have you tried
> that? What happens exactly? How did you apply it?
>
> Regards,
>
> -- Shlomi Fish
I wrote that over two months ago. Meanwhile, this bug is still not fixed, even though the patch works perfectly fine for Dave and me. If you don't have time to work on this module, please give me (SHLOMIF) co-maintainership.

Regards,

-- Shlomi Fish
RT-Send-CC: dave [...] dave.org.uk
On Sat Mar 20 07:36:54 2010, SHLOMIF wrote:
> I've said that over two months back. Meanwhile, this bug is not fixed,
> while the patch works perfectly fine for Dave and me. If you don't have
> time to work on this module, please give me (SHLOMIF)
> co-maintainership.
Hmm, I could have sworn I'd patched and pushed the changes ages ago.

Please check SVN to make sure the current patch works for you. I was having problems with the previous patch whereby stuff would come out in odd encodings, and I couldn't track down why for ages. This version seems to work with all the test files sent so far.

Simon
RT-Send-CC: simonw [...] cpan.org
Hi Simon,

Thanks for finally replying.

On Mon Mar 22 17:07:41 2010, SIMONW wrote:
> On Sat Mar 20 07:36:54 2010, SHLOMIF wrote:
> > I've said that over two months back. Meanwhile, this bug is not fixed,
> > while the patch works perfectly fine for Dave and me. If you don't have
> > time to work on this module, please give me (SHLOMIF)
> > co-maintainership.
>
> Hmm, I could have sworn I'd patched and pushed the changes ages ago.
>

Well, you didn't.

> Please check SVN to make sure the current patch works for you - I was
> having problems with the previous patch whereby stuff would come out in
> odd encodings and I couldn't track down why for ages. This version seems
> to work with all the test files sent so far.
>
16-convert.t, which is attached to this message and was part of my patch, still fails:

{{{
shlomi:~/progs/perl/cpan/XML/Feed/trunk$ perl -Ilib t/16-convert.t
1..17
ok 1 - The object isa XML::Feed::Format::RSS
ok 2 - The object isa XML::Feed::Entry::Format::RSS
ok 3 - The object isa XML::Feed::Content
ok 4 - Correct content type
ok 5 - Contains HTML tags
ok 6 - Contains HTML tags
ok 7 - The object isa XML::Feed::Format::Atom
ok 8 - The object isa XML::Feed::Entry::Format::Atom
ok 9 - The object isa XML::Feed::Content
ok 10 - Correct content type
ok 11 - Contains HTML tags
ok 12 - Contains HTML tags
ok 13 - The object isa XML::Feed::Format::RSS
ok 14 - The object isa XML::Feed::Entry::Format::RSS
ok 15 - Correct content type - No. 2
ok 16 - Found content type=html in Atom
not ok 17 - Atom content Followed by non-double encoded HTML.
#   Failed test 'Atom content Followed by non-double encoded HTML.'
#   at t/16-convert.t line 97.
#                   '&amp;lt;div class=&amp;#39;snap_preview&amp;#39;&amp;gt;&amp;lt;br /&amp;gt;&amp;lt;p style=&amp;quot;text-align:justify;&amp;qu'
#     doesn't match '(?-xism:\A&lt;div class=(?:&#39;|')snap_preview(?:&#39;|'))'
# Looks like you failed 1 test of 17.
}}}

I also notice that you didn't include regression tests in the recent Subversion commits. Why?

Regards,

-- Shlomi Fish
> Simon
Subject: 16-convert.t
use strict;
use warnings;

use Test::More tests => 17;

use XML::Feed;
use File::Spec;

{
    my $rss = XML::Feed->parse(
        File::Spec->catfile(File::Spec->curdir(),
            "t", "samples", "rss10-davorg-journal.xml"
        )
    );

    # TEST
    isa_ok($rss, 'XML::Feed::Format::RSS');
    my $rss_entry = ($rss->entries)[0];

    # TEST
    isa_ok($rss_entry, 'XML::Feed::Entry::Format::RSS');

    my $rss_content = $rss_entry->content;

    # TEST
    isa_ok($rss_content, 'XML::Feed::Content');

    # TEST
    is($rss_content->type, 'text/html', 'Correct content type');

    # TEST
    like($rss_content->body, qr(<|&lt;), 'Contains HTML tags');

    # TEST
    like($rss_content->body,
        qr{\A\Q<p>One of the first concrete outputs from the <a href="http://www.enlightenedperl.org/">Enlightened Perl Organisation</a>\E},
        'Contains HTML tags');

    my $atom = $rss->convert('Atom');

    # TEST
    isa_ok($atom, 'XML::Feed::Format::Atom');

    my $atom_entry = ($atom->entries)[0];

    # TEST
    isa_ok($atom_entry, 'XML::Feed::Entry::Format::Atom');

    my $atom_content = $atom_entry->content;

    # TEST
    isa_ok($atom_content, 'XML::Feed::Content');

    # TEST
    is($atom_content->type, 'text/html', 'Correct content type');

    # TEST
    like($atom_content->body, qr(<|&lt;), 'Contains HTML tags');

    # TEST
    like($atom_content->body,
        qr{\A\Q<p>One of the first concrete outputs from the <a href="http://www.enlightenedperl.org/">Enlightened Perl Organisation</a>\E},
        'Contains HTML tags');
}

{
    my $rss = XML::Feed->parse(
        File::Spec->catfile(File::Spec->curdir(),
            "t", "samples", "rss20-david-jones.xml",
        )
    );

    # TEST
    isa_ok($rss, 'XML::Feed::Format::RSS');
    my $rss_entry = ($rss->entries)[0];

    # TEST
    isa_ok($rss_entry, 'XML::Feed::Entry::Format::RSS');

    my $atom = $rss->convert('Atom');

    # TEST
    is(($atom->entries)[0]->content->type, 'text/html', 'Correct content type - No. 2');

    my $atom_text = $atom->as_xml();

    # TEST
    ok (scalar($atom_text =~ m{<content type="html">}g),
        "Found content type=html in Atom",
    );

    # TEST
    like (
        substr($atom_text, pos($atom_text), 128),
        qr{\A&lt;div class=(?:&#39;|')snap_preview(?:&#39;|')},
        "Atom content Followed by non-double encoded HTML."
    );
}
On Mon Mar 22 17:07:41 2010, SIMONW wrote:
> On Sat Mar 20 07:36:54 2010, SHLOMIF wrote:
> > I've said that over two months back. Meanwhile, this bug is not
> > fixed, while the patch works perfectly fine for Dave and me. If you
> > don't have time to work on this module, please give me (SHLOMIF)
> > co-maintainership.
>
> Hmm, I could have sworn I'd patched and pushed the changes ages ago.
>
> Please check SVN to make sure the current patch works for you - I was
> having problems with the previous patch whereby stuff would come out
> in odd encodings and I couldn't track down why for ages. This version
> seems to work with all the test files sent so far.
Simon,

This has just blown up again:

http://www.illusori.co.uk/perl/2010/04/25/mangled_ironman_feed.html
http://perlhacks.com/2010/04/ironman-and-xmlfeed.php

I've checked out the latest from SVN (r160), and there still seem to be problems.

* Test t/16-convert.t is in MANIFEST but not in svn. There's a 16-convert-mult-categories.t, but that seems to be a completely different test.

* My two new data files (t/samples/rss10-davorg-journal.xml and t/samples/rss20-david-jones.xml) are missing.

* Having added the missing files, I'm getting test failures in t/16-convert.t:

    not ok 17 - Atom content Followed by non-double encoded HTML.
    #   Failed test 'Atom content Followed by non-double encoded HTML.'
    #   at t/16-convert.t line 97.
    #                   '&amp;lt;div class=&amp;#39;snap_preview&amp;#39;&amp;gt;&amp;lt;br /&amp;gt;&amp;lt;p style=&amp;quot;text-align:justify;&amp;qu'
    #     doesn't match '(?-xism:\A&lt;div class=(?:&#39;|')snap_preview(?:&#39;|'))'
    # Looks like you failed 1 test of 17.
    Dubious, test returned 1 (wstat 256, 0x100)
    Failed 1/17 subtests

Hope that's useful. Let me know if I can be of any help.

Cheers,

Dave...
I'm pretty sure this is all finally fixed in version 0.45.

Dave...