
This queue is for tickets about the XML-RSS CPAN distribution.

Report information
The Basics
Id: 33001
Status: resolved
Worked: 30 min
Priority: 0/
Queue: XML-RSS

People
Owner: SHLOMIF [...] cpan.org
Requestors: GWOLF [...] cpan.org
Cc: 463774 [...] bugs.debian.org
AdminCc:

Bug Information
Severity: Important
Broken in:
  • 1.30
  • 1.31
Fixed in: (no value)



CC: 463774 [...] bugs.debian.org
Subject: does not implement RSS 2.0 guid isPermaLink properly; hides guids
Hi, I'm copying over this bug report we received in the Debian bug tracker:

    http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=463774

This bug seems to have first appeared in version 1.30.

Consider a feed such as the music I listen to:
http://ws.audioscrobbler.com/1.0/user/joeyhess/recenttracks.rss

    <item>
      <title>Foo Fighters – Come Alive</title>
      <link>http://www.last.fm/music/Foo+Fighters/_/Come+Alive</link>
      <pubDate>Sun, 27 Jan 2008 05:07:00 +0000</pubDate>
      <guid>http://www.last.fm/user/joeyhess/#1201410420</guid>
      <description>http://www.last.fm/music/Foo+Fighters</description>
    </item>

If I parse this using XML::RSS, this happens:

    http://www.last.fm/music/Foo+Fighters/_/Come+Alive
    joey@kodama:~>perl -le 'use XML::RSS; local $/=undef; $feed=<>; $r=XML::RSS->new(version => "1.0"); $r->parse($feed); print "link: ".$r->{items}->[0]->{link}; print "guid: ".$r->{items}->[0]->{guid}' < recenttracks.rss
    link: http://www.last.fm/music/Foo+Fighters/_/Come+Alive
    guid:

In this feed, the link links to the song. Which I might play multiple times. Thus the guid, which differs for each play. Since I get back the same link each time, and can't look at the guid, there's no way to distinguish one play of the song from another.

Here's the culprit:

        # guid element is a permanent link unless isPermaLink attribute
        # is set to false
        } elsif ($el eq 'guid') {
            $self->{'items'}->[$self->{num_items} - 1]->{'isPermaLink'} =
                !(exists($attribs{'isPermaLink'}) && ($attribs{'isPermaLink'} eq 'false'));

        # beginning of taxo li element in item element
        #'http://purl.org/rss/1.0/modules/taxonomy/' => 'taxo'
        }

This is just wrong. The RSS 2.0 spec says:

    If the guid element has an attribute named "isPermaLink" with a value
    of true, the reader may assume that it is a permalink to the item

The above code is exactly backwards to the spec, assuming that the guid is a permalink unless isPermaLink=false. The guid doesn't even have to be an url according to the spec, so this is very wrong.
It can be fixed as follows. (I threw in an lc too, because attributes should (probably) be parsed case-insensitively.) Note that I had to patch the test suite, since this does change behavior -- the test suite was testing for the same incorrect reading of the spec.

Index: t/2.0-permalink.t
===================================================================
--- t/2.0-permalink.t (revision 13998)
+++ t/2.0-permalink.t (working copy)
@@ -21,9 +21,8 @@
 );
 
 # TEST
-is ($item_with_guid_missing->{"permaLink"},
-    "http://community.livejournal.com/lj_dev/713810.html",
-    "guid's isPermaLink is missing, so the item permalink property should be set to the value of the guid tag"
+ok ((!$item_with_guid_missing->{"permaLink"}),
+    "guid's isPermaLink is missing (implicitly false), so the item permalink property should not be set"
 );
 
 # TEST
Index: lib/XML/RSS.pm
===================================================================
--- lib/XML/RSS.pm (revision 13998)
+++ lib/XML/RSS.pm (working copy)
@@ -786,11 +786,12 @@
             }
         }
 
-    # guid element is a permanent link unless isPermaLink attribute is set to false
+    # guid element is a permanent link IFF isPermaLink attribute is set
+    # to true
     } elsif ($el eq 'guid') {
         $self->{'items'}->[$self->{num_items} - 1]->{'isPermaLink'} =
-            !(exists($attribs{'isPermaLink'}) && ($attribs{'isPermaLink'} eq 'false'));
+            (exists($attribs{'isPermaLink'}) && (lc($attribs{'isPermaLink'}) eq 'true'));
 
     # beginning of taxo li element in item element
     #'http://purl.org/rss/1.0/modules/taxonomy/' => 'taxo'

-- System Information:
Debian Release: lenny/sid
  APT prefers unstable
  APT policy: (500, 'unstable'), (500, 'testing'), (500, 'stable'), (1, 'experimental')
Architecture: i386 (i686)

Kernel: Linux 2.6.24-1-686 (SMP w/1 CPU core)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages libxml-feed-perl depends on:
ii  libclass-errorhandler-perl 0.01-2         Base class for error handling
ii  libdatetime-format-mail-pe 0.3001-1       Convert between DateTime and RFC28
ii  libdatetime-format-w3cdtf- 0.04-2         Parse and format W3CDTF datetime s
ii  libdatetime-perl           2:0.41-1       perl DateTime - Reference implemen
ii  libfeed-find-perl          0.06-2         Syndication feed auto-discovery
ii  libhtml-parser-perl        3.56-1         A collection of modules that parse
ii  liburi-fetch-perl          0.08-1         Smart URI fetching/caching
ii  liburi-perl                1.35.dfsg.1-1  Manipulates and accesses URI strin
ii  libwww-perl                5.808-1        WWW client/server library for Perl
ii  libxml-atom-perl           0.25-2         Atom feed and API implementation
ii  libxml-rss-perl            1.31-3         Perl module for managing RSS (RDF
ii  perl                       5.8.8-12       Larry Wall's Practical Extraction

libxml-feed-perl recommends no packages.

-- no debconf information

--
see shy jo
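
For reference, the check proposed in the patch above can be tried in isolation. This is only a minimal sketch under the report's reading of the spec: `guid_is_permalink` is a hypothetical helper that is not part of XML::RSS, and it takes the guid element's attributes as a plain hash.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of XML::RSS). It mirrors the patched
# condition: the guid counts as a permalink only when the isPermaLink
# attribute is present and equals "true", compared case-insensitively.
sub guid_is_permalink {
    my (%attribs) = @_;
    return ( exists $attribs{'isPermaLink'}
          && lc( $attribs{'isPermaLink'} ) eq 'true' ) ? 1 : 0;
}

# Example attribute sets, as they might arrive from the XML parser.
print guid_is_permalink( isPermaLink => 'true' ),  "\n";   # prints 1
print guid_is_permalink( isPermaLink => 'TRUE' ),  "\n";   # prints 1
print guid_is_permalink( isPermaLink => 'false' ), "\n";   # prints 0
print guid_is_permalink(),                         "\n";   # prints 0
```

With the pre-patch condition, the last case (attribute missing) would have been treated as a permalink; that is the behaviour change the updated test in t/2.0-permalink.t covers.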
Thanks for the bug report and the patch. This is fixed in trunk (r10702) and I will upload an XML-RSS-1.32 to the CPAN shortly.

Thanks again.

Regards,

Shlomi Fish