CC: 463774 [...] bugs.debian.org
Subject: does not implement RSS 2.0 guid isPermaLink properly; hides guids
Hi,
I'm copying over this bug report we received in the Debian bug tracker:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=463774
This bug seems to have first appeared in version 1.30.
Consider a feed such as the music I listen to:
http://ws.audioscrobbler.com/1.0/user/joeyhess/recenttracks.rss
<item>
<title>Foo Fighters – Come Alive</title>
<link>http://www.last.fm/music/Foo+Fighters/_/Come+Alive</link>
<pubDate>Sun, 27 Jan 2008 05:07:00 +0000</pubDate>
<guid>http://www.last.fm/user/joeyhess/#1201410420</guid>
<description>http://www.last.fm/music/Foo+Fighters</description>
</item>
If I parse this using XML::RSS, this happens:
joey@kodama:~>perl -le 'use XML::RSS; local $/=undef; $feed=<>;
$r=XML::RSS->new(version => "1.0"); $r->parse($feed);
print "link: ".$r->{items}->[0]->{link};
print "guid: ".$r->{items}->[0]->{guid}' < recenttracks.rss
link: http://www.last.fm/music/Foo+Fighters/_/Come+Alive
guid:
In this feed, the link points to the song, which I might play multiple
times. Hence the guid, which differs for each play. Since I get back the
same link each time and can't look at the guid, there's no way to
distinguish one play of the song from another.
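To illustrate why the guid matters here, a rough sketch of keying items on
the guid and falling back to the link. This is not XML::RSS code: item_key
is a made-up helper, the items are plain hashes shaped like the ones in
$r->{items}, and the second guid value is invented for the example.

```perl
use strict;
use warnings;

# Two plays of the same song: identical link, different guid.
# The second guid is hypothetical, made up for illustration.
my @items = (
    { link => 'http://www.last.fm/music/Foo+Fighters/_/Come+Alive',
      guid => 'http://www.last.fm/user/joeyhess/#1201410420' },
    { link => 'http://www.last.fm/music/Foo+Fighters/_/Come+Alive',
      guid => 'http://www.last.fm/user/joeyhess/#1201414020' },
);

# Hypothetical helper: identify an item by its guid when one is
# available, otherwise by its link.
sub item_key {
    my ($item) = @_;
    return (defined $item->{guid} && length $item->{guid})
        ? $item->{guid}
        : $item->{link};
}

# Keep only the first item seen for each key.
my %seen;
my @unique = grep { !$seen{ item_key($_) }++ } @items;
print scalar(@unique), " distinct plays\n";
```

With the guid hidden, both items fall back to the same link and collapse
into one.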
Here's the culprit:
# guid element is a permanent link unless isPermaLink attribute
# is set to false
}
elsif ($el eq 'guid') {
$self->{'items'}->[$self->{num_items} - 1]->{'isPermaLink'} =
!(exists($attribs{'isPermaLink'}) && ($attribs{'isPermaLink'}
eq 'false'));
# beginning of taxo li element in item element
#'http://purl.org/rss/1.0/modules/taxonomy/' => 'taxo'
}
This is just wrong. The RSS 2.0 spec says:
If the guid element has an attribute named "isPermaLink" with a value of
true, the reader may assume that it is a permalink to the item
The above code is exactly backwards from the spec: it assumes that the guid
is a permalink unless isPermaLink=false. The guid doesn't even have to
be a URL according to the spec, so this is very wrong. It can be fixed
as follows. (I threw in an lc too, because attributes should (probably)
be parsed case-insensitively.) Note that I had to patch the test suite,
since this does change behavior -- the test suite was testing for the
same incorrect reading of the spec.
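As a side-by-side of the two readings, here is a small standalone sketch.
The sub names are hypothetical and not part of XML::RSS; each takes the
attribute hash of a guid element and returns 1 or 0.

```perl
use strict;
use warnings;

# Reading used by the existing code: the guid is a permalink unless
# isPermaLink is exactly the string "false".
sub is_permalink_current {
    my (%attribs) = @_;
    return (exists $attribs{isPermaLink}
            && $attribs{isPermaLink} eq 'false') ? 0 : 1;
}

# Reading used by the proposed patch: the guid is a permalink only if
# isPermaLink is "true", compared case-insensitively.
sub is_permalink_patched {
    my (%attribs) = @_;
    return (exists $attribs{isPermaLink}
            && lc($attribs{isPermaLink}) eq 'true') ? 1 : 0;
}

# Compare the two on a missing attribute and a few spellings.
for my $attrs ({}, { isPermaLink => 'true' }, { isPermaLink => 'False' }) {
    my $shown = exists $attrs->{isPermaLink}
        ? $attrs->{isPermaLink}
        : '(missing)';
    printf "%-10s current=%d patched=%d\n", $shown,
        is_permalink_current(%$attrs), is_permalink_patched(%$attrs);
}
```

The two differ when the attribute is missing, and also for a spelling
like "False", which the case-sensitive comparison does not recognise.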
Index: t/2.0-permalink.t
===================================================================
--- t/2.0-permalink.t (revision 13998)
+++ t/2.0-permalink.t (working copy)
@@ -21,9 +21,8 @@
);
# TEST
-is ($item_with_guid_missing->{"permaLink"},
- "http://community.livejournal.com/lj_dev/713810.html",
- "guid's isPermaLink is missing, so the item permalink property should be set to the value of the guid tag"
+ok ((!$item_with_guid_missing->{"permaLink"}),
+ "guid's isPermaLink is missing (implicitly false), so the item permalink property should not be set"
);
# TEST
Index: lib/XML/RSS.pm
===================================================================
--- lib/XML/RSS.pm (revision 13998)
+++ lib/XML/RSS.pm (working copy)
@@ -786,11 +786,12 @@
}
}
- # guid element is a permanent link unless isPermaLink attribute is set to false
+ # guid element is a permanent link IFF isPermaLink attribute is set
+ # to true
}
elsif ($el eq 'guid') {
$self->{'items'}->[$self->{num_items} - 1]->{'isPermaLink'} =
- !(exists($attribs{'isPermaLink'}) && ($attribs{'isPermaLink'} eq 'false'));
+ (exists($attribs{'isPermaLink'}) && (lc($attribs{'isPermaLink'}) eq 'true'));
# beginning of taxo li element in item element
#'http://purl.org/rss/1.0/modules/taxonomy/' => 'taxo'
-- System Information:
Debian Release: lenny/sid
APT prefers unstable
APT policy: (500, 'unstable'), (500, 'testing'), (500, 'stable'), (1, 'experimental')
Architecture: i386 (i686)
Kernel: Linux 2.6.24-1-686 (SMP w/1 CPU core)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash
Versions of packages libxml-feed-perl depends on:
ii  libclass-errorhandler-perl 0.01-2        Base class for error handling
ii  libdatetime-format-mail-pe 0.3001-1      Convert between DateTime and RFC28
ii  libdatetime-format-w3cdtf- 0.04-2        Parse and format W3CDTF datetime s
ii  libdatetime-perl           2:0.41-1      perl DateTime - Reference implemen
ii  libfeed-find-perl          0.06-2        Syndication feed auto-discovery
ii  libhtml-parser-perl        3.56-1        A collection of modules that parse
ii  liburi-fetch-perl          0.08-1        Smart URI fetching/caching
ii  liburi-perl                1.35.dfsg.1-1 Manipulates and accesses URI strin
ii  libwww-perl                5.808-1       WWW client/server library for Perl
ii  libxml-atom-perl           0.25-2        Atom feed and API implementation
ii  libxml-rss-perl            1.31-3        Perl module for managing RSS (RDF
ii  perl                       5.8.8-12      Larry Wall's Practical Extraction
libxml-feed-perl recommends no packages.
-- no debconf information
--
see shy jo