Subject: Misbehaviour in XML::RSS::Feed, mixup in Headline id/guid
Date: Wed, 14 Oct 2009 01:18:28 +0200
To: bug-XML-RSS-Feed [...] rt.cpan.org
From: Sven Knispel <sven.knispel [...] pobox.com>
Dear Jeff,
after spending the last two nights trying to figure out why
XML::RSS::Feed behaves differently on my PC than on a friend's, I
finally had a breakthrough.
To sum it up: with version 2.212 everything works fine; with version
2.32 it no longer does.
Let me elaborate a little on "everything", using the example adapted
from the POD:
use XML::RSS::Feed;
use LWP::Simple qw(get);

my $feed = XML::RSS::Feed->new(
    url    => "http://feeds.wired.com/wired/index",
    name   => "Wired",
    delay  => 10,
    debug  => 1,
    tmpdir => ".",
);

while (1) {
    $feed->parse(get($feed->url));
    print $_->headline . "\n" for $feed->late_breaking_news;
    sleep($feed->delay);
}
OK, here is the expected behavior (with v2.212):
- first run: it fetches whatever is in the feed (30 items) and then
keeps looping with no new items.
- second run: after loading the cached items there is no breaking
news, so it keeps reporting "no headlines found".
And now the problem (with v2.32):
- first run: it fetches whatever is in the feed (30 items) and then
keeps looping with no new items.
- second run: after loading the cached items it still sees another 30
breaking news items and shows them again. With every run, the number of
headlines initialized from the cache grows by 30.
After a few hours and lots of coffee I narrowed the problem down to the
headlines. The newer 2.32 version of the headline class introduces the
concept of a guid, which didn't exist in the older version. I found that
the "faulty" code is in Headline.pm, in "sub id", at "return $self->guid ||
$self->url;". For whatever reason, $self->guid is not set before caching
or when reading from the cache (at least that is my assumption). Either
way, always returning the URL fixes the misbehavior.
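To make the suspected mismatch concrete, here is a minimal, self-contained Perl sketch. It does not use XML::RSS::Feed at all: `headline_id`, the plain hashes, and the idea that only the url survives the cache round-trip are assumptions standing in for the real classes, modelled on the "guid || url" fallback quoted above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the id logic quoted above
# ("return $self->guid || $self->url;"); not the real headline class.
sub headline_id {
    my ($h) = @_;
    return $h->{guid} || $h->{url};
}

# Headline as parsed live from the feed: it has both a guid and a url.
my %live = (guid => 'tag:example,2009:1', url => 'http://example.com/1');

# Assumption: after the cache round-trip only the url survives.
my %cached = (url => $live{url});

# Ids restored from the cache on the second run.
my %seen = (headline_id(\%cached) => 1);

# The live id is the guid, which is not in %seen, so the item looks new again.
my $is_new = !exists $seen{ headline_id(\%live) };
print $is_new ? "reported again\n" : "already seen\n";
```

If the assumption holds, the live headline gets its guid as id while the cached copy falls back to its url, so the seen-check never matches and every cached item is reported as breaking news again.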
Finally, without modifying the module code, calling
"$feed->init_headlines_seen;" in the calling program also works, as it
obviously replaces the logic for setting/getting the Headline id. The
program that works for me is:
use XML::RSS::Feed;
use LWP::Simple qw(get);

my $feed = XML::RSS::Feed->new(
    url            => "http://feeds.wired.com/wired/index",
    name           => "Wired",
    delay          => 10,
    debug          => 1,
    headline_as_id => 1,    # <-- avoids using the "real" headline id
    tmpdir         => ".",
);

while (1) {
    $feed->parse(get($feed->url));
    print $_->headline . "\n" for $feed->late_breaking_news;
    sleep($feed->delay);
}
Now I suspect "sub _build_dump_structure" is the place to store the guid
together with the url to solve the problem, but I lack background on
RSS, so please excuse me if I am completely wrong (it would be nice to
read your opinion on this whole thing ;-) ).
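If persisting the guid is indeed the fix, the idea can be sketched as follows. Again this is a hypothetical model, not the module's "_build_dump_structure": `dump_headline` and `restore_headline` are made-up names, and the cached field list (url, guid, headline) is an assumption.

```perl
use strict;
use warnings;

# Same id rule as quoted in the report: prefer guid, fall back to url.
sub headline_id {
    my ($h) = @_;
    return $h->{guid} || $h->{url};
}

# Hypothetical dump step: persist guid next to url (and the headline text).
sub dump_headline {
    my ($h) = @_;
    my %d;
    for my $field (qw(url guid headline)) {
        $d{$field} = $h->{$field} if defined $h->{$field};
    }
    return \%d;
}

# Hypothetical restore step: rebuild a headline hash from the cached record.
sub restore_headline {
    my ($d) = @_;
    my %copy = %$d;
    return \%copy;
}

my @live = (
    { guid => 'tag:example,2009:1', url => 'http://example.com/1', headline => 'One' },
    { url => 'http://example.com/2', headline => 'Two' },    # no guid: url is the id
);

# First run: write the cache. Second run: rebuild the seen-set from it.
my @cache = map { dump_headline($_) } @live;
my %seen;
$seen{ headline_id(restore_headline($_)) } = 1 for @cache;

# Only headlines whose id is not in the seen-set count as breaking news.
my @breaking = grep { !exists $seen{ headline_id($_) } } @live;
print scalar(@breaking), " breaking headline(s)\n";
```

Because the same id rule is applied to live and restored headlines, and both now carry the guid when one exists, nothing is reported twice.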
Brgds
Sven