
This queue is for tickets about the XML-Feed CPAN distribution.

Report information
The Basics
Id: 57730
Status: resolved
Priority: 0/
Queue: XML-Feed

People
Owner: Nobody in particular
Requestors: david [...] kineticode.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



CC: simonw [...] cpan.org
Subject: Re: XML::Feed Date Parsing
Date: Fri, 21 May 2010 15:07:00 -0400
To: bug-xml-feed [...] rt.cpan.org
From: David E. Wheeler <david [...] kineticode.com>
Here's a patch. I extracted some shitty dates from some feeds I'm parsing, plus threw in a bunch of others. The test ensures that they all work in pubDate, dc:date, dcterms:date, dcterms:modified, and atom:updated. It adds dependencies on DateTime::Format::ISO8601, DateTime::Format::Flexible, and DateTime::Format::Natural.

I didn't add a parameter, as there don't seem to be any real attributes to use. Maybe I've missed something?

I've Cc'd RT so that it doesn't get lost in the shuffle. What do you think?

Best,

David

On May 20, 2010, at 4:05 PM, David E. Wheeler wrote:
> Hi Simon,
>
> I'm using XML::Feed for a project. It's so nice not to have to worry about all the variations in feeds. Many thanks to you and SixApart for the great module.
>
> One place where I do have to worry, though, is with dates. There are a lot of feeds out there with invalid date formats. Take http://bestwebgallery.com/feed/ for example. It has this:
>
>     <pubDate>May 17, 2010</pubDate>
>
> Irritating. I fully expect to find a lot more shitty dates. Alas, with a date like this, issued() returns undef. I'd really like to make a best effort to get at dates in all formats, as I could really use it for proper(ish) sorting.
>
> I noticed this test in t/01-parse.t:
>
>     $feed = XML::Feed->parse('t/samples/rss10-invalid-date.xml')
>         or die XML::Feed->errstr;
>     $entry = ($feed->entries)[0];
>     ok(!$entry->issued);   ## Should return undef, but not die.
>     ok(!$entry->modified); ## Same.
>
> So I guess that you want to be strict by default. So what I'm thinking is adding an attribute to XML::Feed to be looser when parsing dates. If it's set to true (false by default), then it would also try DateTime::Format::Natural or perhaps DateTime::Format::Flexible. Would you be interested in such a patch?
>
> If so, looking at Format::RSS, I see that it first tries {dc}{date} and then {PubDate}. Should I continue with that approach? Or maybe try both strict first, and then try them both again more loosely?
>
> Thanks,
>
> David
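
The "strict first, then loose" idea from the quoted message can be sketched roughly as follows. This is only an illustration, not the attached patch and not XML::Feed's code. It uses the core Time::Piece module in place of the DateTime::Format::* modules named above so that it runs without CPAN dependencies. The function name parse_feed_date, the loose option, and both format lists are hypothetical.

```perl
use strict;
use warnings;
use Time::Piece;

# Formats a well-formed feed is expected to use (assumed list):
# an RFC 822 style date for RSS pubDate and a UTC W3CDTF timestamp.
my @STRICT = (
    '%a, %d %b %Y %H:%M:%S %z',
    '%Y-%m-%dT%H:%M:%SZ',
);

# Hypothetical extra formats for malformed dates seen in the wild.
my @LOOSE = (
    '%b %d, %Y',    # e.g. "May 17, 2010"
    '%d %b %Y',
    '%Y-%m-%d',
);

# Returns a Time::Piece object, or undef if nothing matched.
# Never dies, mirroring the "return undef, but not die" test above.
sub parse_feed_date {
    my ($text, %opt) = @_;
    return undef unless defined $text && length $text;
    $text =~ s/^\s+|\s+$//g;

    # Strict formats are always tried first; loose ones only on request.
    my @formats = (@STRICT, $opt{loose} ? @LOOSE : ());
    for my $format (@formats) {
        # strptime croaks on a mismatch, so trap it and move on.
        my $t = eval { Time::Piece->strptime($text, $format) };
        return $t if defined $t;
    }
    return undef;
}
```

With this sketch, parse_feed_date('May 17, 2010') returns undef, preserving the strict default, while parse_feed_date('May 17, 2010', loose => 1) returns an object whose ymd is 2010-05-17. A real implementation would more likely hand the loose step to DateTime::Format::Natural or DateTime::Format::Flexible, as proposed, rather than keep a fixed format list.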

Message body is not shown because sender requested not to inline it.

Ticket migrated to github as https://github.com/davorg/xml-feed/issues/42