
This queue is for tickets about the CGI-RSS CPAN distribution.

Report information
The Basics
Id: 71851
Status: resolved
Worked: 5 min
Priority: 0/
Queue: CGI-RSS

People
Owner: jettero [...] cpan.org
Requestors: dynot [...] JUNKMAIL.ATH.CX
Cc:
AdminCc:

Bug Information
Severity: Normal
Broken in: 0.9600
Fixed in: (no value)



Subject: Does not do rfc822 date processing
You use the wrong format to generate an rfc822 date. Instead of %Z you should use %z. `man 3 strftime` describes the correct format:

RFC 2822-compliant date format (with an English locale for %a and %b)
    "%a, %d %b %Y %T %z"
RFC 822-compliant date format (with an English locale for %a and %b)
    "%a, %d %b %y %T %z"

With the capital Z the feed is invalid at http://validator.w3.org/feed/

Additionally, I don't like that all dates are converted to the local timezone. Why is this happening? I think it should only parse the date if it is not already in rfc822 format.

See attached script for fix and testing.
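To illustrate the difference described in the report, here is a minimal sketch (not the module's code) that formats one fixed timestamp with both conversions. The helper name `format_both` is made up for this example. `%Z` prints whatever zone abbreviation the operating system reports, while `%z` prints a numeric offset, so the first line of output depends on the machine's timezone settings.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: format an epoch time once with the zone
# abbreviation (%Z) and once with the numeric offset (%z).
sub format_both {
    my ($epoch) = @_;
    my @lt = localtime($epoch);
    my $with_name   = strftime('%a, %d %b %Y %H:%M:%S %Z', @lt);
    my $with_offset = strftime('%a, %d %b %Y %H:%M:%S %z', @lt);
    return ($with_name, $with_offset);
}

# 1318715103 is 2011-10-15 21:45:03 UTC.
my ($name, $offset) = format_both(1318715103);
print "$name\n$offset\n";
```

The weekday and month names also depend on the active locale, which is why the man page mentions an English locale for %a and %b.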
Subject: cgi-rss.pl
#!/usr/bin/perl -w

use strict;
use CGI::RSS;
use POSIX qw(strftime);

# run script then enter a few dates to test or remove line 102 to see rss output

{
    package CGI::RSS;

    sub valid_rfc822_date ($) {
        $_[0] =~ m!^
            (?:
                (?: Mon | Tue | Wed | Thu | Fri | Sat | Sun )   # day
                ,\s\s?                                          # comma, space or two
            )?                                                  # (these were optional)
            \d\d?\s                                             # day with 1 or 2 digit, space
            (?: Jan | Feb | Mar | Apr | May | Jun |
                Jul | Aug | Sep | Oct | Nov | Dec )             # month
            \s\d{2,4}\s                                         # space, 2 or 4 digit year, space
            \d\d:\d\d:\d\d\s                                    # hr:min:sec, space
            (?:
                [\+\-]\d\d\d\d |                                # time zone with digits, or
                UT | GMT | EST | EDT | CST | CDT | MST | MDT |
                PST | PDT | Z | A | M | N | Y                   # time zone with characters
            )$
        !ix;
    }

    sub date {
        my $this = shift;
        my $date = shift;

        if( ! valid_rfc822_date($date) and my $pd = &ParseDate($date) ) {
            warn "parsing date: $date\n";
            my $rfc822_date = &UnixDate($pd, '%a, %d %b %Y %H:%M:%S %z');
            return $this->pubDate($rfc822_date);
        }

        $this->pubDate($date);
    }
}

my $rss = new CGI::RSS;

my @feed = (
    {
        title => "first item",
        link  => "http://localhost/directory/1",
        guid  => "http://localhost/directory/1",
        desc  => "this is the first item",
        date  => "Sun, 16 Oct 2011 06:45:03 +0900"
    },
    {
        title => "second item",
        link  => "http://localhost/directory/2",
        guid  => "http://localhost/directory/2",
        desc  => "this is the second item",
        date  => "15 Oct 2011 12:15:06 +0200"
    },
    {
        title => "third item",
        link  => "http://localhost/directory/3",
        guid  => "http://localhost/directory/3",
        desc  => "this is the third item",
        date  => "2011-08-11 07:02:26"
    },
    {
        title => "fourth item",
        link  => "http://localhost/directory/4",
        guid  => "http://localhost/directory/4",
        desc  => "this is the fourth item",
        date  => "Sat, 15 Oct 2011 09:31:01 +0800 (CST)"
    },
);

print $rss->header;
print $rss->begin_rss(
    title => "My Feed!",
    link  => "http://localhost/directory",
    desc  => "My feed is cool!"
);

#print $rss->date($_) while (<>); exit;

foreach my $h ( @feed ) {
    print $rss->item(
        $rss->title       ( $h->{title} ),
        $rss->link        ( $h->{link} ),
        $rss->guid        ( $h->{link} ),   # unique identifier, usually
        $rss->description ( $h->{desc} ),   # a permalink
        $rss->date        ( $h->{date} ),   # does rfc822 date processing
    );
}

print $rss->finish_rss;
On Sat Oct 22 13:13:04 2011, dynot wrote:
> You use a wrong format to generate an rfc822 date. Instead of %Z you
> should use %z. `man 3 strftime` describes the correct format:

That may be the case, and I'm willing to change it, but I have a test in t/ that feeds the rss to the w3 validator and it checks out. The only recommendation that it gives is to add atom:link with rel="self".

I went ahead and tried it though (https://github.com/jettero/cgi--rss/tree/_proposed_fix).

If I used your proposed format then w3 recommends I use rfc822:

"Problematical RFC 822 date-time value: Sat, 22 Mar 08 00:00:00 -0400"

If you check the RFC under §5 (http://www.w3.org/Protocols/rfc822/), I think you'll find that RFC 822 is indeed %a, %d %b %Y %H:%M:%S %Z.

I'm not at all opposed to adding options to disable the format check.

I have also made a change (github only) to allow you to change the format to whatever you like with $CGI::RSS::RFC822F = "%z". Will this help?

-Paul

--
If riding in an airplane is flying, then riding in a boat is swimming.
116 jumps, 48.6 minutes of freefall, 92.9 freefall miles.
From: dynot [...] JUNKMAIL.ATH.CX
On Sun Oct 23 10:23:54 2011, JETTERO wrote:
> On Sat Oct 22 13:13:04 2011, dynot wrote:
> > You use a wrong format to generate an rfc822 date. Instead of %Z you
> > should use %z. `man 3 strftime` describes the correct format:
>
> That may be the case, and I'm willing to change it, but I have a test in
> t/ that feeds the rss to the w3 validator and it checks out. The only
> recommendation that it gives is to add atom:link with rel="self".
That's interesting. Without any modification, your module produced date formats such as:

    Sat, 15 Oct 2011 03:31:01 CEST

The problem is that CEST is not standards compliant according to the w3 validator!

    pubDate must be an RFC-822 date-time: Sat, 15 Oct 2011 23:45:03 CEST

By leaving the year in 4 digits with capital %Y, and changing %Z to lowercase %z, the feed validates. (Sat, 15 Oct 2011 23:45:03 +0200)

> I went ahead and tried it though
> (https://github.com/jettero/cgi--rss/tree/_proposed_fix).
>
> If I used your proposed format then w3 recommends I use rfc822:
>
> "Problematical RFC 822 date-time value: Sat, 22 Mar 08 00:00:00 -0400"

That's funny, cause rfc822 recommends 2 digits for the year, so this format is indeed rfc822 compliant!

    date = 1*2DIGIT month 2DIGIT ; day month year
                                 ; e.g. 20 Jun 82

As far as I can tell, the w3 validator checks the date for rfc2822 (4-digit year), and not rfc822 (2-digit year).
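The 2-digit versus 4-digit year distinction can be checked mechanically. The following is only a sketch, and the helper name `year_digits` is invented for this example; it looks for the "day month year" part of a date string and reports how many digits the year has.

```perl
use strict;
use warnings;

# Hypothetical helper: return the number of digits in the year field of
# a "day month year" date, or undef when no such field is found.
sub year_digits {
    my ($date) = @_;
    return unless $date =~ /\b\d{1,2}\s+[A-Za-z]{3}\s+(\d+)\s/;
    return length $1;
}

for my $d ('Sat, 22 Mar 08 00:00:00 -0400',
           'Sat, 15 Oct 2011 23:45:03 +0200') {
    printf "%s => %d-digit year\n", $d, year_digits($d);
}
```

A 2-digit result matches the RFC 822 grammar quoted above, while a 4-digit result is the form the feed validator asks for.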
> If you check the RFC under §5 (http://www.w3.org/Protocols/rfc822/), I
> think you'll find that RFC 822 is indeed %a, %d %b %Y %H:%M:%S %Z.

Yes, I read that, and CEST is not among the options for time zones: only the listed names of 1, 2 or 3 characters, or +/- followed by 4 digits.

> I'm not at all opposed to adding options to disable the format check.

Good to hear, thanks.

> I have also made a change (github only) to allow you to change the
> format to whatever you like with $CGI::RSS::RFC822F = "%z". Will this help?

Nice to have options but I'm not interested in customizing the date format; what I would like is to pass RSS feed validation.

> -Paul

Thank you,
Peter
From: dynot [...] JUNKMAIL.ATH.CX
Oh, found it on the validator website:

    The value specified must meet the Date and Time specifications as defined by RFC822, with the exception that the year should be expressed as four digits.

But RFC822 date with 4 digits is RFC2822 isn't it? :)

RFC 2822-compliant date format (with an English locale for %a and %b)
    "%a, %d %b %Y %T %z"
RFC 822-compliant date format (with an English locale for %a and %b)
    "%a, %d %b %y %T %z"
> Sat, 15 Oct 2011 03:31:01 CEST
> pubDate must be an RFC-822 date-time: Sat, 15 Oct 2011 23:45:03 CEST

These dates are the same. And I agree that they are correct.

I may not understand the problem. The date format I'm using passes the validator and the one you proposed does not.
I see the problem now. Yes, the current RSS best practice is to use the 4-digit year, despite the fact that it's not RFC822. The w3 validator recognizes only the best practice with a bad explanation in the error.
From: dynot [...] JUNKMAIL.ATH.CX
On Tue Oct 25 14:41:50 2011, JETTERO wrote:
> > Sat, 15 Oct 2011 03:31:01 CEST
> > pubDate must be an RFC-822 date-time: Sat, 15 Oct 2011 23:45:03 CEST
>
> These dates are the same. And I agree that they are correct.
>
> I may not understand the problem. The date format I'm using passes the
> validator and the one you proposed does not.

I'm sorry but I have no idea what you're talking about.

I guess your version passes validation because you only tried one date, 2008-03-22, which is transformed to

    Sat, 22 Mar 2008 00:00:00 CET

and this is valid. However, I tried some other dates like

    Sun, 16 Oct 2011 06:45:03 +0900

which become

    Sat, 15 Oct 2011 23:45:03 CEST

and this DOES NOT validate. I am in timezone GMT+2, which is equivalent to CEST, I guess.

The format I proposed changes the capital %Z to lowercase %z. How does this break validation? I get:

    2008-03-22 => Sat, 22 Mar 2008 00:00:00 +0100
    Sun, 16 Oct 2011 06:45:03 +0900 => Sat, 15 Oct 2011 23:45:03 +0200

which are all valid.
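The +0900 input and the +0200 output in the last example denote the same instant; only the offset differs. A minimal sketch that checks this using core modules only follows. The helper name `rfc822_to_epoch` is made up for illustration, and it handles only 4-digit years with numeric offsets, which is an assumption of this example rather than anything the module does.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

my %MONTH = (Jan => 0, Feb => 1, Mar => 2, Apr => 3, May => 4,  Jun => 5,
             Jul => 6, Aug => 7, Sep => 8, Oct => 9, Nov => 10, Dec => 11);

# Hypothetical helper: convert "16 Oct 2011 06:45:03 +0900" (with an
# optional leading weekday) into epoch seconds, or return undef.
sub rfc822_to_epoch {
    my ($date) = @_;
    my ($d, $mon, $y, $H, $M, $S, $sign, $oh, $om) =
        $date =~ /(\d{1,2})\s+([A-Za-z]{3})\s+(\d{4})\s+
                  (\d\d):(\d\d):(\d\d)\s+([+-])(\d\d)(\d\d)/x
        or return;
    my $m = $MONTH{ ucfirst lc $mon };
    return unless defined $m;

    # Offset in seconds east of UTC; subtract it to get UTC.
    my $offset = ($oh * 60 + $om) * 60;
    $offset = -$offset if $sign eq '-';
    return timegm($S, $M, $H, $d, $m, $y) - $offset;
}

my $in  = rfc822_to_epoch('Sun, 16 Oct 2011 06:45:03 +0900');
my $out = rfc822_to_epoch('Sat, 15 Oct 2011 23:45:03 +0200');
print $in == $out ? "same instant\n" : "different instants\n";
```

So the conversion to local time does not change the moment being described, only how it is written; whether the written form validates depends on the zone field.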
> which are all valid.

I cut and pasted your date format into a temporary branch at github and linked to it. It did not pass validation. This whole %z vs %Z discussion is silly, since they're actually both valid.

In any case, I'm going to release a version where you can set it to anything you like, I don't particularly care what you use, but the format I have selected is textbook, correct, and passes validation.
From: dynot [...] JUNKMAIL.ATH.CX
On Tue Oct 25 17:07:08 2011, JETTERO wrote:
> > which are all valid.
>
> I cut and pasted your date format into a temporary branch at github and
> linked to it. It did not pass validation. This whole %z vs %Z
> discussion is silly, since they're actually both valid.
>
> In any case, I'm going to release a version where you can set it to
> anything you like, I don't particularly care what you use, but the
> format I have selected is textbook, correct, and passes validation.

Ok, thanks. This customizable date format will be a good solution.

I have no idea why I get this invalid CEST timezone, or why we get different validation results.
On Tue Oct 25 17:15:00 2011, dynot wrote:
> Ok, thanks. This customizable date format will be a good solution.

I'm still trying to make up my mind...

    ->new(date_format=>"$blarg")

or should I just leave it as a lexical namespace var:

    $CGI::RSS::DATE_FORMAT = "$blarg";

I'll probably release today when I make up my mind.

> I have no idea why I get this invalid CEST timezone, or why we get
> different validation results.

I have some idea. Perl is getting that timezone from your operating system. Shell out and issue "date +%z/%Z" and I bet you see that CEST there too. Probably a bad localization? All my googles show that CEST is just Central European summer time. Maybe w3 just has a really out of date tzinfo database? Is it new?
From: dynot [...] JUNKMAIL.ATH.CX
On Wed Oct 26 07:23:10 2011, JETTERO wrote:
> > I have no idea why I get this invalid CEST timezone, or why we get
> > different validation results.
>
> I have some idea. Perl is getting that timezone from your operating
> system. Shell out and issue "date +%z/%Z" and I bet you see that CEST
> there too. Probably a bad localization? All my googles show that CEST
> is just Central European summer time. Maybe w3 just has a really out of
> date tzinfo database? Is it new?

I have three debian systems, two with GMT+2 timezone:

    $ date +%z
    +0200
    $ date +%Z
    CEST

and one with GMT timezone:

    $ date +%z
    +0000
    $ date +%Z
    UTC

Neither CEST nor UTC validates as an RFC822 date!

    zone        =  "UT"  / "GMT"                ; Universal Time
                                                ; North American : UT
                /  "EST" / "EDT"                ;  Eastern:  - 5/ - 4
                /  "CST" / "CDT"                ;  Central:  - 6/ - 5
                /  "MST" / "MDT"                ;  Mountain: - 7/ - 6
                /  "PST" / "PDT"                ;  Pacific:  - 8/ - 7
                /  1ALPHA                       ; Military: Z = UT;
                                                ;  A:-1; (J not used)
                                                ;  M:-12; N:+1; Y:+12
                / ( ("+" / "-") 4DIGIT )        ; Local differential
                                                ;  hours+min. (HHMM)

Maybe debian specific? I can test tomorrow on SUSE linux.
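The zone grammar quoted above is small enough to check directly. The sketch below is an illustration only: the function name `is_rfc822_zone` is hypothetical, matching is case-sensitive, and the single-letter military zones exclude J as the grammar's comment notes.

```perl
use strict;
use warnings;

# Zone names listed in the RFC 822 grammar.
my %RFC822_ZONE = map { $_ => 1 } qw(UT GMT EST EDT CST CDT MST MDT PST PDT);

# Hypothetical helper: true if $zone is allowed by the RFC 822 zone rule.
sub is_rfc822_zone {
    my ($zone) = @_;
    return 1 if $zone =~ /\A[+-]\d{4}\z/;   # local differential, HHMM
    return 1 if $RFC822_ZONE{$zone};        # UT, GMT and the US zones
    return 1 if $zone =~ /\A[A-IK-Z]\z/;    # military letters, J not used
    return 0;
}

for my $z (qw(CEST UTC GMT +0200)) {
    printf "%-5s %s\n", $z, is_rfc822_zone($z) ? 'ok' : 'not in RFC 822';
}
```

Under this reading, both CEST and UTC are rejected while the numeric offset is accepted, which matches what the validator reported.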
oic, US-centric. I read that about 500 times the last two days and never noticed that it's only 5 timezones.
From: dynot [...] JUNKMAIL.ATH.CX
It's not debian specific, tried on SUSE linux and got the same results:

    $ date +%z
    +0200
    $ date +%Z
    CEST
Right, no, I said a post ago that the real problem is my lack of understanding that the standard only accepts US timezones. I had no idea about that. I just assumed they were examples and it would take any timezone.

Basically... I'll be changing to %z today ...
From: dynot [...] JUNKMAIL.ATH.CX
ok thanks for your help.
Forgot to close this I guess. Please only respond if it's still broken.