
This queue is for tickets about the XML-RSS CPAN distribution.

Report information
The Basics
Id: 21740
Status: resolved
Worked: 40 min
Priority: 0/
Queue: XML-RSS

People
Owner: SHLOMIF [...] cpan.org
Requestors: nutlet [...] karelia.ru
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: 1.10
Fixed in: (no value)



Subject: wrong handling enclosure subelement of item
According to the RSS 2.0 specification, 'enclosure' (a sub-element of 'item') is an empty XML element with a few attributes. For example:

<enclosure url="http://www.scripting.com/mp3s/weatherReportSuite.mp3" length="12216320" type="audio/mpeg" />

XML::RSS loses all attributes of this element. Here is a quick patch to fix this:

*** RSS_original.pm 2006-03-12 02:47:19.000000000 +0300
--- RSS.pm 2006-09-27 12:29:41.000000000 +0400
*************** sub handle_start {
*** 1505,1510 ****
--- 1505,1515 ----
      push(@{$self->{'items'}->[$self->{num_items}-1]->{'taxo'}},$attribs{'resource'});
      $self->{'modules'}->{'http://purl.org/rss/1.0/modules/taxonomy/'} = 'taxo';
+ # beginning of enclosure element in item
+ } elsif ($el eq 'enclosure' && $self->within_element('item')) {
+
+     $self->{'items'}->[$self->{num_items}-1]->{'enclosure'} = {map {$_ => $attribs{$_}} keys %attribs};
+
  # beginning of taxo li in channel element
  } elsif ($self->within_element($self->generate_ns_name("topics",'http://purl.org/rss/1.0/modules/taxonomy/'))
      && $self->within_element($self->generate_ns_name("channel",$self->{namespace_map}->{'rss10'}))
Subject: RSS.pm.patch
*** RSS_original.pm 2006-03-12 02:47:19.000000000 +0300
--- RSS.pm 2006-09-27 12:29:41.000000000 +0400
*************** sub handle_start {
*** 1505,1510 ****
--- 1505,1515 ----
      push(@{$self->{'items'}->[$self->{num_items}-1]->{'taxo'}},$attribs{'resource'});
      $self->{'modules'}->{'http://purl.org/rss/1.0/modules/taxonomy/'} = 'taxo';
+ # beginning of enclosure element in item
+ } elsif ($el eq 'enclosure' && $self->within_element('item')) {
+
+     $self->{'items'}->[$self->{num_items}-1]->{'enclosure'} = {map {$_ => $attribs{$_}} keys %attribs};
+
  # beginning of taxo li in channel element
  } elsif ($self->within_element($self->generate_ns_name("topics",'http://purl.org/rss/1.0/modules/taxonomy/'))
      && $self->within_element($self->generate_ns_name("channel",$self->{namespace_map}->{'rss10'}))
Subject: Re: [rt.cpan.org #21740] wrong handling enclosure subelement of item
Date: Wed, 27 Sep 2006 03:03:08 -0700
To: bug-XML-RSS [...] rt.cpan.org
From: Ask Bjørn Hansen <ask [...] perl.org>
On Sep 27, 2006, at 1:41 AM, Alexei Kozlov via RT wrote:
> According to rss 2.0 specification, 'enclosure' - subelement
> of 'item' - is an empty xml-element with few attributes.

Hi,

Any chance you can make a small test we can include in the test suite?

 - ask

--
http://log.perl.org/
From: nutlet [...] karelia.ru
> Hi,
>
> Any chance you can make a small test we can include in the test suite?

Hi!

Here is the test for the enclosure element.

Alexei.
use strict;
use Test::More;

use constant RSS_VERSION          => "2.0";
use constant RSS_ENCLOSURE_URL    => qq(http://www.scripting.com/mp3s/weatherReportSuite.mp3);
use constant RSS_ENCLOSURE_LENGTH => qq(12216320);
use constant RSS_ENCLOSURE_TYPE   => qq(audio/mpeg);

use constant RSS_DOCUMENT => qq(<?xml version="1.0"?>
<rss version="2.0">
 <channel>
  <title>Example 2.0 Channel with Enclosure sub-element of Item</title>
  <link>http://example.com/</link>
  <description>To lead by example</description>
  <language>en-us</language>
  <copyright>All content Public Domain, except comments which remains copyright the author</copyright>
  <managingEditor>editor\@example.com</managingEditor>
  <webMaster>webmaster\@example.com</webMaster>
  <docs>http://backend.userland.com/rss</docs>
  <category domain="http://www.dmoz.org">Reference/Libraries/Library_and_Information_Science/Technical_Services/Cataloguing/Metadata/RDF/Applications/RSS/</category>
  <generator>The Superest Dooperest RSS Generator</generator>
  <lastBuildDate>Mon, 02 Sep 2002 03:19:17 GMT</lastBuildDate>
  <ttl>60</ttl>

  <item>
   <title>News for September the Second</title>
   <link>http://example.com/2002/09/02</link>
   <description>other things happened today</description>
   <comments>http://example.com/2002/09/02/comments.html</comments>
   <author>joeuser\@example.com</author>
   <pubDate>Mon, 02 Sep 2002 03:19:00 GMT</pubDate>
   <guid isPermaLink="true">http://example.com/2002/09/02</guid>
   <enclosure url="http://www.scripting.com/mp3s/weatherReportSuite.mp3" length="12216320" type="audio/mpeg" />
  </item>

 </channel>
</rss>);

plan tests => 8;

use_ok("XML::RSS");

my $xml = XML::RSS->new();
isa_ok($xml,"XML::RSS");

eval { $xml->parse(RSS_DOCUMENT); };
is($@,'',"Parsed RSS feed");

cmp_ok($xml->{'_internal'}->{'version'},"eq",RSS_VERSION,"Is RSS version ".RSS_VERSION);

cmp_ok(ref($xml->{items}),"eq","ARRAY","\$xml->{items} is an ARRAY ref");

if($xml->{items} && ref($xml->{items}) eq 'ARRAY'){
    my $item = shift @{$xml->{items}};

    # The enclosure is expected to be stored as a hash of its attributes.
    if($item->{enclosure} && ref($item->{enclosure}) eq 'HASH'){
        my $encl = $item->{enclosure};
        cmp_ok($encl->{'url'},"eq",RSS_ENCLOSURE_URL,
               "ENCLOSURE URL is ".RSS_ENCLOSURE_URL);
        cmp_ok($encl->{'length'},"eq",RSS_ENCLOSURE_LENGTH,
               "ENCLOSURE LENGTH is ".RSS_ENCLOSURE_LENGTH);
        cmp_ok($encl->{'type'},"eq",RSS_ENCLOSURE_TYPE,
               "ENCLOSURE TYPE is ".RSS_ENCLOSURE_TYPE);
    }else{
        ok(0,"Parsing Enclosure element, sub-element of Item");
    }
}

__END__

=head1 NAME

2.0-parse.t - tests for parsing RSS 2.0 data with XML::RSS.pm

=head1 SYNOPSIS

 use Test::Harness qw (runtests);
 runtests (./XML-RSS/t/*.t);

=head1 DESCRIPTION

Tests for parsing RSS 2.0 data with XML::RSS.pm

=head1 VERSION

$Revision: 1.2 $

=head1 DATE

$Date: 2002/11/19 23:56:53 $

=head1 AUTHOR

Aaron Straup Cope

=head1 SEE ALSO

http://backend.userland.com/rss2

=cut
From: nutlet [...] karelia.ru
Just a little note: the enclosure element can't contain a cdata section. You should skip all cdata for enclosure in the 'handle_char' handler. Also, all three attributes of enclosure are required; maybe it would be good to validate them in 'handle_start' and warn (or even die) with an error.

http://blogs.law.harvard.edu/tech/rss#ltenclosuregtSubelementOfLtitemgt
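The attribute check suggested in this note could look roughly like the sketch below. It is a standalone helper, not part of XML::RSS: the name missing_enclosure_attrs, the sample hashes, and the decision to treat an empty string as missing are all assumptions made for illustration. It only assumes the enclosure is available as a hash of attribute names to values, as in the patch above.

```perl
use strict;
use warnings;

# The three attributes the RSS 2.0 specification requires on <enclosure>.
my @REQUIRED_ENCLOSURE_ATTRS = qw(url length type);

# Hypothetical helper (not an XML::RSS method): returns the names of the
# required attributes that are absent or empty in the given hash reference.
sub missing_enclosure_attrs {
    my ($enclosure) = @_;

    # Anything that is not a hash reference has none of the attributes.
    return @REQUIRED_ENCLOSURE_ATTRS unless ref($enclosure) eq 'HASH';

    return grep {
        !defined $enclosure->{$_} || $enclosure->{$_} eq ''
    } @REQUIRED_ENCLOSURE_ATTRS;
}

# Sample data for illustration only.
my $complete = {
    url    => 'http://www.scripting.com/mp3s/weatherReportSuite.mp3',
    length => '12216320',
    type   => 'audio/mpeg',
};
my $partial = { url => 'http://www.scripting.com/mp3s/weatherReportSuite.mp3' };

for my $encl ($complete, $partial) {
    my @missing = missing_enclosure_attrs($encl);
    warn "enclosure is missing required attribute(s): @missing\n" if @missing;
}
```

A caller that prefers strict behaviour could die instead of warn; whether either belongs inside the parser is a separate design question.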
The enclosure bug has been fixed (RT#7920).

On Thu Sep 28 04:28:53 2006, nutlet wrote:
> Just a little note:
> enclosure element can't contain cdata section. You should skip all
> cdata for enclosure in 'handle_char' handler. also, all three
> attributes of enclosure are required - maybe it's good to validate it
> in 'handle_start' and warn (or even die) with error
>
> http://blogs.law.harvard.edu/tech/rss#ltenclosuregtSubelementOfLtitemgt
Can you provide a test for that? Then I'll fix it.
We already applied one patch. As for the other suggestions of validating the input - I don't think they fall into the scope of the parser. Also, the original submitter has been unresponsive for over two years. So closing.