
This queue is for tickets about the XML-RSS CPAN distribution.

Report information
The Basics
Id: 5438
Status: resolved
Priority: 0/
Queue: XML-RSS

People
Owner: Nobody in particular
Requestors: Mathias.Herberts [...] gicm.fr
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: 1.02
Fixed in: (no value)



Subject: XML::RSS outputs invalid UTF-8 code due to entity mapping.
The entity conversion done in encode breaks the resulting XML stream after two passes if its encoding is not ISO-8859-1. The XML::Parser will choke with a 'not well-formed (invalid token)' error.

Included code snippet reproduces the problem.

I do not see the added value of this entity to ISO-8859-1 character code entity conversion.

Mathias.
#!/usr/bin/perl -w

use XML::RSS;

$RSSFile = "/tmp/xml-rss-bug.rss";

#
# Create RSS Object.
#

my $rss = new XML::RSS (version => '1.0', encoding => 'iso-8859-1', output => '1.0');

#
# Add a channel
#

$rss->channel (title => "Channel Title",
               link => "http://channel.url/",
               description => "Channel Description");

#
# Add an item with accented characters
#

$rss->add_item (title => "Item Title",
                link => "http://item.url/",
                description => "Item Description (©)");

#
# Save RSS content to file.
#

open (RSS, ">$RSSFile") || die "Unable to open $RSSFile.";
print RSS $rss->as_string;
close (RSS);

#
# Now read it back in
#

$rss = new XML::RSS;
$rss->parsefile($RSSFile);

print "We read it OK this time...\n";

#
# save it again
#

open (RSS, ">$RSSFile") || die "Unable to open $RSSFile.";
print RSS $rss->as_string;
close (RSS);

#
# And read it back in again.
#

$rss = new XML::RSS;
$rss->parsefile($RSSFile);

print "But not this time :-(\n";
Agreed, the entity encoding has been a failed experiment. Suggestions?

[guest - Wed Feb 25 03:37:03 2004]:
> The entity conversion done in encode breaks the resulting XML stream
> after two passes if its encoding is not ISO-8859-1. The XML::Parser
> will choke with
>
> a 'not well-formed (invalid token)' error.
>
> Included code snippet reproduces the problem.
>
> I do not see the added value of this entity to ISO-8859-1 character
> code entity conversion.
>
> Mathias.
From: javier_pimas [...] yahoo.com
I know this post is old, but it's still unsolved (I think).

I have a question: I read the RSS.pm code but couldn't find any place where it actually encodes the string. I mean, for example, if I use encoding ISO-8859-1, the XML will have a header like this

<?xml version="1.0" encoding="iso-8859-1"?>

but *I think* that the string won't be converted to that encoding. Maybe the module Encode should be used:

# RSS.pm
use Encode;

sub encode {
    my ($self, $text) = @_;
    return $text unless $self->{'encode_output'};

    my $encoded_text = '';

    while ( $text =~ s/(.*?)(\<\!\[CDATA\[.*?\]\]\>)//s ) {
        $encoded_text .= encode_text($1) . $2;
    }
    $encoded_text .= encode_text($text);

    ############# here would be the change: ##########################
    # return $encoded_text;
    return Encode::encode($self->{'encoding'}, $encoded_text);
    ##################################################################
}

sub encode_text {
    my $text = shift;

    $text =~ s/&(?!(#[0-9]+|#x[0-9a-fA-F]+|\w+);)/&amp;/g;
    $text =~ s/&($entities);/$entity{$1}/g;
    $text =~ s/</&lt;/g;

    return $text;
}

Bye.
El pocho.

On Wed Feb 25 03:37:03 2004, guest wrote:
> The entity conversion done in encode breaks the resulting XML stream
> after two passes if its encoding is not ISO-8859-1. The XML::Parser
> will choke with
>
> a 'not well-formed (invalid token)' error.
>
> Included code snippet reproduces the problem.
>
> I do not see the added value of this entity to ISO-8859-1 character
> code entity conversion.
>
> Mathias.
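
To illustrate the point about Encode above, here is a minimal standalone sketch. It is not the XML::RSS code path, and it does not claim to reproduce the exact failure in this ticket. It only shows that the same character string becomes different bytes depending on the target encoding, and that ISO-8859-1 bytes for "©" are not well-formed UTF-8. The helpers bytes_for and is_valid_utf8 are hypothetical names made up for this example; Encode::encode, Encode::decode, FB_CROAK and LEAVE_SRC come from the core Encode module.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not part of XML::RSS): turn a Perl character
# string into bytes in the encoding that the XML declaration names.
sub bytes_for {
    my ($encoding, $text) = @_;
    return Encode::encode($encoding, $text);
}

# Hypothetical helper: returns 1 if the byte string is well-formed
# UTF-8, 0 otherwise.
sub is_valid_utf8 {
    my ($bytes) = @_;
    # FB_CROAK makes decode() die on malformed input;
    # LEAVE_SRC stops it from modifying $bytes in place.
    my $ok = eval {
        Encode::decode('UTF-8', $bytes,
                       Encode::FB_CROAK() | Encode::LEAVE_SRC());
        1;
    };
    return $ok ? 1 : 0;
}

my $text   = "Item Description (\x{A9})";     # U+00A9 COPYRIGHT SIGN
my $latin1 = bytes_for('iso-8859-1', $text);  # sign becomes the byte 0xA9
my $utf8   = bytes_for('UTF-8', $text);       # sign becomes 0xC2 0xA9

printf "iso-8859-1 bytes valid as UTF-8? %s\n",
    is_valid_utf8($latin1) ? "yes" : "no";
printf "UTF-8 bytes valid as UTF-8?      %s\n",
    is_valid_utf8($utf8) ? "yes" : "no";
```

If a document carries ISO-8859-1 bytes like these but is read as UTF-8 (for example because the encoding declaration is missing or says something else), a parser has good reason to reject it, which would be consistent with the 'not well-formed (invalid token)' error reported here.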
From: ABH [...] cpan.org
I think this was fixed in 1.12. (Let me know if it wasn't). - ask