
This queue is for tickets about the XML-RSS CPAN distribution.

Report information
The Basics
Id: 24010
Status: resolved
Priority: 0/
Queue: XML-RSS

People
Owner: Nobody in particular
Requestors: franck.perrot [...] epfl.ch
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: XML::RSS Wrong encoding for utf-8 text with accents
Date: Wed, 20 Dec 2006 11:39:25 +0100
To: bug-XML-RSS [...] rt.cpan.org
From: franck perrot <franck.perrot [...] epfl.ch>
Hello,

I use:
- Linux Red Hat AS 4 2.6.9-42.0.3.ELsmp #1 SMP Mon Sep 25 17:28:02 EDT 2006 i686 i686 i386 GNU/Linux
- XML::RSS 1.22
- Perl v5.8.5 built for i386-linux-thread-multi

Problem: if I use XML::RSS with UTF-8 text (with accents), the resulting RSS file is not encoded correctly. (The same program works perfectly with XML::RSS 1.10.)

For example:

...
$rss = new XML::RSS::Podcast (
    version  => '2.0',
    encoding => 'UTF-8',
);
$rss->channel(
    description => "dernières entrées de la videothéque",   <-- UTF-8
    ...
);

gives an RSS file with:

"derni&#xC3;&#xA8;res entr&#xC3;&#xA9;es de la videoth&#xC3;&#xA9;que"

Displayed by a web browser and by iTunes:

derniÃ¨res entrÃ©es de la videothÃ©que

I tried the "workaround" (??) from http://rt.cpan.org/Public/Bug/Display.html?id=12376:

$description = "dernières entrées de la videothéque";   <-- UTF-8
$description =~ s/\&/\&\#038\;/g;
$rss->channel(
    description => $description,
    ...
);

but this didn't work at all :(

So I think that, since you use HTML::Entities, the result for UTF-8 text with accents is wrong.

Thanks for any help with this problem. I have been using XML::RSS for a while, but after upgrading I got this wrong behavior.

Kind regards,
franck Perrot
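You can reproduce the hex entities in this report without XML::RSS. The sketch below assumes the module escapes every non-ASCII character as a numeric entity, which is how HTML::Entities' encode_entities_numeric behaves. Under that assumption, a UTF-8 *byte* string is escaped byte by byte, so è (bytes C3 A8) becomes &#xC3;&#xA8;. A *character* string gives &#xE8;. The helper to_numeric_entities is hypothetical and written only for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# Hypothetical helper: escape every non-ASCII character as &#xHH;,
# the way a numeric-entity encoder treats whatever string it is given.
sub to_numeric_entities {
    my ($text) = @_;
    (my $out = $text) =~ s/([^\x00-\x7F])/sprintf('&#x%X;', ord $1)/ge;
    return $out;
}

my $chars = "derni\x{E8}res";      # character string: 'è' is one character (U+00E8)
my $bytes = encode_utf8($chars);   # byte string: 'è' is two bytes, 0xC3 0xA8

print to_numeric_entities($chars), "\n";   # derni&#xE8;res        (one entity per character)
print to_numeric_entities($bytes), "\n";   # derni&#xC3;&#xA8;res  (the pattern in the report)
```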
Subject: Re: [rt.cpan.org #24010] XML::RSS Wrong encoding for utf-8 text with accents
Date: Wed, 20 Dec 2006 13:26:36 +0100
To: bug-XML-RSS [...] rt.cpan.org
From: Ask Bjørn Hansen <ask [...] perl.org>
Hi,

Thanks for the bug report.

Can you make a .t file that demonstrates the problem? That'll make it easier to track down.

 - ask
Subject: Re: [rt.cpan.org #24010] XML::RSS Wrong encoding for utf-8 text with accents
Date: Wed, 20 Dec 2006 14:31:49 +0100
To: bug-XML-RSS [...] rt.cpan.org
From: franck perrot <franck.perrot [...] epfl.ch>
ask@perl.org via RT wrote:
> Can you make a .t file that demonstrates the problem? That'll make
> it easier to track down.

Hi,

OK, see the attached files.
- Run test_XML_RSS.pl
- Try it with XML::RSS version 1.22, then with version 1.10
- Look at the file /tmp/rss.xml

I have attached the two resulting files, one for each version.

Thx,
Kind regards,
franck
#!/usr/bin/perl
# Author: franck Perrot - franck.perrot@epfl.ch - EPFL
# Date: 20/04/2005 - 23/06/2006
# EPFLTV version 2

$|++;            # disables buffering on STDOUT which is a good thing for CGI scripts
use strict;      # restrict unsafe constructs
use warnings;    # must be commented for production!
#use diagnostics;   # makes the -w messages more explicit

push (@INC, 'pwd');

use XML::RSS;

my ($rss);

$rss = new XML::RSS (
    version  => '2.0',
    encoding => 'UTF-8',
);

$rss->channel( title => "éàè" );
$rss->save("/tmp/rss.xml");
Result with XML::RSS 1.10:

<?xml version="1.0" encoding="UTF-8"?>

<rss version="2.0" xmlns:blogChannel="http://backend.userland.com/blogChannelModule">

<channel>
<title>éàè</title>
<link></link>
<description></description>

</channel>
</rss>

Result with XML::RSS 1.22:

<?xml version="1.0" encoding="UTF-8"?>

<rss version="2.0"
 xmlns:blogChannel="http://backend.userland.com/blogChannelModule"
>

<channel>
<title>&#xC3;&#xA9;&#xC3;&#xA0;&#xC3;&#xA8;</title>
<link></link>
<description></description>

</channel>
</rss>
Subject: Re: [rt.cpan.org #24010] XML::RSS Wrong encoding for utf-8 text with accents
Date: Thu, 04 Jan 2007 09:32:50 +0100
To: bug-XML-RSS [...] rt.cpan.org
From: franck perrot <franck.perrot [...] epfl.ch>
Hi and happy new year!

Any news about this bug?

Thx,
Kind regards,
franck P.
On Thu Jan 04 03:33:17 2007, franck.perrot@epfl.ch wrote:
> Hi and happy new year! >
Hi!
> any news about this bug? >
Well, I looked into it. After adding "use utf8;" to your code so that the string literal is treated as UTF-8 text, I saved the RSS and viewed it in Firefox 2.0.0.1. Although the accented characters were encoded as entities, the RSS was displayed fine, with accents and everything. So the entities are implicitly decoded by the browser.

Isn't that good enough for you?

Regards,
-- Shlomi Fish
Subject: Re: [rt.cpan.org #24010] XML::RSS Wrong encoding for utf-8 text with accents
Date: Fri, 23 Feb 2007 10:19:56 +0100
To: bug-XML-RSS [...] rt.cpan.org
From: franck perrot <franck.perrot [...] epfl.ch>
> Well, I looked into it. After adding "use utf8;" to your code to make
> the string be in UTF-8 encoding, I saved the RSS and viewed it in
> Firefox 2.0.0.1. Despite the fact the accented characters were encoded
> the RSS was displayed fine with accents and everything. So they are
> implicitly decoded.
>
> Isn't that good enough for you?
Hi,

you were right about adding "use utf8", but the result is still NOT correct if you can NOT use it. Look at this example:

-----------------------------------------------------------------
#!/usr/bin/perl
use strict;   # restrict unsafe constructs
use Encode;   # to UTF-8 encode an ISO 8859-1 string
use XML::RSS;

my ($rss, $description, $str);

$str = "éàè";                       # <- ISO-8859-1 string (in fact from my MySQL DB)
$description = encode_utf8($str);   # real UTF-8 string, I have checked

$rss = new XML::RSS (
    version => '2.0',
);

$rss->channel(
    title => $description
);
$rss->save("/tmp/rss.xml");
------------------------------------------------------------------

With this example, you don't use a UTF-8 string inside the source code, so I think "use utf8" is not helpful. The resulting rss.xml file is not displayed correctly by Firefox.

Best regards and thanks for your help,
franck Perrot
Subject: Re: [rt.cpan.org #24010] XML::RSS Wrong encoding for utf-8 text with accents
Date: Fri, 23 Feb 2007 11:21:39 +0100
To: bug-XML-RSS [...] rt.cpan.org
From: franck perrot <franck.perrot [...] epfl.ch>
I found the problem! In fact, it seems that your module ALREADY encodes to UTF-8, so my program encoded the text to UTF-8 twice... I think you should add a note to the module documentation saying that we don't need to encode strings to UTF-8 because the module already does it.

Sorry for the confusion.

Thx!,
franck Perrot
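Franck's finding can be reproduced with the core Encode module alone: his program encoded the string to UTF-8, and the output side then encoded it again. The sketch below assumes the text arrives either as ISO-8859-1 bytes or as UTF-8 bytes. The escapes stand in for whatever a database or source file might supply. Decoding either form gives the same three Perl characters. Calling encode_utf8 on bytes that are already UTF-8 encodes every byte a second time.

```perl
use strict;
use warnings;
use Encode qw(decode encode_utf8);

# "éàè" as ISO-8859-1 bytes and as UTF-8 bytes (escapes avoid source-encoding issues)
my $latin1_bytes = "\xE9\xE0\xE8";
my $utf8_bytes   = "\xC3\xA9\xC3\xA0\xC3\xA8";

# Decoding turns both into the same 3-character Perl string.
my $from_latin1 = decode('ISO-8859-1', $latin1_bytes);
my $from_utf8   = decode('UTF-8',      $utf8_bytes);
printf "%d %d\n", length $from_latin1, length $from_utf8;   # 3 3

# Encoding bytes that are already UTF-8 treats each byte as a character.
my $double = encode_utf8($utf8_bytes);
printf "%d bytes\n", length $double;                       # 12 bytes

# Decoding the doubled bytes once shows what a reader gets: "Ã©Ã Ã¨"
my $seen = decode('UTF-8', $double);
print join(' ', map { sprintf('U+%04X', ord $_) } split //, $seen), "\n";
# U+00C3 U+00A9 U+00C3 U+00A0 U+00C3 U+00A8
```

The usual remedy is to decode once, where the data enters the program, and to encode once, where it leaves.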
Subject: Re: [rt.cpan.org #24010] XML::RSS Wrong encoding for utf-8 text with accents
Date: Wed, 13 Apr 2011 18:01:07 +0200
To: bug-XML-RSS [...] rt.cpan.org
From: franck perrot <franck.perrot [...] epfl.ch>
Hello,

Do you remember me? Sorry to bother you again, but I still have the same problem with UTF-8 encoding. I think XML::RSS encodes my UTF-8 string twice, so the RSS file is wrong: ç is displayed as Ã§.

Maybe this is because of your documentation: "XML::RSS will make sure to encode any entities in generated RSS. This is now on by default." In older versions it was not.

So, would it be possible NOT to encode by default, or to avoid it somehow?

You can test with the code below.

Thanks for any help,
Kind regards,
Franck
> Hi,
> you where right by adding "use utf8" but the result is still NOT correct
> if you can NOT use it. Look at this example:
>
> -----------------------------------------------------------------
> #!/usr/bin/perl
> use strict; # restrict unsafe constructs
> use Encode; # to UTF-8 encode an ISO 8859-1 string
> use XML::RSS;
>
> my ($rss, $description, $str);
>
> $str = "éàè"; # <- ISO-8859-1 string (in fact from my MYSQL DB)
> $description = encode_utf8($str); # real UTF-8 string, I have checked
>
> $rss = new XML::RSS (
>     version => '2.0',
> );
>
> $rss->channel(
>     title => $description
> );
> $rss->save("/tmp/rss.xml");
> ------------------------------------------------------------------
>
> With this example, you dont use an UTF-8 string inside the source code,
> so I think "use utf8" is not helpfull. Then the resulting rss.xml file
> is not displayed with Firefox correctly.
>
> Best regards and thanks for your help,
>
> franck Perrot
Subject: Re: [rt.cpan.org #24010] XML::RSS Wrong encoding for utf-8 text with accents
Date: Thu, 14 Apr 2011 11:22:24 +0200
To: bug-XML-RSS [...] rt.cpan.org
From: franck perrot <franck.perrot [...] epfl.ch>
Below is a simple test program. If you look at the resulting file rss.xml in a browser:
- Everything is fine with XML-RSS-1.10 (uncomment the "use lib" line)
- The encoding is wrong with the latest XML-RSS module (ç = Ã§ ...)

Thanks for any help,
Franck

#! /usr/local/bin/perl
use strict;     # restrict unsafe constructs
use warnings;   # must be commented for production!
use Encode;
#use lib "/Test/XML-RSS-1.10";
use XML::RSS;

my ($rss, $description, $str);

$str = "Mon titre çéàè";
$description = encode_utf8($str);

$rss = new XML::RSS (
    version  => '1.0',
    encoding => 'UTF-8',
);

$rss->channel(
    title => $description
);
$rss->save("/var/www/html/EPFLTV/Test/rss.xml");
Subject: Re: [rt.cpan.org #24010] XML::RSS Wrong encoding for utf-8 text with accents
Date: Thu, 14 Apr 2011 15:01:55 +0200
To: bug-XML-RSS [...] rt.cpan.org
From: franck perrot <franck.perrot [...] epfl.ch>
So, if I change the save subroutine as follows, AND add "encode_output => 0,", everything is perfect:

sub save {
    my ($self, $file) = @_;

    local (*OUT);

    #open(OUT, $self->_get_save_output_mode(), "$file")
    open(OUT, ">$file")
        or croak "Cannot open file $file for write: $!";
    print OUT $self->as_string;
    close OUT;
}

$rss = new XML::RSS (
    version       => '2.0',
    encode_output => 0,
    encoding      => 'UTF-8',
);

Maybe this is because my strings are already in UTF-8 and the save subroutine converts them again?

So, as a simple workaround, maybe you could add an option like no_encode telling the save subroutine not to try to convert.

This is the test program I used:

#! /usr/local/bin/perl

use strict;
use warnings;
use Encode;

use XML::RSS;

my ($rss, $description, $str);

$str = "Mon titre çéàè";
$description = encode_utf8($str);

$rss = new XML::RSS (
    version       => '2.0',
    encode_output => 0,
    encoding      => 'UTF-8',
);

$rss->channel(
    title => $description
);
$rss->save("/var/www/html/EPFLTV/Test/rss.xml");
Hi Franck,

a few comments on your code.

On Thu Apr 14 09:02:08 2011, franck.perrot@epfl.ch wrote:
>
> So, if I change the save subroutine like following, AND add
> "encode_output => 0," all is perfect:
>
> sub save {
>     my ($self, $file) = @_;
>
>     local (*OUT);
>
>     #open(OUT, $self->_get_save_output_mode(), "$file")
>     open(OUT, ">$file")
>         or croak "Cannot open file $file for write: $!";
This is a two-argument open and is bad karma: https://www.socialtext.net/perl5/two_argument_open
>     print OUT $self->as_string;
>     close OUT;
> }
>
Furthermore, you can override this method in an inherited class, or simply do something with ->as_string().
>
> $rss = new XML::RSS (
>     version => '2.0',
>     encode_output => 0,
>     encoding => 'UTF-8',
> );
>
That should be:

[code]
my $rss = XML::RSS->new(
    version       => '2.0',
    encode_output => 0,
    encoding      => 'UTF-8',
);
[/code]
>
> Maybe this is because my string are already in UTF-8 and the save
> subroutine convert them again ?
>
> So, as a simple workaround, maybe you could you add an option like
> no_encode telling the save subroutine to not try to convert.
>
> This is my test program I used:
>
> #! /usr/local/bin/perl
>
> use strict;
> use warnings;
> use Encode;
>
> use XML::RSS;
>
> my ($rss, $description, $str);
>
> $str = "Mon titre çéàè";
> $description = encode_utf8($str);
>
> $rss = new XML::RSS (
>     version => '2.0',
>     encode_output => 0,
>     encoding => 'UTF-8',
> );
>
> $rss->channel(
>     title => $description
> );
> $rss->save("/var/www/html/EPFLTV/Test/rss.xml");
Please create a .t file. Also, don't pre-declare all variables at the start; instead, write "my $rss = XML::RSS->new(...)". See http://perl-begin.org/tutorials/bad-elements/

Regards,
-- Shlomi Fish
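Two of the suggestions above can be combined in a small helper: avoid the two-argument open, and work from ->as_string() rather than patching save(). This is a minimal sketch. write_utf8_file is a hypothetical name, and the helper assumes its string argument holds decoded Perl characters rather than UTF-8 bytes. Under that assumption, the :encoding(UTF-8) layer encodes the text exactly once.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical helper: write a character string to $path, encoding it to UTF-8 once.
sub write_utf8_file {
    my ($path, $text) = @_;

    # Three-argument open with a lexical filehandle; the layer performs the encoding.
    open my $fh, '>:encoding(UTF-8)', $path
        or croak "Cannot open file $path for write: $!";
    print {$fh} $text;
    close $fh
        or croak "Cannot close file $path: $!";
    return;
}

# Usage, assuming $rss is an XML::RSS object built from character strings:
#   write_utf8_file('/tmp/rss.xml', $rss->as_string);
```

Passing a string that already contains UTF-8 bytes would encode it a second time, which is the trap discussed throughout this ticket.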
Subject: Re: [rt.cpan.org #24010] XML::RSS Wrong encoding for utf-8 text with accents
Date: Tue, 19 Apr 2011 11:12:12 +0200
To: bug-XML-RSS [...] rt.cpan.org
From: franck perrot <franck.perrot [...] epfl.ch>
Hello Shlomi,

thanks a lot for your advice.

I can do the job myself (not use the XML::RSS save subroutine, etc.), but the UTF-8 problem, I think, is still there.

If you look at the XML file produced by XML::RSS (in Firefox, for example, or with a validator), the result is wrong if the text is in UTF-8 and contains characters like accents or even &, <, >, ", ...

Anyway, thanks again and have a nice day,
Franck
Hi Franck,

On Tue Apr 19 05:12:23 2011, franck.perrot@epfl.ch wrote:
> Hello Shlomi,
>
> thanks a lot for your advice.
>
> I can do the job myself, dont use the RSS save subroutine, etc, but the
> UTF8 problem, I think, is still there.
>
> If you look at the xml file produced by XML::RSS (through Firefox for
> example, or a validator), then the result is wrong if the text is in
> UTF8 with chars like accents or even &,>,>,", ...
>
For the third time or so, please write a self-contained .t test file that demonstrates the problem, because your instructions are hard to follow.

You're being very uncooperative, and if you want this bug fixed, you'll need to learn to cooperate. Please acknowledge that you are going to work on a .t file, so we can make some progress from here, or else I'll have to close this bug.

Regards,
-- Shlomi Fish
Subject: Re: [rt.cpan.org #24010] XML::RSS Wrong encoding for utf-8 text with accents
Date: Tue, 19 Apr 2011 16:42:46 +0200
To: bug-XML-RSS [...] rt.cpan.org
From: franck perrot <franck.perrot [...] epfl.ch>
Hello,
> For the third time or so, please write a self-contained .t test file
> that demonstrates the problem, because your instructions are hard to
> follow.
Sorry, but I really don't know how to write such a file.

Regards,
Franck
Hi Franck,

On Tue Apr 19 10:42:56 2011, franck.perrot@epfl.ch wrote:
> Hello,
>
> > For the third time or so, please write a self-contained .t test file
> > that demonstrates the problem, because your instructions are hard to
> > follow.
>
> sorry, but I realy dont know how to write such file.
>
Thanks for being frank. In any case, please learn how to do so - it's not hard, and won't take a lot of time. See:

* http://search.cpan.org/perldoc?Test::Tutorial
* http://www.shlomifish.org/lecture/Perl/Newbies/lecture5/testing/

Regards,
-- Shlomi Fish
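A self-contained .t file of the kind requested in this ticket could look roughly like the sketch below. The file name t/utf8-title.t is hypothetical. The script relies only on Test::More and on documented XML::RSS calls (new, channel, as_string), and it skips itself when XML::RSS is not installed. The assertions rest on one assumption: a title given as Perl characters should come out either as the raw character or as an entity for U+00E9. It should never appear in the double-encoded &#xC3;... form seen earlier in the thread.

```perl
#!/usr/bin/perl
# Hypothetical t/utf8-title.t: an accented channel title must not be double-encoded.
use strict;
use warnings;
use Test::More;

BEGIN {
    eval { require XML::RSS; 1 }
        or plan skip_all => 'XML::RSS is not installed';
}

my $title = "\x{E9}\x{E0}\x{E8}";    # "éàè" as Perl characters, not UTF-8 bytes

my $rss = XML::RSS->new(version => '2.0', encoding => 'UTF-8');
$rss->channel(
    title       => $title,
    link        => 'http://example.com/',
    description => 'test',
);
my $xml = $rss->as_string;

# U+00E9 may appear as-is or as a numeric/named entity, depending on encode_output.
like($xml, qr/\x{E9}|&#xE9;|&#233;|&eacute;/i, 'e-acute survives in the title');
unlike($xml, qr/&#xC3;|&#195;|\x{C3}/i, 'no sign of double UTF-8 encoding');

done_testing();
```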
Subject: Re: [rt.cpan.org #24010] XML::RSS Wrong encoding for utf-8 text with accents
Date: Mon, 06 Jun 2011 11:26:37 +0200
To: bug-XML-RSS [...] rt.cpan.org
From: franck perrot <franck.perrot [...] epfl.ch>
Hello,

so, after more and more in-depth investigation, I found the problem. It was a Perl problem: some strings, UTF-8 encoded, didn't know they were UTF-8 encoded... It looks stupid, but it's a fact. Then RSS.pm encoded the strings again and the display was wrong.

Am I clear?

You can close the bug report, it was not a bug.

Thanks,
Best regards,
franck Perrot
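Franck's final diagnosis was a string that held UTF-8 bytes which Perl did not treat as UTF-8 text. You can check this directly. The sketch below assumes the bytes come from an external source, such as a database driver returning raw octets. utf8::is_utf8 reports only Perl's internal flag and is a debugging aid, not a correctness test. Encode's decode('UTF-8', ...) turns the octets into characters before they reach XML::RSS. The variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Stand-in for a value fetched as raw octets (e.g. from a database): "çé" in UTF-8.
my $octets = "\xC3\xA7\xC3\xA9";

printf "before: %d chars, flag %s\n", length $octets, utf8::is_utf8($octets) ? 'on' : 'off';
# before: 4 chars, flag off

# Decode once, at the input boundary, so Perl sees 2 characters.
my $text = decode('UTF-8', $octets);
printf "after:  %d chars, flag %s\n", length $text, utf8::is_utf8($text) ? 'on' : 'off';
# after:  2 chars, flag on
```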
On Mon Jun 06 05:26:47 2011, franck.perrot@epfl.ch wrote:
> Hello,
>
> so, after more and more deep investigation, I found the problem. It was
> a Perl problem. Some string, UTF8 encoded, did'nt know there were UTF8
> encoded... Look's stupid but it's a fact. Then RSS.pm encoded the string
> again and the display was wrong.
>
> Am I clear ?
>
> You can close the bug report, it was not a bug.
Closing, thanks!

Regards,
-- Shlomi Fish