This queue is for tickets about the Net-Twitter CPAN distribution.

Report information
The Basics
Id: 54710
Status: rejected
Priority: 0/
Queue: Net-Twitter

People
Owner: MMIMS [...] cpan.org
Requestors: numberxiii [...] free.fr
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: Double encoding status (utf-8 charset)
Date: Wed, 17 Feb 2010 16:38:49 +0100
To: bug-Net-Twitter [...] rt.cpan.org
From: numberxiii [...] free.fr
Hi,

I've encountered a charset problem with Net::Twitter::Lite.

On some systems, when I use OAuth authentication, the status is UTF-8 encoded twice. On other systems, there is no double encoding. The text of the status was already UTF-8 encoded.

I resolved my issue as follows:

    $decodedStatus = Encode::decode("UTF-8", $status)

(cf. http://www.social.com/main/twitter-oauth-using-perl/)

I don't understand why there is no problem on an Ubuntu distro but there is one on a Gentoo distro.

I found the following line in the code:

    local $Net::OAuth::SKIP_UTF8_DOUBLE_ENCODE_CHECK = 1;

(http://github.com/semifor/net-twitter-lite/blob/master/src/net-twitter-lite.tt2, line 281)

My configuration on both systems:
Net::Twitter::Lite version is 0.08006
URI version is 1.52

On the Gentoo system:
LWP::UserAgent version is 2.033

On the Ubuntu system:
LWP::UserAgent version is 5.829

Regards,
numberxiii
On Wed Feb 17 10:41:22 2010, numberxiii@free.fr wrote:
> I've encountered a charset problem with Net::Twitter::Lite.
>
> On some systems, when I use oAuth authentication, the status was
> 'UTF-8 encoded' twice. On other system, there is no double encoding.
> The text of the status was already utf-8 encoded.
>
> I resolved my issue by the following way :
> $decodedStatus = Encode::decode("UTF-8",$status)
> (cf. http://www.social.com/main/twitter-oauth-using-perl/)
>
> I don't understand why there is no problem on an Ubuntu distro and
> problem on a Gentoo distro.
>
> I found in the code the following line :
> local $Net::OAuth::SKIP_UTF8_DOUBLE_ENCODE_CHECK = 1;

That's intentional. It's there to allow passing Latin-1 to Net::Twitter without first decoding it. It's basically for backwards compatibility.

> (http://github.com/semifor/net-twitter-lite/blob/master/src/net-twitter-lite.tt2,
> line 281)
>
> My configuration on both system :
> Net::Twitter::Lite version is 0.08006
> URI version is 1.52
>
> On the gentoo system :
> LWP::UserAgent version is 2.033
>
> On the Ubuntu system:
> LWP::UserAgent version is 5.829

From your description, it sounds like you are passing a UTF-8 encoded byte string to Net::Twitter rather than a decoded character string. Net::Twitter expects either decoded characters or Latin-1.

Unless I'm mistaken, decoding an already decoded UTF-8 character string is a no-op. So, if decoding solved your problem, it must not have been decoded to start with.

I've added a unicode test to Net::Twitter and ported it to Net::Twitter::Lite. You can find them here:

http://github.com/semifor/Net-Twitter/blob/master/t/unicode.t
http://github.com/semifor/net-twitter-lite/blob/master/t/unicode.t

If you can provide a failing test, that would be very helpful.

-Marc
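
For readers following the thread, here is a minimal sketch of how passing undecoded UTF-8 bytes to code that expects characters produces double encoding, and why decoding once at the input avoids it. It uses only the core Encode module and does not call Net::Twitter; the variable names and the sample word are illustrative assumptions, not taken from the reporter's code.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# A character string: four characters, the last one is U+00E9.
my $chars = "caf\x{e9}";

# What a UTF-8 page yields if it is never decoded: five raw bytes.
my $bytes = encode('UTF-8', $chars);

# A library that expects characters encodes its input on output.
# Given raw bytes, it treats each byte as a Latin-1 character and
# encodes it again, which is the double encoding described above.
my $double = encode('UTF-8', $bytes);

printf "characters:     %d\n", length $chars;    # 4
printf "UTF-8 bytes:    %d\n", length $bytes;    # 5
printf "double-encoded: %d\n", length $double;   # 7

# Decoding once at the input boundary restores the character string.
my $decoded = decode('UTF-8', $bytes);
print "round trip ok\n" if $decoded eq $chars;
```

The two bytes of the encoded character each become two bytes on the second pass, which is why the length grows from 5 to 7.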
Subject: Re: [rt.cpan.org #54710] AutoReply: Double encoding status (utf-8 charset)
Date: Thu, 18 Feb 2010 14:52:14 +0100
To: bug-Net-Twitter [...] rt.cpan.org
From: numberxiii [...] free.fr
I'm sorry, I didn't know that Twitter::Lite expects Latin-1!

The status I posted to Twitter comes from the content of a web page in the UTF-8 charset (retrieved with WWW::Mechanize). The 'bug' of malformed strings appears only when I use OAuth authentication, which seems strange to me. And, cherry on the cake, it happens only on the production server, not on my workstation.

I will try your test asap.

numberxiii
On Thu Feb 18 08:56:11 2010, numberxiii@free.fr wrote:
> I'm sorry, I didn't know that Twitter::Lite expects Latin-1 !

It would be more correct to say Net::Twitter expects decoded character strings, but accepts Latin-1, just like perl itself.

I did, however, find and fix an encoding problem that occurs when undecoded Latin-1 with characters in the \x80-\xff range is passed to Net::Twitter and Basic Auth is used:

http://github.com/semifor/Net-Twitter/commit/31ca4d62c7e610c618b7dce7e818991aff94015e

> The status I posted to Twitter comes from the content of a web page in
> UTF-8 charset (retrieved with WWW::Mechanize).
> The 'bug' of mal-formed strings appears only when I use the oauth
> authentication, and that sounds strange for me.
> And, cherry on the cake, only on the production server, not on my
> workstation.

If you test the string you get from WWW::Mechanize, is the utf8 bit set?

    utf8::is_utf8($string)

Can you point me to the resulting status on Twitter?

-Marc
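
The check requested above could be run along these lines. This is only a sketch: `describe_string` is a hypothetical helper, and the sample byte string stands in for whatever WWW::Mechanize actually returned, which the thread does not show. Note that the utf8 flag is a diagnostic hint only; a Latin-1 character string can legitimately have the flag off.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: report how perl currently sees a string.
sub describe_string {
    my ($string) = @_;
    return sprintf 'utf8 flag: %s, length: %d',
        (utf8::is_utf8($string) ? 'on' : 'off'), length $string;
}

# Stand-in for undecoded page content: the UTF-8 bytes for "café".
my $raw = "caf\xc3\xa9";

# The same content decoded into a character string.
my $text = decode('UTF-8', $raw);

print describe_string($raw),  "\n";
print describe_string($text), "\n";
```

If the string from the page reports the flag off and a length larger than the visible character count, it is most likely still a byte string and needs decoding before it is handed to Net::Twitter.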
No followup and no similar bug reports. So, I'm assuming this bug report is not valid. Please reopen it if the problem still exists and provide some details. Thank you. -Marc