
This queue is for tickets about the AnyEvent-Twitter CPAN distribution.

Report information
The Basics
Id: 53566
Status: resolved
Priority: 0/
Queue: AnyEvent-Twitter

People
Owner: Nobody in particular
Requestors: hideki.yamamura [...] gmail.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: update_status does not support non-ASCII characters
Date: Mon, 11 Jan 2010 03:15:04 +0900
To: bug-anyevent-twitter [...] rt.cpan.org
From: 山村 英貴 <hideki.yamamura [...] gmail.com>
Thanks for this very useful module.

I found a bug in the handling of UTF-8 strings: update_status does not support non-ASCII characters.

First, this module uses common::sense, which implies "use utf8". In update_status(), $url->query_form is called with two arguments:
(status, $status_e)
The key, status, is utf8-flagged, but the value, $status_e, is octets.

So in URI::_query::query_form, $status_e is converted to UTF-8 when the key and value are joined. But $status_e is already UTF-8 encoded, so the posted update becomes an unreadable string whenever $status contains non-ASCII characters (CJK, etc.).

--- /usr/lib/perl5/site_perl/5.8.8/AnyEvent/Twitter.pm 2009-11-05 08:09:54.000000000 +0900
+++ ./lib/AnyEvent/Twitter.pm 2010-01-11 03:03:21.000000000 +0900
@@ -477,11 +477,9 @@
 sub update_status {
    my ($self, $status, $done_cb) = @_;
 
-   my $status_e = _encode_status $status;
-
    my $url = URI::URL->new ($self->{base_url});
    $url->path_segments ('statuses', "update.json");
-   $url->query_form (status => $status_e);
+   $url->query_form (status => decode_utf8($status));
 
    my $hdrs = { $self->_get_basic_auth };
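
The double encoding described in this report can be reproduced without URI at all. The following is only a minimal sketch using core Perl, not the code path inside URI::_query. It assumes that utf8::upgrade on the key is a fair stand-in for the utf8-flagged bareword, and percent_escape is a hypothetical helper written for this illustration.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# Hypothetical helper (not part of URI): percent-escape every byte
# outside a small safe set.
sub percent_escape {
    my ($bytes) = @_;
    $bytes =~ s/([^A-Za-z0-9=_.~-])/sprintf("%%%02X", ord($1))/ge;
    return $bytes;
}

my $status   = "\x{65e5}\x{672c}";     # two characters, U+65E5 U+672C
my $status_e = encode_utf8($status);   # six octets, like $status_e in the report

my $key = "status";
utf8::upgrade($key);                   # assumption: models the utf8-flagged key

# Joining a flagged string with octets upgrades the octets as if each
# one were a Latin-1 character.
my $joined = "$key=$status_e";

# Encoding the joined character string for the wire then encodes the
# value a second time.
my $broken = percent_escape(encode_utf8($joined));
my $wanted = percent_escape("status=" . $status_e);

print "wanted: $wanted\n";
print "broken: $broken\n";
```

In the broken form every octet of the value is expanded into two, which is why the posted text is unreadable.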
Subject: Re: [rt.cpan.org #53566] update_status does not support non-ASCII characters
Date: Mon, 11 Jan 2010 16:53:18 +0100
To: 山村 英貴 via RT <bug-AnyEvent-Twitter [...] rt.cpan.org>
From: Robin Redeker <elmex [...] ta-sa.org>
Hi!
This patch will break the module. The problem is that C<$status> should be a plain string containing un-encoded Unicode characters. Calling decode_utf8 on it does not make sense at all, as the update_status function should already be given un-encoded strings.

As URIs can't represent Unicode characters, the string must be encoded, and the encoding should be UTF-8 (I think I read that in the Twitter API).

So the code as it is, is correct if $status is a Unicode string. The UTF8 flag is just an internal flag, which should not be exposed or handled specially at the Perl language level.

So, to come back to your problem: what is the problem? Can you put together a small test case that exposes it? Which Perl version do you use?

Greetings,
   Robin

-- 
Robin Redeker                         | Deliantra, the free code+content MORPG
elmex@ta-sa.org / r.redeker@gmail.com | http://www.deliantra.net
http://www.ta-sa.org/                 |
Subject: Re: [rt.cpan.org #53566] update_status does not support non-ASCII characters
Date: Tue, 12 Jan 2010 01:37:40 +0900
To: bug-AnyEvent-Twitter [...] rt.cpan.org
From: Hideki Yamamura <hideki.yamamura [...] gmail.com>
Thanks for replying.

I'm using Perl 5.8.8 and 5.10.0 (on CentOS 5.3 and Debian lenny).

I made a small script to show URI::URL's strange behavior. Executing this script, you'll get four encoded URLs.

AnyEvent::Twitter 0.27 uses the first result, which combines these two factors:
get_url_utf8() (because use common::sense converts 'status' to a utf8-flagged string)
$jp_octets (because $status_e was encoded by encode_utf8($status))
But this encoded URL string is completely broken.

Apparently URI::URL's query_form can handle utf8-flagged strings when all parameters are utf8-flagged, so I made a patch to make $status utf8-flagged with Encode::decode_utf8.

The intention of this patch is:
When $status is a plain string (UTF-8 encoded), it is decoded by decode_utf8.
When $status is a utf8-flagged string, decode_utf8 does nothing to it.

Message body is not shown because sender requested not to inline it.

Subject: Re: [rt.cpan.org #53566] update_status does not support non-ASCII characters
Date: Tue, 12 Jan 2010 11:20:05 +0100
To: 山村 英貴 via RT <bug-AnyEvent-Twitter [...] rt.cpan.org>
From: Robin Redeker <elmex [...] ta-sa.org>
It would be best to report this as a bug against URI::URL. The bug seems to be that URI::URL double-encodes the string $jp_octets in the first case, as far as I can see.

I don't know why you mess around with the utf8 flag. At the Perl language level I should not have to think about that INTERNAL flag. If URI::URL produces broken URLs when I pass in octets, then the bug is in URI::URL.

Even if URI::URL doesn't correctly handle strings that are internally flagged somehow, it should at least document that.

Greetings,
   Robin
Subject: Re: [rt.cpan.org #53566] update_status does not support non-ASCII characters
Date: Wed, 13 Jan 2010 00:34:10 +0900
To: bug-AnyEvent-Twitter [...] rt.cpan.org
From: Hideki Yamamura <hideki.yamamura [...] gmail.com>
More simply, please refer to this script. You are doing the same thing in AnyEvent::Twitter at line 484.

This is a very common problem around the utf8 flag in Japan, because we use multibyte languages. I think you should not use common::sense.

And I found a far better solution. Please use this patch (and drop the first patch).

--- /usr/lib/perl5/site_perl/5.8.8/AnyEvent/Twitter.pm 2009-11-05 08:09:54.000000000 +0900
+++ ./lib/AnyEvent/Twitter.pm 2010-01-13 00:20:12.000000000 +0900
@@ -1,5 +1,6 @@
 package AnyEvent::Twitter;
-use common::sense;
+use strict;
+use warnings;
 use Carp qw/croak/;
 use AnyEvent;
 use AnyEvent::HTTP;

Message body is not shown because sender requested not to inline it.

Subject: Re: [rt.cpan.org #53566] update_status does not support non-ASCII characters
Date: Thu, 14 Jan 2010 21:10:59 +0900
To: bug-AnyEvent-Twitter [...] rt.cpan.org
From: Hideki Yamamura <hideki.yamamura [...] gmail.com>
I found another way to avoid the utf8-related problem. It works because status is utf8-on but 'status' is utf8-off.

Following your advice, I submitted a bug report against the URI module. Thanks.

--- /usr/lib/perl5/site_perl/5.8.8/AnyEvent/Twitter.pm 2009-11-05 08:09:54.000000000 +0900
+++ ./lib/AnyEvent/Twitter.pm 2010-01-14 21:01:23.000000000 +0900
@@ -481,7 +481,7 @@
 
    my $url = URI::URL->new ($self->{base_url});
    $url->path_segments ('statuses', "update.json");
-   $url->query_form (status => $status_e);
+   $url->query_form ('status' => $status_e);
 
    my $hdrs = { $self->_get_basic_auth };
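
Why a utf8-off key avoids the problem can be sketched in core Perl. This is an illustration under assumptions, not the behavior of any particular Perl or URI release: utf8::upgrade is used here to model the utf8-on key, and an ordinary string literal models the quoted 'status'.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

my $status_e = encode_utf8("\x{65e5}\x{672c}");   # six octets

my $flagged_key = "status";
utf8::upgrade($flagged_key);    # assumption: models the utf8-on key
my $plain_key = "status";       # models the utf8-off key

# With a flagged key the octets are upgraded to characters on joining.
my $mixed = "$flagged_key=$status_e";

# With a plain key the result stays a byte string and the octets are untouched.
my $bytes = "$plain_key=$status_e";

printf "flagged key: is_utf8=%d, wire length=%d\n",
    utf8::is_utf8($mixed) ? 1 : 0, length(encode_utf8($mixed));
printf "plain key:   is_utf8=%d, wire length=%d\n",
    utf8::is_utf8($bytes) ? 1 : 0, length($bytes);
```

Only the first case grows when it is serialized, because its value has silently become a character string.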
Subject: Re: [rt.cpan.org #53566] update_status does not support non-ASCII characters
Date: Thu, 14 Jan 2010 13:20:06 +0100
To: 山村 英貴 via RT <bug-AnyEvent-Twitter [...] rt.cpan.org>
From: Robin Redeker <elmex [...] ta-sa.org>
That's great. Thanks!

Btw, the "status => ..." utf-8 flag on bareword bug is known in 5.10 and is fixed in the development version and the next release of Perl 5, I think. I stumbled across that myself once.
Greetings,
   Robin
Subject: Re: [rt.cpan.org #53566] update_status does not support non-ASCII characters
Date: Sat, 30 Jan 2010 15:31:43 +0900
To: bug-AnyEvent-Twitter [...] rt.cpan.org
From: Hideki Yamamura <hideki.yamamura [...] gmail.com>
Hi.

I'm having difficulty: the URI.pm maintainer has not responded to my ticket. We truly have no way to post Japanese text to Twitter with AnyEvent::Twitter (0.27) on Perl 5.8 and 5.10.

Would you apply this patch? Thanks.
>> --- /usr/lib/perl5/site_perl/5.8.8/AnyEvent/Twitter.pm  2009-11-05 08:09:54.000000000 +0900
>> +++ ./lib/AnyEvent/Twitter.pm      2010-01-14 21:01:23.000000000 +0900
>> @@ -481,7 +481,7 @@
>>
>>     my $url = URI::URL->new ($self->{base_url});
>>     $url->path_segments ('statuses', "update.json");
>> -   $url->query_form (status => $status_e);
>> +   $url->query_form ('status' => $status_e);
>>
>>     my $hdrs = { $self->_get_basic_auth };
-- Hideki YAMAMURA <hideki.yamamura@gmail.com>
On Sat Jan 30 01:32:43 2010, hideki.yamamura@gmail.com wrote:
> Hi.
> I have difficulty that URI.pm's maintainer does not respond my ticket.
> We have truly no way to post Japanese text to Twitter with
> AnyEvent::Twitter(0.27)
> on Perl 5.8 and 5.10.
> Would you apply this patch? Thanks.

Oh, that's sad @ URI maintainer. I've applied your workaround. You can fetch the updated version from my git repository:

http://git.ta-sa.org/AnyEvent-Twitter.git

I will probably release it this week.

Greetings,
   Robin