
This queue is for tickets about the MediaWiki CPAN distribution.

Report information
The Basics
Id: 21288
Status: open
Priority: 0/
Queue: MediaWiki

People
Owner: edwardspec [...] gmail.com
Requestors: stuart.caie [...] gmail.com
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in:
  • 1.03
  • 1.04
  • 1.05
  • 1.06
  • 1.07
Fixed in: (no value)



Subject: Unicode/UTF8 not fully handled
Strings containing UTF-8 characters are not passed on to the wiki correctly in many places.

libwww-perl does not promise to do any conversion of data; it just takes the raw byte values of the data sent. So in order to pass UTF-8-encoded data, the encoding has to be done by the caller. I would appreciate not having to encode UTF-8 data explicitly before passing it to MediaWiki; I think MediaWiki should do this itself.

For example, a call to upload currently looks like this:

    $BOT->upload(encode_utf8($image_name), $image_data, encode_utf8($description));

It would be good if it were simply:

    $BOT->upload($image_name, $image_data, $description);
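To illustrate the difference between a character string and the raw bytes libwww-perl ends up sending, here is a minimal sketch using only the core Encode module. The variable names and the sample text are assumptions made up for the illustration; they do not come from the MediaWiki module.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# Hypothetical description text: "Cafe" with an e-acute (U+00E9) as last character.
my $description = "Caf\x{e9}";

# encode_utf8() turns the character string into a UTF-8 byte string.
my $bytes = encode_utf8($description);

printf "characters: %d\n", length $description;   # 4 characters
printf "bytes:      %d\n", length $bytes;         # 5 bytes, U+00E9 becomes C3 A9
printf "hex:        %s\n", join ' ', map { sprintf '%02X', ord } split //, $bytes;
```

The one character U+00E9 becomes the two bytes C3 A9, which is why the caller currently has to encode each text argument before handing it over.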
On Sat Sep 02 16:22:46 2006, stuart.caie@gmail.com wrote:
> There are a lot of places where strings with UTF8 characters are
> included are not passed on correctly to the wiki.
>
> libwww-perl does not promise to do any conversion of data, it just takes
> the raw byte values of data sent. So in order to pass UTF8-encoded data,
> it has to be done by the caller. I would appreciate if I did not
> explicitly have to encode utf8 data before passing it to MediaWiki, I
> think MediaWiki should do this itself.
>
> For example, a call to upload currently looks like this:
>
> $BOT->upload(encode_utf8($image_name), $image_data,
> encode_utf8($description));
>
> It would be good if it was simply $BOT->upload($image_name, $image_data,
> $description);
The only place that needed fixing was MediaWiki::page::_wiki_url(). There is no need to escape the description: that is not done by LWP, but it is handled by HTTP::Request::Common when the request is built with this syntax:

    my $res = $obj->{ua}->request(
        POST $url,
        Content => [
            $key  => $val,    # both $key and $val are escaped
            $key2 => $val2,   # both $key2 and $val2 are escaped
        ]
    );

The fixed version will be included in 1.09.
From: stuart.caie [...] gmail.com
On Thu Sep 07 12:54:13 2006, SPECTRUM wrote:
> The only place to fix was in MediaWiki::page::_wiki_url(). There is no
> need in escaping description - it's not done by LWP but this is handled
> in HTTP::Request::Common (in this syntax:
> my $res = $obj->{ua}->request(
> POST $url,
> Content => [(
> $key => $val, # both $key and $val are escaped
> $key2 => $val2 # both $key2 and $val2 are escaped
> )]
> );
I looked further into this: it is not handled by HTTP::Request::Common. It is bug #15294 in the URI module (http://rt.cpan.org/Public/Bug/Display.html?id=15294).

First, you ask HTTP::Request::Common to encode using urlencoding (application/x-www-form-urlencoded). To get the urlencoded form, HTTP::Request::Common creates a URI object and calls query_form() on it. URI::query_form() passes the elements to URI::Escape::uri_escape(). As noted in the URI::Escape documentation, this is absolutely the wrong thing to do for Unicode.

Applying the patch given in bug #15294 and also adding "use encoding 'utf8';" to the head of the script fixes the problem. I would really appreciate it if you could let people know in your documentation that they need to do these two things to get UTF-8 encoding working with this module.
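As a rough illustration of what a Unicode-safe escape has to do, the sketch below encodes a character string to UTF-8 bytes first and only then percent-escapes them. The helper name escape_utf8 is hypothetical and the code uses only the core Encode module; it is not the patch from bug #15294 and not the URI module's own implementation.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# Hypothetical helper: encode characters as UTF-8 bytes, then
# percent-escape every byte that is not an RFC 3986 unreserved character.
sub escape_utf8 {
    my ($text) = @_;
    my $bytes = encode_utf8($text);
    $bytes =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord($1))/ge;
    return $bytes;
}

# Sample value with U+00E9 and U+263A, chosen only for illustration.
my $escaped = escape_utf8("Caf\x{e9} \x{263A}");
print "$escaped\n";   # prints Caf%C3%A9%20%E2%98%BA
```

URI::Escape also provides uri_escape_utf8(), which follows the same encode-then-escape order and may be a simpler choice where the URI distribution is available.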