
This queue is for tickets about the Geo-Coder-Bing CPAN distribution.

Report information
The Basics
Id: 47688
Status: resolved
Priority: 0/
Queue: Geo-Coder-Bing

People
Owner: Nobody in particular
Requestors: SREZIC [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Normal
Broken in: 0.02
Fixed in: (no value)



Subject: Problem using non utf8-flagged strings
It seems that Geo::Coder::Bing requires the address argument to be in utf8 *octets*. This is not how perl modules should work, but they should always operate on *characters*. The attached test script shows the problem. The script does two geolocation attempts with the same string. In the first case, the string does not have the utf8 flag, in the second case, it has. Only the second case delivers a result. Note that I added a test in the middle to show that the two versions of the string are equal (is $utf8_address, $address).

I guess that the geolocation API expects the arguments to be in utf8. So somewhere an

    encode "utf-8", $string

call is probably missing in the Geo::Coder::Bing module.

Regards,
Slaven

Subject: utf8.t
#!/usr/bin/perl -w
# -*- perl -*-

use strict;
use warnings;

use Data::Dumper;
use Devel::Peek;
use Encode;
use Geo::Coder::Bing;
use Test::More tests => 3;

my $VERBOSE = 1;

my $geocoder = Geo::Coder::Bing->new;
my $address = "Rübländerstraße, Berlin, Germany";
Dump $address if $VERBOSE;
my $location = $geocoder->geocode($address);
ok $location, "Supplied string without utf8 flag";
warn Dumper $location if $VERBOSE;

my $utf8_address = decode("iso-8859-1", $address); # force utf8 flag
is $utf8_address, $address;
Dump $utf8_address if $VERBOSE;
$location = $geocoder->geocode($utf8_address);
ok $location, "Supplied string with utf8 flag";
warn Dumper $location if $VERBOSE;

__END__
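
The difference between the two strings in utf8.t can be inspected without calling the geocoding service. This is a minimal sketch using only core modules. The names `$latin1_octets` and `$utf8_octets` are illustrative, and the hex dumps show two candidate octet sequences for the same characters, not what Geo::Coder::Bing actually puts on the wire:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Street name from utf8.t, written with \x escapes so the encoding of
# this source file does not matter.
my $address      = "R\xFCbl\xE4nderstra\xDFe";       # utf8 flag off
my $utf8_address = decode("iso-8859-1", $address);   # utf8 flag on

# Both strings hold the same characters, so "eq" is true even though
# their internal representations differ.
print "eq:            ", ($address eq $utf8_address    ? "yes" : "no"),  "\n";
print "flag, plain:   ", (utf8::is_utf8($address)      ? "on"  : "off"), "\n";
print "flag, decoded: ", (utf8::is_utf8($utf8_address) ? "on"  : "off"), "\n";

# The same characters serialized as ISO-8859-1 octets and as UTF-8 octets.
my $latin1_octets = encode("iso-8859-1", $address);
my $utf8_octets   = encode("UTF-8",      $address);

print "iso-8859-1: ", unpack("H*", $latin1_octets), "\n";
print "UTF-8:      ", unpack("H*", $utf8_octets),   "\n";
```

If the service only understands UTF-8, as the report guesses, a request built from the first octet sequence would not match, which would be consistent with only the flagged string returning a result.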

On Wed Jul 08 04:23:01 2009, SREZIC wrote:
> It seems that Geo::Coder::Bing requires the address argument to be in
> utf8 *octets*. This is not how perl modules should work, but they should
> always operate on *characters*. The attached test script shows the
> problem. The script does two geolocation attempts with the same string.
> In the first case, the string does not have the utf8 flag, in the second
> case, it has. Only the second case delivers a result. Note that I added
> a test in the middle to show that the two versions of the string are
> equal (is $utf8_address, $address).
>
> I guess that the geolocation API expects the arguments to be in utf8. So
> somewhere an
>
>     encode "utf-8", $string
>
> call is probably missing in the Geo::Coder::Bing module.
>
> Regards,
> Slaven

Both cases work for me. I also have a utf-8 test case in the distribution under xt/author/live.t. Either LWP::UserAgent or URI are handling this automatically. I just looked at both Geo::Coder::{Google,Yahoo} and Google has a minimum required version specified for URI. So I would first try to upgrade URI. If that doesn't work, try upgrading libwww-perl. I'm assuming it's URI and will bump the min required version of URI tomorrow. Let me know if it doesn't work for you.

On Wed Jul 08 05:00:46 2009, GRAY wrote:
> Both cases work for me. I also have a utf-8 test case in the
> distribution under xt/author/live.t.

I see the test, but it does not look right to me. The French address uses utf8 octets (the first one), but in Perl script one is supposed to deal just with characters. There's no test for using characters without utf8 flag (which would be iso-8859-1 octets).

> Either LWP::UserAgent or URI are
> handling this automatically. I just looked at both
> Geo::Coder::{Google,Yahoo} and Google has a minimum required version
> specified for URI. So I would first try to upgrade URI. If that doesn't
> work, try upgrading libwww-perl. I'm assuming it's URI and will bump the
> min required version of URI tomorrow. Let me know if it doesn't work for
> you.

I upgraded URI and LWP, but still the first test in my test script fails.

$ /usr/perl5.8.9@RC2/bin/perl -MLWP\ 9999
LWP version 9999 required--this is only version 5.828.
$ /usr/perl5.8.9@RC2/bin/perl -MURI\ 9999
URI version 9999 required--this is only version 1.38.

Regards,
Slaven

On Wed Jul 08 08:03:01 2009, SREZIC wrote:
> I see the test, but it does not look right to me. The French address
> uses utf8 octets (the first one), but in Perl script one is supposed to
> deal just with characters. There's no test for using characters without
> utf8 flag (which would be iso-8859-1 octets).
>
> I upgraded URI and LWP, but still the first test in my test script fails.
>
> $ /usr/perl5.8.9@RC2/bin/perl -MLWP\ 9999
> LWP version 9999 required--this is only version 5.828.
> $ /usr/perl5.8.9@RC2/bin/perl -MURI\ 9999
> URI version 9999 required--this is only version 1.38.
>
> Regards,
> Slaven

I cut and paste instead of saving your test cases. That explains why the first test case worked for me.

As latin1 is frequently really winlatin1, I think it's best if the individual decode their own data first. It's consistent with what Miyagawa has done in Geo::Coder::Google:

    When you'd like to pass non-ascii string as a location, you should
    pass it as either UTF-8 bytes or Unicode flagged string.
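
The point about latin1 often really being winlatin1 can be illustrated with a short sketch. It assumes the caller holds raw octets from some external source and knows (or has to guess) their encoding. The place name and variable names are illustrative only:

```perl
use strict;
use warnings;
use Encode qw(decode);

# "Kosice" with a caron on the s: octet 0x9A is that letter in
# Windows-1252 (cp1252) but a C1 control character in ISO-8859-1.
my $octets = "Ko\x9Aice";

my $as_latin1 = decode("iso-8859-1", $octets);
my $as_cp1252 = decode("cp1252",     $octets);

# The third character differs depending on the declared encoding.
printf "iso-8859-1: U+%04X\n", ord substr($as_latin1, 2, 1);   # U+009A
printf "cp1252:     U+%04X\n", ord substr($as_cp1252, 2, 1);   # U+0161

# Only the caller knows which decoding is right; the decoded character
# string is what would then be handed to $geocoder->geocode(...).
```

Because the two decodings disagree, a module that guessed the encoding of unflagged input on the caller's behalf could silently pick the wrong one, which is the concern behind asking callers to decode their own data.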

On Wed Jul 08 10:27:35 2009, GRAY wrote:
> I cut and paste instead of saving your test cases. That explains why the
> first test case worked for me.
>
> As latin1 is frequently really winlatin1, I think it's best if the
> individual decode their own data first. It's consistent with what
> Miyagawa has done in Geo::Coder::Google:
>
>     When you'd like to pass non-ascii string as a location, you should
>     pass it as either UTF-8 bytes or Unicode flagged string.

Sorry, I have to disagree. That's not how utf8 and non-utf8 is supposed to work in Perl. If two strings report true on an "eq" operation, but behave differently (because one has the utf8 flag internally and the other not), then this is an error. If Geo::Coder::Google is also doing it like this, then it also has to be fixed.

Regards,
Slaven
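
The behaviour argued for here, where strings that compare equal are treated identically, can be sketched as follows. `escape_as_utf8` is a hypothetical helper written for illustration. It is not part of Geo::Coder::Bing or Geo::Coder::Google, and it assumes the input is always a character string:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: encode the characters as UTF-8 first, then
# percent-encode every octet outside the URI "unreserved" set.
sub escape_as_utf8 {
    my ($chars) = @_;
    my $octets = encode("UTF-8", $chars);
    $octets =~ s/([^A-Za-z0-9\-._~])/sprintf "%%%02X", ord $1/ge;
    return $octets;
}

my $plain   = "Stra\xDFe";                    # utf8 flag off
my $flagged = decode("iso-8859-1", $plain);   # utf8 flag on

# Encoding explicitly makes the result independent of the internal flag.
print escape_as_utf8($plain),   "\n";   # Stra%C3%9Fe
print escape_as_utf8($flagged), "\n";   # Stra%C3%9Fe
```

The trade-off is that a caller who passes already-encoded UTF-8 octets to such a helper gets them encoded a second time, so this approach is incompatible with also accepting "UTF-8 bytes" as input.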