Subject: 400 Bad Request when using utf8 in location
I am using Geo::Coder::Google 0.18, the latest release as of now, on Perl 5.20.
The following program fails with `Google Maps API returned error: 400 Bad Request at geo_bug.pl`:
use v5.20;
use strict;
use utf8;
use Geo::Coder::Google;
my $geocoder = Geo::Coder::Google->new( apiver => 3, key => 'AIzaSyDhg_MRCJvwFBYP56k65uf_HVC2iFjjWmU' );
$geocoder->geocode( location => 'Kielstraße 23, 70123, Germany' );
When the `key` parameter is a string with the utf8 marker set, the library creates requests containing invalid characters as soon as
the location to be geocoded contains Unicode characters.
Removing lines 77-79 in Google/V3.pm fixes the issue:
77 if (Encode::is_utf8($location)) {
78     $location = Encode::encode_utf8($location);
79 }
I am not sure though which other use-cases these lines were written for.
What follows is an explanation of what I think goes wrong when the above lines are *not* removed.
Geo/Coder/Google/V3.pm lines 77-79 UTF-8 encode the location parameter if it has the utf8 marker.
As a result, the query parameters passed to the URI object later on can have mixed encoding status (some already encoded, some not yet).
That in turn makes URI double-encode the already encoded parameters.
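The mixed encoding status can be illustrated with a minimal sketch that uses only core Perl. It mimics what lines 77-79 do to the location and is not the module's actual code. The variable names and the placeholder key are assumptions, and `utf8::upgrade` is used here only to force the utf8 marker onto the key.

```perl
use strict;
use warnings;
use utf8;
use Encode ();

# Hypothetical stand-ins for the two request parameters.
my $key = 'placeholder-key';
utf8::upgrade($key);                              # force the utf8 marker onto the key
my $location = 'Kielstraße 23, 70123, Germany';   # character string (use utf8)

# Same check as V3.pm lines 77-79: encode only if the marker is set.
if (Encode::is_utf8($location)) {
    $location = Encode::encode_utf8($location);
}

# The location is now an unmarked byte string, while the key is still marked.
printf "key marked:      %s\n", utf8::is_utf8($key)      ? 'yes' : 'no';
printf "location marked: %s\n", utf8::is_utf8($location) ? 'yes' : 'no';
```

After this step the two values that end up in the same query string no longer agree on whether they are characters or bytes.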
If I interpret https://metacpan.org/pod/URI#BUGS correctly, URI depends on the UTF8 marker on strings to determine whether to encode the string as UTF8
prior to percent encoding or not.
The behavior I observe is that when calling $uri->query_form(%hash) and *any* of the values in the hash have the utf8 marker set, then *all* hash values are UTF-8 encoded prior to percent encoding,
even the ones that are UTF-8 encoded already. If *none* of them have the utf8 marker set, then *all* hash values are percent encoded as they are.
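One possible explanation, which is an assumption and not verified against URI's source: if the values are joined into a single string before escaping, Perl upgrades the whole string as soon as one part carries the utf8 marker, and the bytes of the unmarked parts are then read as Latin-1 characters. The sketch below reproduces that effect with core Perl only. `percent_encode` is a hypothetical helper, not a URI function.

```perl
use strict;
use warnings;

# Hypothetical helper: UTF-8 encode the string if it is marked, then
# percent-encode every byte outside a small safe set.
sub percent_encode {
    my ($str) = @_;
    utf8::encode($str) if utf8::is_utf8($str);
    $str =~ s/([^A-Za-z0-9\-._~=&])/sprintf '%%%02X', ord $1/ge;
    return $str;
}

my $bytes  = "x\xC3\x9Fz";   # 'xßz' as UTF-8 bytes, no utf8 marker
my $marked = 'abc';
utf8::upgrade($marked);      # sets the utf8 marker without changing the text

# Bytes only: the existing UTF-8 bytes are percent-encoded as they are.
my $plain = percent_encode("erroneous=$bytes");

# Joined with a marked string: the bytes are upgraded as Latin-1 first,
# so they end up UTF-8 encoded a second time.
my $mixed = percent_encode(join '&', "dummy=$marked", "erroneous=$bytes");

print "$plain\n";   # erroneous=x%C3%9Fz
print "$mixed\n";   # dummy=abc&erroneous=x%C3%83%C2%9Fz
```

The second output matches the double-encoded value reported in this ticket.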
The following program demonstrates that behavior. The erroneous param should be encoded as "x%C3%9Fz", but is encoded as "x%C3%83%C2%9Fz".
use v5.20;
use strict;
use Encode;
use URI;
use Devel::Peek;
my $uri = URI->new("https://maps.googleapis.com/maps/api/geocode/json");
# URI gives the correct result when using the top pair or the bottom pair.
# It breaks when mixing decoded with non decoded params as below.
my %params = (
    #erroneous => decode('utf8', 'xßz'),
    dummy => decode('utf8', 'abc'), # sets the utf8 marker
    erroneous => 'xßz',
    #dummy => 'abc',
);
$uri->query_form(%params);
Dump( %params );
say 'url: ' . $uri->as_string;
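A caller-side way to avoid mixing might be to bring all values into the same form before building the query. The following is only a sketch under stated assumptions: `normalize_to_bytes` is a hypothetical helper, it assumes that unmarked values already hold UTF-8 bytes, and it leaves hash keys untouched.

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of the parameters in which every value
# is an unmarked UTF-8 byte string.
# Assumption: values without the utf8 marker already contain UTF-8 bytes.
sub normalize_to_bytes {
    my (%params) = @_;
    for my $value (values %params) {        # values() aliases the hash values
        utf8::encode($value) if utf8::is_utf8($value);
    }
    return %params;
}

my $marked = 'abc';
utf8::upgrade($marked);                      # marked, like decode('utf8', 'abc')

my %params = normalize_to_bytes(
    dummy     => $marked,
    erroneous => "x\xC3\x9Fz",               # UTF-8 bytes, unmarked
);

# After normalization no value carries the utf8 marker.
print "$_: ", (utf8::is_utf8($params{$_}) ? 'marked' : 'unmarked'), "\n"
    for sort keys %params;
```

The normalized hash could then be passed to `$uri->query_form(%params)`. Whether that avoids the double encoding in every URI version has not been tested here.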