This queue is for tickets about the Geo-StreetAddress-US CPAN distribution.

Report information
The Basics
Id: 101688
Status: open
Priority: 0/
Queue: Geo-StreetAddress-US

People
Owner: Nobody in particular
Requestors: MWELLS [...] cpan.org
Cc: mark [...] freeside.biz
AdminCc:

Bug Information
Severity: Important
Broken in:
  • 1.03
  • 1.04
Fixed in: (no value)



CC: mark [...] freeside.biz
Subject: Fractions in house numbers are discarded, fractions in street names are mangled
Supposedly, fractional house numbers are supported, but they're discarded, as $Addr_Match{fraction} doesn't set anything in %_. This might be intentional behavior (it matches what's in the tests) but it's surprising to have the fraction just disappear.

More troublesome are fractional street names, like "Avenue 6 1/2". These match the regexes, but then normalize_address deletes the '/'.

I can work around the first of these by tweaking the %Addr_Match patterns, but the normalize_address pattern isn't in a global, and there's no way to tell it _not_ to normalize the address elements after parsing, either.
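To make the second problem concrete, here is a minimal sketch that applies the punctuation-stripping substitution quoted in the attached diff to a fractional street name. The helper name strip_punctuation_1_04 is hypothetical and exists only for this illustration; the module applies the substitution inline to each parsed value rather than through a function like this.

```perl
use strict;
use warnings;

# Hypothetical helper: applies the same substitution that the attached
# diff shows for 1.04. The /o and /s modifiers on the original line do
# not change the result here, so only /g is kept.
sub strip_punctuation_1_04 {
    my ($value) = @_;
    # Trim leading/trailing whitespace and drop every character that is
    # not a word character, whitespace, '-', '#' or '&'.
    $value =~ s/^\s+|\s+$|[^\w\s\-\#\&]//g;
    return $value;
}

my $street = strip_punctuation_1_04('Avenue 6 1/2');
print "$street\n";    # prints "Avenue 6 12"
```

The '/' is not in the allowed character class, so "6 1/2" collapses into "6 12".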
Patch for both issues.
Subject: 1.04.diff
diff -u -r -x Changes Geo-StreetAddress-US-1.04/t/01_parser.t Geo-StreetAddress-US-1.04a/t/01_parser.t
--- Geo-StreetAddress-US-1.04/t/01_parser.t 2014-03-04 06:27:52.000000000 -0800
+++ Geo-StreetAddress-US-1.04a/t/01_parser.t 2015-01-23 09:58:36.868423937 -0800
@@ -209,13 +209,21 @@
           'type' => 'Ave',
           'prefix' => 'SE'
         },
-    "3813 1/2 Some Road, Los Angeles, CA" => {
-          'number' => '3813',
+    "3813 1/2 Some Road, Los Angeles, CA" => { # fractional house number
+          'number' => '3813 1/2',
           'street' => 'Some',
           'state' => 'CA',
           'city' => 'Los Angeles',
           'type' => 'Rd',
         },
+    "9001 Avenue 8 1/2, Madera, California" => { # fractional street number
+          'number' => '9001',
+          'street' => 'Avenue 8 1/2',
+          'state' => 'CA',
+          'city' => 'Madera',
+          'type' => '',
+        },
+
     "Mission & Valencia San Francisco CA" => {
           'type1' => '',
           'type2' => '',
diff -u -r -x Changes Geo-StreetAddress-US-1.04/US.pm Geo-StreetAddress-US-1.04a/US.pm
--- Geo-StreetAddress-US-1.04/US.pm 2014-03-04 07:32:20.000000000 -0800
+++ Geo-StreetAddress-US-1.04a/US.pm 2015-01-23 10:00:26.316426868 -0800
@@ -792,7 +792,8 @@
     # treat "42S" as "42 S" (42 South). For example,
     # Utah and Wisconsin have a more elaborate system of block numbering
     # http://en.wikipedia.org/wiki/House_number#Block_numbers
-    $Addr_Match{number} = qr/(\d+-?\d*)(?=\D) (?{ $_{number} = $^N })/ix,
+    $Addr_Match{number} = qr/(\d+-?\d*)(?=\D (?:\s+$Addr_Match{fraction})?)
+        (?{ $_{number} = $^N })/ix,
 
     # note that expressions like [^,]+ may scan more than you expect
     $Addr_Match{street} = qr/
@@ -1049,7 +1050,7 @@
     #m/^_/ and delete $part->{$_} for keys %$part; # for debug
 
     # strip off some punctuation
-    defined($_) && s/^\s+|\s+$|[^\w\s\-\#\&]//gos for values %$part;
+    defined($_) && s/^\s+|\s+$|[^\w\s\-\/\#\&]//gos for values %$part;
 
     while (my ($key, $map) = each %Normalize_Map) {
         $part->{$key} = $map->{lc $part->{$key}}
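As a rough check of the second US.pm hunk, the sketch below runs the patched substitution on a few sample values. The helper name strip_punctuation_patched and the sample strings are assumptions made for illustration; only the regex itself is taken from the diff.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the patched substitution from the diff:
# '/' is added to the characters that survive normalization.
sub strip_punctuation_patched {
    my ($value) = @_;
    # Trim surrounding whitespace; keep word characters, whitespace,
    # '-', '/', '#' and '&'; drop everything else.
    $value =~ s/^\s+|\s+$|[^\w\s\-\/\#\&]//g;
    return $value;
}

# Illustrative inputs: a fractional house number, a fractional street
# name with stray whitespace and a comma, and a name with a period.
for my $raw ('3813 1/2', ' Avenue 8 1/2, ', 'St. Mary') {
    print "'$raw' => '", strip_punctuation_patched($raw), "'\n";
}
```

With this character class, '3813 1/2' passes through unchanged, ' Avenue 8 1/2, ' becomes 'Avenue 8 1/2', and other punctuation is still removed ('St. Mary' becomes 'St Mary').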
On Thu Jan 22 23:06:02 2015, MWELLS wrote:
> Supposedly, fractional house numbers are supported, but they're
> discarded, as $Addr_Match{fraction} doesn't set anything in %_. This
> might be intentional behavior (it matches what's in the tests) but
> it's surprising to have the fraction just disappear.
>
> More troublesome are fractional street names, like "Avenue 6 1/2".
> These match the regexes, but then normalize_address deletes the '/'.
>
> I can work around the first of these by tweaking the %Addr_Match
> patterns, but the normalize_address pattern isn't in a global, and
> there's no way to tell it _not_ to normalize the address elements
> after parsing, either.

Hi, this doesn't relate directly to this module. I maintain Lingua::EN::AddressParse, which does a similar job. I am always looking for new data patterns. I have never seen "Avenue 6 1/2" before, but I think it's a pattern I can account for. My module currently handles the case of "3813 1/2 Some Road, Los Angeles, CA".
I have released a new version of Lingua::EN::AddressParse that handles this case.

On Tue Apr 21 03:36:37 2015, KIMRYAN wrote:
> On Thu Jan 22 23:06:02 2015, MWELLS wrote:
> > Supposedly, fractional house numbers are supported, but they're
> > discarded, as $Addr_Match{fraction} doesn't set anything in %_. This
> > might be intentional behavior (it matches what's in the tests) but
> > it's surprising to have the fraction just disappear.
> >
> > More troublesome are fractional street names, like "Avenue 6 1/2".
> > These match the regexes, but then normalize_address deletes the '/'.
> >
> > I can work around the first of these by tweaking the %Addr_Match
> > patterns, but the normalize_address pattern isn't in a global, and
> > there's no way to tell it _not_ to normalize the address elements
> > after parsing, either.
>
> Hi, this doesn't relate directly to this module. I maintain
> Lingua::EN::AddressParse, which does a similar job. I am always
> looking for new data patterns. Have never seen "Avenue 6 1/2" before,
> but I think it's a pattern I can account for. My module currently
> handles the case of "3813 1/2 Some Road, Los Angeles, CA"