
This queue is for tickets about the Lingua-EN-AddressParse CPAN distribution.

Report information
The Basics
Id: 127814
Status: resolved
Priority: 0/
Queue: Lingua-EN-AddressParse

People
Owner: Nobody in particular
Requestors: NHORNE [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Normal
Broken in: 1.27
Fixed in: (no value)



Subject: Parsing of UK towns is slow
Running some timing tests, I've found that parsing US and Canadian towns (not full addresses) is really quick: "Saskatoon, Saskatchewan, Canada" takes 0.1s. Parsing UK towns is slow: "Ramsgate, Kent, England" takes 1.07s. I realise that this package is about parsing addresses, not towns, but do you have any pointers?
To make it clear, the options I'm giving to new() are: my $ap = Lingua::EN::AddressParse->new(country => 'GB', auto_clean => 1, force_case => 1, force_post_code => 0);
Oh, and I just noticed something: the slow part isn't the call to parse(), it's the call to new(), which takes a long time for GB.
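One way to separate construction time from parse time is to time each call on its own. The following is a minimal sketch, not part of the original report: the helper name `timed` is hypothetical, and it only relies on the core Time::HiRes module.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Hypothetical helper: run a code ref once and return
# (elapsed wall-clock seconds, result of the call in scalar context).
sub timed {
    my ($code) = @_;
    my $start   = [gettimeofday];
    my $result  = $code->();
    my $elapsed = tv_interval($start);   # interval from $start to now
    return ($elapsed, $result);
}
```

Assuming Lingua::EN::AddressParse is installed, the helper could be called once as `timed(sub { Lingua::EN::AddressParse->new(country => 'GB', auto_clean => 1, force_case => 1, force_post_code => 0) })` and once as `timed(sub { $ap->parse('Ramsgate, Kent, England') })`, which would show which of the two calls accounts for the delay.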
Yes, the call to new() is slow. But you should only need to do this once, and then reuse the returned object for every address you parse. Never call new() from inside a loop.
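The advice above can be sketched as a small per-country cache, so that the expensive constructor runs at most once per country. This is an illustrative sketch only: `cached_parser`, `%parser_for`, and the `$build` code ref are hypothetical names, not part of Lingua::EN::AddressParse.

```perl
use strict;
use warnings;
use 5.010;    # for the //= operator

# Cache of parser objects keyed by country code (hypothetical name).
my %parser_for;

# Return the cached parser for $country, building it on first use.
# $build is a code ref that receives the country code and returns the
# parser object; it is only called when nothing is cached yet.
sub cached_parser {
    my ($country, $build) = @_;
    return $parser_for{$country} //= $build->($country);
}
```

With the module installed, a caller might write `my $ap = cached_parser('GB', sub { Lingua::EN::AddressParse->new(country => $_[0], auto_clean => 1, force_case => 1, force_post_code => 0) });` inside a loop and still pay the construction cost only on the first iteration.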
What I don't understand is why it's so much slower on GB data than the US and Canada.
I did a dump of the recursive-descent tree that is created. The UK one is twice as big as the US one, about 1.3 MB. It seems the longer and more complex list of sub-countries in the UK causes the slowness compared to the USA, Australia, etc. It's really an issue for the Parse::RecDescent module. But I think the setup time is reasonable for the complexity of the grammar.
Sorry, I've misread this so far; I can see the issue is with parsing speed, not setup time. I ran a test on some UK addresses and it processed 1000 records per hour. Are your addresses parsing without errors? The default pattern is to have a street number.
Yes, it does parse OK. I've refactored my code to encapsulate the parsers in my class (Geo::Coder::Free).
Can you give a few more data examples? I can only get UK addresses to parse with a street name and number.
On Tue Nov 27 17:27:49 2018, KIMRYAN wrote:
> Can you give a few more data examples? I can only get UK addresses to parse with a street name and number.
OK, so actually the parses fail, so that's a red herring. I think you've answered what I need to know, which is all about instantiating objects, and I now have a way forward. Thank you.