
This queue is for tickets about the Lingua-EN-NameParse CPAN distribution.

Report information
The Basics
Id: 109786
Status: resolved
Priority: 0/
Queue: Lingua-EN-NameParse

People
Owner: kimryan [...] cpan.org
Requestors: NHORNE [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Normal
Broken in: 1.33
Fixed in: (no value)



Subject: Two Middle Names
Often people have two (or more) middle names. L:E:N fails to parse them correctly:

$ cat parse
#!/usr/bin/env perl

use strict;
use warnings;
use Lingua::EN::NameParse;
use Data::Dumper;

my $nameparser = Lingua::EN::NameParse->new(extended_titles => 1);
$nameparser->parse('Matthew Mark John Smith');
my %comps = $nameparser->components();
print Dumper(\%comps);

$ ./parse
$VAR1 = {
          'initials_2' => '',
          'title_1' => '',
          'surname_2' => '',
          'surname_1' => 'John',
          'given_name_1' => 'Matthew',
          'title_2' => '',
          'precursor' => '',
          'conjunction_2' => '',
          'conjunction_1' => '',
          'suffix' => '',
          'non_matching' => 'Smith',
          'middle_name' => 'Mark',
          'initials_1' => '',
          'given_name_2' => ''
        };
The documentation at the top of the module shows all the name patterns this module can recognise, such as Mr_John_Adam_Smith, Mr_John_A_Smith etc. The pattern you require, Matthew_Mark_John_Smith, has very little context in it: there is no title or initial, both of which come from a limited set of patterns. So parsing two middle names is much more difficult than parsing just one, and I think it is beyond the capabilities of this module.

I could add this pattern, but it would cause a much higher error rate for data that does not contain it, and the pattern is fairly uncommon in most commercial usage.

As a workaround, I suggest you detect four-word names, remove the last word, parse the remaining three words, and then substitute the removed word back in after parsing. After parsing, the reported surname is really the second middle name, and you then add the removed word back in as the surname.
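The workaround described above could be sketched roughly as follows. This is only an illustration, not part of Lingua::EN::NameParse: the helper name parse_two_middle_names is hypothetical, and it assumes the three-word remainder ("Matthew Mark John") is parsed as given name, middle name and surname, using the middle_name and surname_1 keys shown in the report.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Lingua::EN::NameParse.
# $parser is any object offering parse() and components(),
# for example one built with Lingua::EN::NameParse->new(...).
sub parse_two_middle_names {
    my ($parser, $name) = @_;

    my @words = split ' ', $name;

    # Anything other than a four-word name goes to the parser unchanged.
    if (@words != 4) {
        $parser->parse($name);
        return { $parser->components() };
    }

    # Hold back the last word (the real surname) and parse the rest.
    my $surname = pop @words;
    $parser->parse(join ' ', @words);
    my %comps = $parser->components();

    # Assumption: the parser reported the third word as surname_1.
    # Treat it as the second middle name, then restore the real surname.
    $comps{middle_name} = join ' ',
        grep { defined && length } $comps{middle_name}, $comps{surname_1};
    $comps{surname_1} = $surname;

    return \%comps;
}
```

With a parser created as in the report, calling parse_two_middle_names($nameparser, 'Matthew Mark John Smith') would be expected to give middle_name 'Mark John' and surname_1 'Smith', provided the assumption above holds. The word count is a crude test: four-word inputs that are not a bare given/middle/middle/surname name (for example, ones containing a conjunction) would be split wrongly, so some pre-filtering is probably needed.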
This request is outside the scope of this module. For people with two middle names and no title, the parser just sees four words, which is not enough context for it to identify the name components.