Hi
Indeed, _spurn_dictionary_words() can produce a lot of false negatives.
How much that matters depends on your corpus, and on whether you would rather tolerate false positives or false negatives.
In any case, I am no longer managing this module, and I am not sure if Runar Buvik is still interested in doing so (last release five years ago).
My suggestion would be to add one of two options:
- allow the user to specify the common word lexicon (where the user can remove those words from the list)
- create a method to add exceptions.
This should be quite straightforward.
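
To make the two options concrete, here is a rough sketch of the logic. It is written in Python purely for illustration and is not a patch against the module: the class name, the method names, and the tiny default word list are all hypothetical, and the real lexicon used by _spurn_dictionary_words() is much larger.

```python
# Hypothetical sketch only: none of these names exist in Lingua::EN::NamedEntity.
# The default list is a stand-in for the module's real common-word lexicon.
DEFAULT_COMMON_WORDS = {"mark", "may", "june", "joy", "star", "candy", "hope", "table"}


class DictionaryFilter:
    def __init__(self, common_words=None):
        # Option 1: the caller may pass their own lexicon, e.g. the default
        # list with the words they consider names removed.
        if common_words is None:
            self.common_words = set(DEFAULT_COMMON_WORDS)
        else:
            self.common_words = {w.lower() for w in common_words}
        self.exceptions = set()

    def add_exceptions(self, *words):
        # Option 2: keep the lexicon as it is, but never reject these words.
        self.exceptions.update(w.lower() for w in words)

    def is_spurned(self, candidate):
        # A candidate is rejected only if it is a common word and the user
        # has not listed it as an exception.
        word = candidate.lower()
        return word in self.common_words and word not in self.exceptions

    def filter(self, candidates):
        # Keep the candidates that survive the dictionary check, in order.
        return [c for c in candidates if not self.is_spurned(c)]
```

With something like this, a user who cares about names such as Mark or June would either build the filter from a trimmed lexicon, or call add_exceptions("Mark", "June") once before extracting entities.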
If anyone is willing to provide a patch, I am happy to apply it, and release a new version.
If anyone is willing to adopt the module, I am happy to share that responsibility.
Kindest regards,
ambs
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Saturday, July 18, 2020 7:06 AM, amead@alanmead.org via RT <bug-Lingua-EN-NamedEntity@rt.cpan.org> wrote:
> Sat Jul 18 02:06:41 2020: Request 133019 was acted upon.
> Transaction: Ticket created by amead@alanmead.org
> Queue: Lingua-EN-NamedEntity
> Subject: _spurn_dictionary_words() excludes common names like Mark
> Broken in: (no value)
> Severity: (no value)
> Owner: Nobody
> Requestors: amead@alanmead.org
> Status: new
> Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=133019 >
>
> As well as May, June, Joy, etc. It also excludes less common names like
> Star, Candy, Hope, etc. This is a significant issue for my use of this
> module.
>
> ---------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Alan D. Mead, Ph.D.
> President, Talent Algorithms Inc.
>
> science + technology = better workers
>
>
> http://www.alanmead.org
>
> Courage is resistance to fear, mastery of fear - not absence
> of fear.
>
> -- Mark Twain