
This queue is for tickets about the Lingua-Stem-Snowball CPAN distribution.

Report information
The Basics
Id: 13898
Status: resolved
Priority: 0/
Queue: Lingua-Stem-Snowball

People
Owner: Nobody in particular
Requestors: marvin [...] rectangular.com
Cc:
AdminCc:

Bug Information
Severity: Normal
Broken in: 0.92
Fixed in: (no value)



Subject: apostrophe s in English stemmer
In Lingua::Stem::Snowball version 0.92, English words which end in apostrophe-s, such as "ranger's", lose the s but keep the apostrophe.

This requires a wasteful preprocessing pass on the text to be stemmed, to strip all apostrophe-s instances with...

    s/'s$//;

It also means that if you need to use the unmodified stemmable text for some other purpose, you must make a copy of the entire array.

These problems have workarounds, albeit expensive ones. However, they require that the user be aware in the first place of the bizarre behavior of the stemmer. No one expects a user to enter "ranger'" into a search box. And although the Lingua::Stem module has its own quirks (e.g. deletion of any tokens containing digits), it handles the apostrophe-s as you would expect.

The preferred solution would be to change the behavior of the stemmer. If that is not possible, the documentation should inform the user that they must strip apostrophe-s themselves.

Here is a program which demonstrates the behavior.

    #!/usr/bin/perl
    use strict;
    use warnings;

    use Lingua::Stem::Snowball;

    my $snowball = Lingua::Stem::Snowball->new( lang => 'en' );
    my @stemmable = ( 'foo', "ranger's", 'bar' );
    my @stemmed = $snowball->stem(\@stemmable);
    print "Snowball: @stemmed\n";
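
For reference, the copy-and-strip workaround described above might look roughly like the following. This is only a sketch: strip_apostrophe_s is a hypothetical helper name, not part of Lingua::Stem::Snowball, and it handles only the plain ASCII apostrophe.

```perl
use strict;
use warnings;

# Hypothetical helper (not provided by Lingua::Stem::Snowball): returns a
# copy of the given tokens with a trailing apostrophe-s removed, leaving
# the caller's array untouched.
sub strip_apostrophe_s {
    my @copy = @_;        # copy, so the original tokens survive
    s/'s$// for @copy;    # same substitution as in the report
    return @copy;
}

my @stemmable = ( 'foo', "ranger's", 'bar' );
my @cleaned   = strip_apostrophe_s(@stemmable);

print "Original: @stemmable\n";    # Original: foo ranger's bar
print "Cleaned:  @cleaned\n";      # Cleaned:  foo ranger bar
```

The cleaned copy would then be handed to the stemmer in place of @stemmable, which is exactly the extra pass and extra array this ticket is complaining about.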