
This queue is for tickets about the Lingua-Stem-Snowball CPAN distribution.

Report information
The Basics
Id: 124066
Status: new
Priority: 0/
Queue: Lingua-Stem-Snowball

People
Owner: CREAMYG [...] cpan.org
Requestors: CREAMYG [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in:
  • 0.95
  • 0.951
  • 0.952
Fixed in: (no value)



Subject: Stemmer/input encoding mismatches
Lingua::Stem::Snowball currently always uses the stemmer corresponding to the supplied `encoding` argument, regardless of whether the `SVf_UTF8` flag is set on the input scalars. This is incorrect: at the very least, a UTF-8 stemmer should always be used for scalars with `SVf_UTF8` set. The bug typically manifests as a failure to stem words that ought to be stemmed, and the more a language relies on non-ASCII characters, the more words are affected. Fixing it probably justifies a major version increment.
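
To illustrate the mismatch, here is a minimal sketch in core Perl only. It shows two scalars that compare equal as Perl strings but differ in their `SVf_UTF8` flag, and one possible way to choose the stemmer encoding per scalar. The helper `effective_encoding` is hypothetical and not part of Lingua::Stem::Snowball, and `'ISO-8859-1'` is only an example value for the `encoding` argument; this is not a proposed patch.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Lingua::Stem::Snowball): choose the
# stemmer encoding for one word from the scalar's internal representation
# instead of trusting the requested encoding alone.
sub effective_encoding {
    my ($word, $requested) = @_;
    # utf8::is_utf8 reports whether SVf_UTF8 is set on the scalar.
    return utf8::is_utf8($word) ? 'UTF-8' : $requested;
}

my $bytes = "gr\xF6\xDFer";   # "größer" held as Latin-1 bytes, flag off
my $chars = $bytes;
utf8::upgrade($chars);        # same characters, flag on, UTF-8 internally

# The two scalars are equal as Perl strings ...
print "equal\n" if $bytes eq $chars;

# ... yet each would need a different stemmer.
print effective_encoding($bytes, 'ISO-8859-1'), "\n";   # ISO-8859-1
print effective_encoding($chars, 'ISO-8859-1'), "\n";   # UTF-8
```

Under this sketch, a caller who requested `'ISO-8859-1'` but passed a flagged scalar would still get the UTF-8 stemmer, which is the minimum behaviour asked for above. What should happen to unflagged scalars whose bytes do not match the requested encoding is left open here.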