Subject: Stemmer/input encoding mismatches
Lingua::Stem::Snowball currently always uses the stemmer corresponding to the supplied `encoding` argument, regardless of whether the `SVf_UTF8` flag is set on the input scalars. This is incorrect: at the very least, a UTF-8 stemmer should always be used for `SVf_UTF8` scalars.
This bug typically manifests as a failure to stem words that ought to be stemmed; the more a language relies on non-ASCII characters, the more words are affected. Fixing it probably justifies a major version increment.
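
To illustrate the flag in question, here is a minimal sketch using only core Perl. `pick_stemmer_encoding` is a hypothetical helper, not part of Lingua::Stem::Snowball and not necessarily how a fix would be implemented; it just encodes the rule stated above, that a scalar with `SVf_UTF8` set should get a UTF-8 stemmer whatever `encoding` was requested.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption for illustration, not module API):
# pick the stemmer encoding for one input scalar.
sub pick_stemmer_encoding {
    my ($word, $requested_encoding) = @_;
    # utf8::is_utf8 reports whether the SVf_UTF8 flag is set on the scalar.
    return utf8::is_utf8($word) ? 'UTF-8' : $requested_encoding;
}

# The same three-character word held two ways.
my $bytes = "\xE9t\xE9";    # Latin-1 code points, SVf_UTF8 flag off
my $chars = $bytes;
utf8::upgrade($chars);      # same characters, SVf_UTF8 flag now on

print pick_stemmer_encoding($bytes, 'ISO-8859-1'), "\n";    # ISO-8859-1
print pick_stemmer_encoding($chars, 'ISO-8859-1'), "\n";    # UTF-8
```

The two scalars compare equal with `eq`, yet their internal representations differ, which is why choosing the stemmer from the `encoding` argument alone can go wrong.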