Bug #124066 for Lingua-Stem-Snowball: Stemmer/input encoding mismatches

Subject: Stemmer/input encoding mismatches

Lingua::Stem::Snowball currently always selects the stemmer that corresponds to the supplied `encoding` argument, regardless of whether the `SVf_UTF8` flag is set on the input scalars. This is incorrect: at the very least, a UTF-8 stemmer should always be used for scalars that have `SVf_UTF8` set. The bug typically manifests as a failure to stem words that ought to be stemmed, and it affects more words the further a language's alphabet is from ASCII. Fixing it probably justifies a major version increment.
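To illustrate why only non-ASCII words are affected, here is a toy model in Python. It is not Snowball's algorithm and not the module's code: `toy_latin1_stem`, `toy_stem` and the suffix table are hypothetical names made up for this sketch. It assumes the mismatch comes from a single-byte (ISO-8859-1) stemmer being handed a buffer that actually holds UTF-8 bytes, which is what the buffer of an `SVf_UTF8` scalar contains.

```python
# Toy model only: NOT Snowball's algorithm and NOT Lingua::Stem::Snowball code.
# The suffix table is stored as ISO-8859-1 bytes, the way a single-byte
# stemmer would see its suffixes. Longest suffix first.
LATIN1_SUFFIXES = [s.encode("latin-1") for s in ("ée", "é", "s")]


def toy_latin1_stem(raw: bytes) -> bytes:
    """Strip the first matching suffix, comparing raw bytes only."""
    for suffix in LATIN1_SUFFIXES:
        if len(raw) > len(suffix) and raw.endswith(suffix):
            return raw[: -len(suffix)]
    return raw


def toy_stem(word: str, buffer_encoding: str) -> str:
    """Run the ISO-8859-1 toy stemmer on a buffer held in `buffer_encoding`.

    `buffer_encoding` stands in for the scalar's internal representation:
    "latin-1" for an unflagged scalar, "utf-8" for an SVf_UTF8 scalar.
    """
    raw = word.encode(buffer_encoding)
    return toy_latin1_stem(raw).decode(buffer_encoding)


# Buffer matches the stemmer's encoding: the accented suffix is stripped.
print(toy_stem("chanté", "latin-1"))  # chant
# Buffer is UTF-8 ("é" is the two bytes C3 A9): no suffix matches.
print(toy_stem("chanté", "utf-8"))    # chanté
# Pure ASCII words have identical bytes in both encodings, so they still stem.
print(toy_stem("chats", "utf-8"))     # chat
```

In this sketch the ASCII-only word is stemmed either way, while the accented word silently passes through unstemmed when the buffer is UTF-8, which matches the symptom described above.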