Subject: Lingua-StopWords does not produce utf8 for russian when utf8 argument is set
Date: Tue, 1 Dec 2009 14:52:39 -0500
To: "bug-Lingua-StopWords [...] rt.cpan.org" <bug-Lingua-StopWords [...] rt.cpan.org>
From: "Burton-West, Tom" <tburtonw [...] umich.edu>
Hello,
I am using Lingua::StopWords 0.9 with Perl 5.8.8. When I pass the 'UTF-8' argument to getStopWords, the words it returns are not UTF-8 encoded.
The 'UTF-8' argument appears to be ignored.
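use strict;
use warnings;
use Lingua::StopWords qw( getStopWords );

# Request the Russian stoplist, asking for UTF-8 output.
my $stopwords = getStopWords('ru', 'UTF-8');
my @words = keys %{$stopwords};

# Mark STDOUT as UTF-8 so character strings are encoded on output.
binmode STDOUT, ":utf8";
foreach my $word (@words) {
    print "$word\n";
}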
If I run the above program without setting STDOUT to utf8, I can verify that the output is KOI8-R encoded whether or not the 'UTF-8' argument is included in the call to getStopWords.
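As a possible interim workaround, the keys could be decoded by hand with the core Encode module. The following is only a minimal sketch, assuming the keys really are raw KOI8-R bytes as observed above. The helper decode_stoplist_keys is hypothetical and not part of Lingua::StopWords, and the one-entry hash stands in for the hash returned by getStopWords so that the snippet runs without the module installed.

```perl
use strict;
use warnings;
use Encode qw( decode );

# Hypothetical helper (not part of Lingua::StopWords): build a new hash
# whose keys are Perl character strings decoded from the given encoding.
sub decode_stoplist_keys {
    my ( $stoplist, $encoding ) = @_;
    my %decoded;
    for my $bytes ( keys %{$stoplist} ) {
        $decoded{ decode( $encoding, $bytes ) } = $stoplist->{$bytes};
    }
    return \%decoded;
}

# Stand-in for the returned stoplist (assumption): the single key is the
# Russian word U+0438 stored as one KOI8-R byte, 0xC9.
my $raw   = { "\xC9" => 1 };
my $fixed = decode_stoplist_keys( $raw, 'koi8-r' );

# Encode character strings as UTF-8 on output.
binmode STDOUT, ':encoding(UTF-8)';
print "$_\n" for sort keys %{$fixed};
```

With the real module, the same helper would be applied to the hash reference returned by getStopWords('ru') before printing.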
Tom Burton-West
tburtonw@umich.edu