
This queue is for tickets about the Lingua-StopWords CPAN distribution.

Report information
The Basics
Id: 52330
Status: open
Priority: 0/
Queue: Lingua-StopWords

People
Owner: Nobody in particular
Requestors: tburtonw [...] umich.edu
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: Lingua-StopWords does not produce utf8 for russian when utf8 argument is set
Date: Tue, 1 Dec 2009 14:52:39 -0500
To: "bug-Lingua-StopWords [...] rt.cpan.org" <bug-Lingua-StopWords [...] rt.cpan.org>
From: "Burton-West, Tom" <tburtonw [...] umich.edu>
Hello,

I am using Lingua::StopWords 0.9 with perl 5.8.8. When I give the UTF-8 argument to getStopWords, I do not get correct UTF-8 out. It seems to ignore the UTF-8 argument.

    use Lingua::StopWords qw( getStopWords );
    my $stopwords = {};
    $stopwords = getStopWords('ru', 'UTF-8');
    my @words = keys %{$stopwords};
    binmode STDOUT, ":utf8";
    foreach my $word (@words) {
        print "$word\n";
    }

If I run the above program without setting STDOUT to utf8, I can verify that I am getting the KOI8-R encoding whether or not the 'UTF-8' argument is included in the call to getStopWords.

Tom Burton-West
tburtonw@umich.edu
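
Until the distribution is fixed, one possible workaround is to decode the returned keys by hand. This is only a sketch: it assumes that the list returned without an encoding argument really is KOI8-R bytes, as the report above observes. The helper name decode_stoplist is hypothetical and is not part of Lingua::StopWords; the sample hash stands in for the module's return value.

```perl
use strict;
use warnings;
use Encode qw( decode );

# Hypothetical helper (not part of Lingua::StopWords): take a stop-word
# hashref whose keys are byte strings in $encoding and return a new hashref
# whose keys are decoded Perl character strings.
sub decode_stoplist {
    my ($stoplist, $encoding) = @_;
    my %decoded = map { ( decode($encoding, $_) => 1 ) } keys %{$stoplist};
    return \%decoded;
}

# Stand-in for the module's output: two Russian words as KOI8-R bytes.
my $raw = {
    "\xC9"     => 1,    # U+0438 in KOI8-R
    "\xCE\xC5" => 1,    # U+043D U+0435 in KOI8-R
};

my $words = decode_stoplist($raw, 'KOI8-R');

# The keys are now character strings, so a ":utf8" layer on STDOUT
# encodes them correctly on output.
binmode STDOUT, ":utf8";
print "$_\n" for sort keys %{$words};
```

With the real module, the call would look like decode_stoplist(getStopWords('ru'), 'KOI8-R'), again assuming the default list is KOI8-R.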
Subject: [rt.cpan.org #52330] How to fix
Date: Mon, 27 Mar 2017 18:27:12 +0300
To: bug-Lingua-StopWords [...] rt.cpan.org
From: Ivan Krylov <krylov.r00t [...] gmail.com>
Lines 17-39 of Lingua/StopWords/RU.pm need to be re-encoded in two steps:

1) utf-8 -> latin1
2) koi8-r -> utf-8

This transformation fixes the file and stores the correct UTF-8 representation of Cyrillic characters in the file.

-- 
Best regards,
Ivan
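
The two steps above can be sketched with the core Encode module. This is an illustration on an in-memory string, not a patch for RU.pm: the helper name fix_double_encoding is hypothetical, and the damaged sample is built in the script itself on the assumption that the file holds KOI8-R bytes that were misread as Latin-1 and then saved as UTF-8.

```perl
use strict;
use warnings;
use Encode qw( decode encode );

# Hypothetical helper: undo the double encoding in two steps.
sub fix_double_encoding {
    my ($bytes) = @_;

    # Step 1 (utf-8 -> latin1): recover the original KOI8-R bytes.
    my $koi8_bytes = encode('ISO-8859-1', decode('UTF-8', $bytes));

    # Step 2 (koi8-r -> utf-8): decode as KOI8-R, then store as UTF-8.
    return encode('UTF-8', decode('KOI8-R', $koi8_bytes));
}

# Build a damaged sample from one Cyrillic letter (U+0438):
# KOI8-R bytes, misread as Latin-1, then written out as UTF-8.
my $letter  = "\x{0438}";
my $damaged = encode('UTF-8',
    decode('ISO-8859-1', encode('KOI8-R', $letter)));

my $fixed = fix_double_encoding($damaged);

# Show the bytes before and after as hex.
printf "damaged: %s\n", unpack('H*', $damaged);
printf "fixed:   %s\n", unpack('H*', $fixed);
```

After the round trip, $fixed holds the same bytes as encode('UTF-8', $letter), which is the form the file should store.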