Bug #81320 for Text-Unaccent-PurePerl: utf8 modules disables unac_string

Wed Nov 21 05:31:33 2012 epierre [...] e-nef.com - Ticket created

Subject:	utf8 modules disables unac_string
Date:	Wed, 21 Nov 2012 11:31:02 +0100
To:	bug-text-unaccent-pureperl [...] rt.cpan.org
From:	Emmanuel PIERRE <epierre [...] e-nef.com>

Hello,

I have reinstalled my PC, moving a working MySQL 5 -> DBI -> HTML::Template setup in UTF-8 to a new configuration from Debian Wheezy:

perl version: 5.14.2-15
dbd mysql: 4.021-1+b1
html::template: 2.92
mysql: 5.5.28

My initial scripts used unac_string to remove accents, but since I had to introduce "use utf8" to get correct encoding management from the database, unac_string doesn't work anymore: it systematically introduces some long characters for every utf8 specific character.

I've replaced it with Text::Unidecode and unidecode, which works fine.

Sat Mar 02 07:12:59 2013 peter.john.acklam [...] gmail.com - Correspondence added

Hello

I am sorry, but I don't quite understand what you mean. What is a "long character"? And what is a "utf8 specific character"? Do you mean a non-ASCII character?

I have tested the module again with some more input, but it works exactly as intended. Please provide a concrete example showing what goes wrong. Please include your input and the faulty output.

If Text::Unidecode solves your problems, that's fine, but keep in mind that Text::Unaccent::PurePerl and Text::Unidecode do different things. The former primarily removes accents and other diacritic marks, whereas the latter attempts to do a full transliteration to ASCII. Here are some examples showing the differences:

"Русский" (input)
"Русскии" (output from Text::Unaccent::PurePerl::unac_string)
"Russkii" (output from Text::Unidecode::unidecode)

"Ελληνικά" (input)
"Ελληνικα" (output from Text::Unaccent::PurePerl::unac_string)
"Ellinika" (output from Text::Unidecode::unidecode)
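
For readers who want to see the "removes accents but keeps the script" behaviour in isolation, here is a minimal sketch in Python using only the standard library. It is not how Text::Unaccent::PurePerl is implemented, and the function name strip_diacritics is made up for this illustration. It assumes that decomposing the text and dropping combining marks is a fair approximation of accent removal for the two examples above; it does no transliteration to ASCII.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Approximate accent removal: decompose, drop combining marks, recompose.

    Hypothetical helper for illustration only; not the Perl module's algorithm.
    """
    # NFD splits a precomposed letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Category "Mn" (nonspacing mark) covers the accents themselves.
    stripped = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    # Recompose anything that is left into its precomposed form.
    return unicodedata.normalize("NFC", stripped)


if __name__ == "__main__":
    for word in ("Русский", "Ελληνικά"):
        print(word, "->", strip_diacritics(word))
```

The input must already be a decoded character string. Text that is still a raw UTF-8 byte sequence has to be decoded first, otherwise each byte is treated as a separate character.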

Sat Mar 02 07:13:00 2013 The RT System itself - Status changed from 'new' to 'open'
