
This queue is for tickets about the Text-Unaccent-PurePerl CPAN distribution.

Report information
The Basics
Id: 81320
Status: open
Priority: 0/
Queue: Text-Unaccent-PurePerl

People
Owner: Nobody in particular
Requestors: epierre [...] e-nef.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: utf8 modules disables unac_string
Date: Wed, 21 Nov 2012 11:31:02 +0100
To: bug-text-unaccent-pureperl [...] rt.cpan.org
From: Emmanuel PIERRE <epierre [...] e-nef.com>
Hello,

I have reinstalled my PC, which had a working MySQL 5 -> DBI -> HTML::Template setup in UTF-8, with a new configuration from Debian Wheezy:

perl version: 5.14.2-15
dbd mysql: 4.021-1+b1
html::template: 2.92
mysql: 5.5.28

My initial scripts used unac_string to remove accents, but since I had to introduce "use utf8" to get correct encoding management from the database, unac_string doesn't work anymore: it systematically introduces some long characters for every utf8 specific character.

I have replaced it with Text::Unidecode and unidecode, which works fine.
Hello

I am sorry, but I don't quite understand what you mean. What is a "long character"? And what is a "utf8 specific character"? Do you mean a non-ASCII character?

I have tested the module again with some more input, but it works exactly as intended. Please provide a concrete example showing what goes wrong. Please include your input and the faulty output.

If Text::Unidecode solves your problems, that's fine, but keep in mind that Text::Unaccent::PurePerl and Text::Unidecode do different things. The former primarily removes accents and other diacritic marks, whereas the latter attempts to do a full transliteration to ASCII. Here are some examples showing the differences:

"Русский" (input)
"Русскии" (output from Text::Unaccent::PurePerl::unac_string)
"Russkii" (output from Text::Unidecode::unidecode)

"Ελληνικά" (input)
"Ελληνικα" (output from Text::Unaccent::PurePerl::unac_string)
"Ellinika" (output from Text::Unidecode::unidecode)
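
To illustrate what "removing accents" means in the examples above, as opposed to transliterating to ASCII, here is a minimal, language-neutral sketch in Python. It is not the implementation of Text::Unaccent::PurePerl; it swaps in a different, well-known technique (canonical decomposition followed by dropping combining marks), and the function name `strip_accents` is hypothetical. For the two sample words it happens to give the same unaccented forms quoted above, while leaving the script (Cyrillic, Greek) unchanged.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Hypothetical helper: drop diacritic marks but keep the base letters.

    Assumption: this uses Unicode NFD decomposition, which is NOT necessarily
    how Text::Unaccent::PurePerl works internally.
    """
    # Split each precomposed character into base letter + combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only characters whose canonical combining class is 0 (non-marks).
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Recompose whatever is left into the usual precomposed form.
    return unicodedata.normalize("NFC", stripped)


if __name__ == "__main__":
    for word in ("Русский", "Ελληνικά"):
        print(word, "->", strip_accents(word))
```

Note that the input must be a string of characters rather than raw UTF-8 bytes, which is why a concrete input and output pair is needed to diagnose the reported problem.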