
This queue is for tickets about the Lingua-Han-Utils CPAN distribution.

Report information
The Basics
Id: 98923
Status: resolved
Priority: 0/
Queue: Lingua-Han-Utils

People
Owner: Nobody in particular
Requestors: GUGOD [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Hi,

The subroutine Unihan_value() always assumes its argument contains raw bytes and guesses the character encoding from there. That is a lot of redundant work when it could also accept character strings directly, particularly when bulk-processing data loaded from databases that is already decoded into characters.

The patch is as simple as this:

----------
--- Utils.pm.orig 2014-09-16 11:59:25.000000000 +0200
+++ Utils.pm 2014-09-16 12:07:31.000000000 +0200
@@ -20,7 +20,7 @@
 sub Unihan_value {
     my $word = shift;
-    $word = cdecode($word);
+    $word = cdecode($word) unless Encode::is_utf8($word);
     my @unihan = map { uc sprintf("%x",$_) } unpack ("U*", $word);
     return wantarray?@unihan:(join('', @unihan));
 }
----------
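
To illustrate the effect of the guard, here is a minimal standalone sketch. It is not the module's code: decode_bytes_assuming_utf8() and unihan_value_sketch() are hypothetical names, and the stand-in decoder simply assumes UTF-8 input instead of guessing the encoding the way cdecode() does.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical stand-in for the module's cdecode(): it assumes the bytes
# are UTF-8, whereas the real routine guesses the encoding.
sub decode_bytes_assuming_utf8 {
    my ($bytes) = @_;
    return Encode::decode('UTF-8', $bytes);
}

# Sketch of the patched logic: decode only when the argument is not
# already flagged as a character string.
sub unihan_value_sketch {
    my ($word) = @_;
    $word = decode_bytes_assuming_utf8($word) unless Encode::is_utf8($word);
    my @unihan = map { uc sprintf('%x', $_) } unpack('U*', $word);
    return wantarray ? @unihan : join('', @unihan);
}

my $chars = "\x{4E2D}\x{6587}";               # character string
my $bytes = Encode::encode('UTF-8', $chars);  # same text as raw UTF-8 bytes

my @from_chars = unihan_value_sketch($chars);
my @from_bytes = unihan_value_sketch($bytes);
print "@from_chars\n";   # prints "4E2D 6587"
print "@from_bytes\n";   # prints "4E2D 6587"
```

Both calls should yield the same code points, and the character-string call skips the decoding step entirely. One caveat: Encode::is_utf8() reports Perl's internal UTF8 flag, so a character string that happens to lack the flag would still go through the decoder.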
Hi, a new version has shipped. Thanks for the patch.