Subject: ASCII chars dropped when converting to BIG5
Thanks for creating this module, it's very useful!
It seems that Unicode-MapUTF8's from_utf8() function gets confused, though,
if there are leading ASCII characters in the string: they are dropped from
the result.
Here's an example, using Encode's from_to() function to show that Encode
handles the conversion fine while Unicode-MapUTF8 drops the leading
ASCII chars:
./maputf8.pl
orig_str = [Foobar Hongkong Limited ª©Åv©Ò¦³ ¤£±oÂà¸ü ]
from big5 to utf8 = [Foobar Hongkong Limited çæ¬ææ ä¸å¾è½è¼ ]
from utf8 to big5 = [Foobar Hongkong Limited ª©Åv©Ò¦³ ¤£±oÂà¸ü ]
big5 str converted by MapUTF8 = [ª©Åv©Ò¦³¤£±oÂà¸ü ]
Would be great if you could look into it, thanks!
Mike Schilli
m@perlmeister.com
Subject: maputf8.pl
#!/usr/local/bin/perl -w
use 5.008;
use strict;
use Unicode::MapUTF8 qw(to_utf8 from_utf8 utf8_supported_charset);
use Encode qw(from_to);
my $big5_str = "Foobar Hongkong Limited \xaa\xa9\xc5v\xa9\xd2\xa6\xb3 \xa4\xa3\xb1o\xc2\xe0\xb8\xfc";
my $encoding = "big5";
print "orig_str = [$big5_str ]\n";
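# from_to() converts in place, so work on a copy of the original bytes.
my $big5_str_bk = $big5_str;
from_to($big5_str_bk, "big5", "utf8");
print "from big5 to utf8 = [$big5_str_bk ]\n";
# Keep the UTF-8 bytes for the MapUTF8 test below.
my $utf8_str = $big5_str_bk;
# Round trip back to Big5 with Encode: the ASCII characters survive.
from_to($big5_str_bk, "utf8", "big5");
print "from utf8 to big5 = [$big5_str_bk ]\n";
# Same UTF-8 input through Unicode::MapUTF8: the ASCII characters are dropped.
my $conv_str = from_utf8({ -string => $utf8_str, -charset => $encoding });
print "big5 str converted by MapUTF8 = [$conv_str ]\n";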