Subject: to_utf8 cannot convert 1-byte characters from Big5
I am running
Perl v5.8.2 built for i686-linux on Redhat 9.0
I am trying to convert Big5 characters to UTF-8 using the to_utf8 function in Unicode-MapUTF8. This works fine for 2-byte characters. But if I pass a 1-byte character such as the digit 0, 1, or 2, it does not return a UTF-8 character. I believe some characters in Big5 are represented as a single byte.
### FOR EXAMPLE
use Unicode::MapUTF8 qw(to_utf8);
# string consisting of three Big5 characters 0xA540, 0xA541, 0x30
$STR = "\xA5\x40\xA5\x41\x30";
$NEW_STR = to_utf8({ -string => $STR, -charset => 'Big5' });
# The above returns the utf8 representations of only the first two Chinese
# characters, but fails to convert the third.
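For comparison, here is a minimal sketch that runs the same bytes through the core Encode module instead of Unicode-MapUTF8. This is a cross-check only, not a fix for to_utf8. It assumes the 'big5-eten' encoding shipped with Encode is an acceptable stand-in for the 'Big5' charset named above.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Same test string as in the report: two 2-byte characters, then ASCII "0".
my $big5 = "\xA5\x40\xA5\x41\x30";

# Assumption: 'big5-eten' (from Encode::TW) matches the Big5 variant in use.
my $text = decode('big5-eten', $big5);   # Big5 bytes -> Perl characters
my $utf8 = encode('UTF-8', $text);       # Perl characters -> UTF-8 bytes

printf "%d characters, %d UTF-8 bytes, last byte 0x%02X\n",
    length $text, length $utf8, ord substr($utf8, -1);
```

If the trailing digit survives here but not in the to_utf8 output, that would suggest the problem is in how the 1-byte case is handled rather than in the input string.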
---------------
From
http://www.fifi.org/cgi-bin/man2html/usr/share/man/man7/charsets.7.gz
Big5 is a popular character set in Taiwan to express traditional Chinese. (Big5 is both a character set and an encoding.) It is a superset of US ASCII. Non-ASCII characters are expressed in two bytes. Bytes 0xa1-0xfe are used as leading bytes for two-byte characters. Big5 and its extension is widely used in Taiwan and Hong Kong. It is not ISO 2022-compliant.
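The lead-byte rule quoted above can be sketched in a few lines of Perl. The helper name split_big5 is hypothetical and is not part of Unicode-MapUTF8. It only shows how the test string would divide into characters if bytes 0xA1-0xFE start a two-byte character and every other byte stands alone.

```perl
use strict;
use warnings;

# Hypothetical helper (not from any library): split a Big5 byte string into
# characters using only the lead-byte rule from the charsets(7) excerpt.
sub split_big5 {
    my ($bytes) = @_;
    my @chars;
    my $len = length $bytes;
    my $i   = 0;
    while ($i < $len) {
        my $byte = ord substr($bytes, $i, 1);
        # A byte in 0xA1-0xFE starts a two-byte character (if a second byte
        # is available); any other byte is a single-byte character.
        my $width = ($byte >= 0xA1 && $byte <= 0xFE && $i + 1 < $len) ? 2 : 1;
        push @chars, substr($bytes, $i, $width);
        $i += $width;
    }
    return @chars;
}

my @chars = split_big5("\xA5\x40\xA5\x41\x30");
printf "%d characters: %s\n", scalar @chars,
    join ' ', map { uc unpack 'H*', $_ } @chars;
# prints "3 characters: A540 A541 30"
```

Under that rule the final byte 0x30 is a complete character on its own, which is why I would expect it to come back as the digit 0 in the UTF-8 output.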