Subject: double diacritics
Date: Tue, 2 Sep 2008 09:00:16 -0400
To: bug-MARC-Charset [...] rt.cpan.org
From: "Thomas P. Ventimiglia" <tventimi [...] princeton.edu>
Greetings:
MARC-Charset is a great package, but I recently noticed a small
problem with the conversion of diacritics that span two characters.
MARC-8 has two of these, the ligature and the double tilde. Each is
encoded as a pair of combining diacritics, a "left half" and a
"right half" (0xEB and 0xEC for the ligature, 0xFA and 0xFB for the
tilde). There are two valid ways of converting these to Unicode.
Either the halves are converted directly to the combining half marks
U+FE20 through U+FE23, or the two half marks are replaced with a
single "double" diacritic placed between the two characters it spans
(U+0361 for the ligature, U+0360 for the tilde). MARC-Charset does
neither. Instead, it mixes the two approaches: it replaces the left
half with the double diacritic and the right half with the Unicode
right half mark, so the output contains both a double diacritic and
a stray right half mark.
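To make the two expected conversions concrete, here is a minimal Python sketch. It is only an illustration and not MARC-Charset's code or API: the function name convert and the use_double flag are made up for this example, it handles only ASCII plus the four half-mark bytes, and it assumes the usual MARC-8 ordering in which a combining byte precedes the character it modifies.

```python
# Illustrative sketch only; names below are hypothetical, not MARC-Charset's API.

# Option 1: MARC-8 half marks -> Unicode combining half marks.
HALF_MARKS = {0xEB: "\uFE20", 0xEC: "\uFE21", 0xFA: "\uFE22", 0xFB: "\uFE23"}
# Option 2: MARC-8 left halves -> Unicode double diacritics.
DOUBLE_MARKS = {0xEB: "\u0361", 0xFA: "\u0360"}
RIGHT_HALVES = {0xEC, 0xFB}


def convert(marc8: bytes, use_double: bool = False) -> str:
    """Convert ASCII plus MARC-8 double-diacritic halves to Unicode."""
    out = []
    pending = []  # combining marks waiting for their base character
    for byte in marc8:
        if byte in HALF_MARKS:
            if not use_double:
                pending.append(HALF_MARKS[byte])
            elif byte not in RIGHT_HALVES:
                pending.append(DOUBLE_MARKS[byte])
            # With use_double, the right half is dropped: the double
            # diacritic already spans both characters.
        elif byte < 0x80:
            # Unicode puts combining marks after the base character.
            out.append(chr(byte))
            out.extend(pending)
            pending.clear()
        else:
            raise ValueError(f"byte 0x{byte:02X} not handled by this sketch")
    out.extend(pending)  # keep any marks that had no following base
    return "".join(out)


ligature_ts = b"\xebt\xecs"  # MARC-8: left half, "t", right half, "s"
half_form = convert(ligature_ts)                    # t U+FE20 s U+FE21
double_form = convert(ligature_ts, use_double=True)  # t U+0361 s
```

For the same input, the behavior described in this report would correspond to t U+0361 s U+FE21, which matches neither form.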
I am using version 1.0 of the module with Perl 5.8.8 on Red Hat
Enterprise Linux (kernel 2.6.18-92.1.10.el5).
Thank you for your help.
---
Thomas Ventimiglia
Computer Systems Specialist
Princeton University East Asian Library