Subject: MARC::Charset bug with Extended Cyrillic charset
Date: Tue, 23 Nov 2010 15:58:46 +0200
To: bug-MARC-Charset [...] rt.cpan.org
From: Asko Ohmann <asko.ohmann [...] gmail.com>
Hello,
I've been using the MARC::Charset module to convert text from MARC-8
to UTF-8. I found that Extended Cyrillic characters produce an error
like:
no mapping found for [0x44] at position 12 in sEMDNOWA
g0=EXTENDED_CYRILLIC g1=EXTENDED_LATIN
at /usr/share/perl5/MARC/Charset.pm line 210.
I got around this by adding 128 (0x80) to the character value. As I
understand it, that should give the g1 value; however, as the error
message states, Extended Cyrillic is being used as g0.
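To illustrate the arithmetic behind that workaround, here is a minimal sketch. It assumes the usual ISO 2022 style layout that MARC-8 follows, where g0 graphic bytes occupy 0x21-0x7E and the same character designated as g1 sits 0x80 higher. The helper name g0_to_g1 is hypothetical and is not part of MARC::Charset.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of MARC::Charset): shift a g0 graphic
# byte (0x21-0x7E) into the g1 range (0xA1-0xFE) by adding 128 (0x80).
sub g0_to_g1 {
    my ($byte) = @_;
    die sprintf("0x%02X is not a g0 graphic byte\n", $byte)
        if $byte < 0x21 || $byte > 0x7E;
    return $byte + 0x80;
}

# The byte from the error message, 0x44, maps to 0xC4.
printf "0x%02X -> 0x%02X\n", 0x44, g0_to_g1(0x44);
```

This only shows the offset itself; it does not explain why the lookup fails when Extended Cyrillic is designated as g0.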
Here is a short script that reproduces the error:
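#!/usr/bin/perl -w
use strict;
use MARC::Charset 'marc8_to_utf8';

# ESC ( N = Basic Cyrillic, ESC ( Q = Extended Cyrillic, ESC ( B = ASCII (all as g0)
my $esc = chr(0x1B);
my $str = $esc.'(NsEM'.$esc.'(B'      # "Сем" in Basic Cyrillic
        . $esc.'(QD'.$esc.'(B'        # "ё" in Extended Cyrillic (the failing byte 0x44)
        . $esc.'(NNOWA'.$esc.'(B';    # "нова" in Basic Cyrillic
$str = marc8_to_utf8($str);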
The string after conversion should read: Семёнова
In case it is relevant, I was running the program on Ubuntu Linux
(kernel 2.6.35-22-generic #35-Ubuntu) with Perl v5.10.1.
--
Asko Ohmann