
This queue is for tickets about the MARC-Record CPAN distribution.

Report information
The Basics
Id: 3707
Status: resolved
Priority: 0/
Queue: MARC-Record

People
Owner: esummers [...] cpan.org
Requestors: esummers [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: 2.0.0



Subject: utf8 in MARC record not handled properly
Position 9 of the MARC leader allows you to declare Unicode as the character set used in the record. Vendors such as OCLC are moving towards UTF-8 rather than MARC-8 for character representation.

Currently MARC::Record uses length() to calculate directory offsets, and substr() to extract fields from the record based on those offsets. This works fine for MARC-8 character encodings, but breaks once a character can be more than one byte, since length() and substr() then count characters rather than bytes. A TODO test (utf8.t) has been added to the test suite to illustrate the problem.

On the positive side, Jarkko indicates that 5.8.1 will have bytes::substr() to complement 5.8.0's bytes::length(). Appropriate use of these should allow MARC::Record to handle utf8 in MARC data, but it will break backwards compatibility. Perhaps a patch for a utf8-safe MARC::Record distro will be the way to go.

--
From: Jarkko Hietaniemi <jhi@iki.fi>
To: ed-perluni@inkdroid.org, perl-unicode@perl.org
Subject: Re: bytes::substr() ?

Perl 5.8.1, whenever that happens, will have bytes::substr().

--
Jarkko Hietaniemi <jhi@iki.fi> http://www.iki.fi/jhi/
"There is this special biologist word we use for 'stable'. It is 'dead'." -- Jack Cohen
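
The offset problem described in this report can be shown with a small sketch. It is written in Python purely for illustration and is not MARC::Record's code: the `directory` and `extract` helpers and the sample field data are hypothetical, and the record layout is reduced to fields joined by a field terminator. Python's `len()` on a string plays the role of Perl's character-counting length(), and `len()` on the encoded bytes plays the role of bytes::length().

```python
# Illustrative sketch only; the names and data here are assumptions, not MARC::Record internals.
FIELD_TERMINATOR = "\x1e"

# The first field contains one character (U+00C9) that takes two bytes in UTF-8.
fields = ["\u00c9mile Zola" + FIELD_TERMINATOR, "Germinal" + FIELD_TERMINATOR]


def directory(fields, measure):
    """Build (offset, length) pairs, sizing each field with measure()."""
    entries, offset = [], 0
    for field in fields:
        size = measure(field)
        entries.append((offset, size))
        offset += size
    return entries


def extract(raw, entry):
    """Slice one field out of the raw byte string using a directory entry."""
    offset, size = entry
    return raw[offset:offset + size]


# Directory computed from character counts (the behaviour the report describes).
char_dir = directory(fields, len)
# Directory computed from encoded byte counts.
byte_dir = directory(fields, lambda f: len(f.encode("utf-8")))

# The record as it would be stored: UTF-8 encoded bytes.
raw = "".join(fields).encode("utf-8")

print(extract(raw, char_dir[1]))  # off by one byte: starts at the previous terminator
print(extract(raw, byte_dir[1]))  # the intended second field
```

With character-based offsets, every field after the first multi-byte character is read from the wrong position, which is the failure the TODO test is meant to capture.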
Resolved in version 2.0.0, which now requires Perl 5.8.2.