Bug #3707 for MARC-Record: utf8 in MARC record not handled properly

Wed Sep 03 10:07:36 2003 esummers [...] cpan.org - Ticket created

Subject:

utf8 in MARC record not handled properly

Position 9 of the MARC leader allows a record to declare Unicode as its character set. Vendors such as OCLC are moving towards UTF-8 rather than MARC-8 for character representation.

Currently MARC::Record uses length() to calculate directory offsets, and substr() to extract fields from the record based on those offsets. This works for MARC-8 character encodings, but breaks once a character can occupy more than one byte: the directory records byte counts, while length() and substr() on a character string count characters. A TODO test illustrating the problem has been added to the test suite (utf8.t).

On the positive side, Jarkko indicates that Perl 5.8.1 will have bytes::substr() to complement 5.8.0's bytes::length(). Appropriate use of these will let MARC::Record handle UTF-8 in MARC data, but it will break backward compatibility. Perhaps a patch for a UTF-8-safe MARC::Record distribution will be the way to go.

--
From: Jarkko Hietaniemi <jhi@iki.fi>
To: ed-perluni@inkdroid.org, perl-unicode@perl.org
Subject: Re: bytes::substr() ?

Perl 5.8.1, whenever that happens, will have bytes::substr().

--
Jarkko Hietaniemi <jhi@iki.fi> http://www.iki.fi/jhi/
"There is this special biologist word we use for 'stable'. It is 'dead'." -- Jack Cohen
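The ticket describes the offset problem only in prose. The sketch below reproduces the arithmetic in Python purely for illustration; it is not MARC::Record code, and the helper names (build_directory, format_directory, extract_fields) are hypothetical. It assumes the MARC 21 directory layout of 12-character entries (3-character tag, 4-digit field length, 5-digit starting offset) with 0x1E ending each field, and it ignores the leader and base address. Counting lengths in characters plays the role of a character-based length(); counting encoded bytes plays the role of bytes::length().

```python
# Hypothetical illustration (not MARC::Record code): build a MARC-style
# directory for UTF-8 field data, counting field lengths either in bytes
# (what the directory needs) or in characters (the bug described above).
# Assumes MARC 21 layout: 12-character entries made of a 3-character tag,
# a 4-digit field length and a 5-digit starting offset; 0x1E ends each field.

FIELD_TERMINATOR = "\x1e"


def build_directory(fields, count_bytes=True):
    """Return (entries, data) for a list of (tag, text) pairs.

    entries is a list of (tag, length, start) tuples; data is the UTF-8
    encoded field area. With count_bytes=False the lengths are character
    counts, mimicking a character-based length() on a decoded string.
    """
    entries = []
    chunks = []
    offset = 0
    for tag, text in fields:
        field = text + FIELD_TERMINATOR
        length = len(field.encode("utf-8")) if count_bytes else len(field)
        entries.append((tag, length, offset))
        chunks.append(field)
        offset += length
    return entries, "".join(chunks).encode("utf-8")


def format_directory(entries):
    """Render the entries as the fixed-width directory string."""
    return "".join(f"{tag}{length:04d}{start:05d}" for tag, length, start in entries)


def extract_fields(entries, data):
    """Cut each field out of the raw bytes by directory offset, then decode."""
    fields = {}
    for tag, length, start in entries:
        raw = data[start:start + length]
        # errors="replace" keeps going if a slice splits a multi-byte character
        fields[tag] = raw.decode("utf-8", errors="replace").rstrip(FIELD_TERMINATOR)
    return fields


if __name__ == "__main__":
    # \u00fc is "ü" and \u00e9 is "é"; each takes two bytes in UTF-8.
    sample = [("100", "M\u00fcller, J."), ("245", "Caf\u00e9 society")]
    for by_bytes in (True, False):
        entries, data = build_directory(sample, count_bytes=by_bytes)
        print(format_directory(entries), extract_fields(entries, data))
```

Counting bytes gives the directory 100001200000245001400012, and both fields come back intact. Counting characters gives 100001100000245001300011. The 100 field happens to survive because its slice stops just before its terminator, but the 245 slice begins on that terminator and drops the final "y". Every multi-byte character shifts all later offsets, which is the failure utf8.t is meant to catch.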

Tue Nov 04 10:16:29 2003 esummers [...] cpan.org - Taken

Tue Nov 04 10:16:32 2003 esummers [...] cpan.org - Status changed from 'new' to 'open'

Tue Nov 04 10:18:20 2003 esummers [...] cpan.org - Status changed from 'open' to 'new'

Thu Feb 01 15:36:47 2007 bricas [...] cpan.org - Correspondence added

Resolved in version 2.0.0. Now requires perl 5.8.2.

Thu Feb 01 15:36:48 2007 The RT System itself - Status changed from 'new' to 'open'

Thu Feb 01 15:36:49 2007 bricas [...] cpan.org - Status changed from 'open' to 'resolved'

Thu Feb 01 15:37:09 2007 bricas [...] cpan.org - Fixed in 2.0.0 added