This queue is for tickets about the MARC-Record CPAN distribution.

Report information
The Basics
Id: 32332
Status: new
Priority: 0/
Queue: MARC-Record

People
Owner: Nobody in particular
Requestors: thienho [...] gmail.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: 2.0.0
Fixed in: (no value)



Subject: marc_to_utf8
MARC::File::USMARC->decode() calls marc_to_utf8(), which in turn calls decode(). Using decode() to convert a string that is already UTF-8 to UTF-8 can return an invalid UTF-8 string.

Secondly, if a record's character encoding is UTF-8, its data should already be a UTF-8 string. Conversion from MARC-8 to UTF-8 should not be done in MARC::File::USMARC->decode().

Therefore, the call to marc_to_utf8() should be removed from MARC::File::USMARC->decode().

Attached are my patch and a sample UTF-8 record.

Regards,
Thien Ho
Subject: r-2
Download r-2
application/octet-stream 905b

Message body not shown because it is not plain text.

Subject: USMARC.pm.diff
--- lib/MARC/File/USMARC.pm.orig 2005-04-22 17:11:04.000000000 -0400
+++ lib/MARC/File/USMARC.pm 2008-01-14 14:59:45.000000000 -0500
@@ -166,10 +166,13 @@
         my $tagdata = bytes::substr( $text, $data_start+$offset, $len );
-        # if utf8 the we encode the string as utf8
-        if ( $marc->encoding() eq 'UTF-8' ) {
-            $tagdata = marc_to_utf8( $tagdata );
-        }
+# Mon, Jan 14, 2008 @ 14:52:40 EST
+# Thien Ho <thienho@gmail.com>
+# marc_to_utf8 calls decode() to convert UTF-8 string. That could return invalid UTF-8 characters.
+#        # if utf8 the we encode the string as utf8
+#        if ( $marc->encoding() eq 'UTF-8' ) {
+#            $tagdata = marc_to_utf8( $tagdata );
+#        }
         $marc->_warn( "Invalid length in directory for tag $tagno $location" )
             unless ( $len == bytes::length($tagdata) );