Subject: Incorrect record/field lengths for records with UTF-8 characters
I am using MARC::Record together with MARC::Charset to convert data from MARC-8 to the UTF-8 character set. The record written out by MARC::Record, however, has an incorrect record length in the leader and incorrect field lengths in the directory for any field containing UTF-8 characters. A comparison of the record with MARC-8 data against the record with UTF-8 data shows that the leader and directory are exactly the same in both. Thus, it appears that MARC::Record is counting *characters*, not *bytes*, when it writes the record. (I am currently working only with records having ASCII/ANSEL data, so there is a direct one-to-one correspondence between the characters in the two records.) The latest version of marcdump (from MARC::Record 1.20) does allow the record to be printed, but some fields are truncated because of the incorrect lengths.
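To illustrate the character/byte difference described above, here is a minimal sketch in Python (not MARC::Record's own code, and not in Perl). It builds a 12-character directory entry (3-digit tag, 4-digit field length, 5-digit starting position) for a hypothetical 245 field, computing the length once in characters and once in UTF-8 bytes. The helper name `directory_entry`, the sample title, and the `count_bytes` switch are assumptions made for illustration only.

```python
FIELD_TERMINATOR = "\x1e"    # MARC field terminator, counted in the field length
SUBFIELD_DELIMITER = "\x1f"  # MARC subfield delimiter


def directory_entry(tag, field_data, offset, count_bytes=True):
    """Return (entry, length) for one field; hypothetical helper, not a library API.

    The entry is tag (3 chars) + field length (4 digits) + start position (5 digits).
    """
    data = field_data + FIELD_TERMINATOR
    # The directory is meant to hold the length in bytes of the encoded field.
    # Counting characters instead gives too small a value for multi-byte UTF-8.
    length = len(data.encode("utf-8")) if count_bytes else len(data)
    return f"{tag}{length:04d}{offset:05d}", length


# Hypothetical field: two indicators, subfield $a, and a title containing
# "u" followed by a combining diaeresis (U+0308, two bytes in UTF-8).
field = "10" + SUBFIELD_DELIMITER + "a" + "Bru\u0308der"

by_chars, char_len = directory_entry("245", field, 0, count_bytes=False)
by_bytes, byte_len = directory_entry("245", field, 0, count_bytes=True)

print(by_chars, char_len)
print(by_bytes, byte_len)
```

In this sketch the character count is one less than the byte count, so a reader that trusts the character-based entry would stop one byte short of the field's end. That is consistent with the truncated fields seen in the marcdump output.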
Richard A. Lammert
Technical Services Librarian
Concordia Theological Seminary
6600 N. Clinton St.
Fort Wayne, IN 46825-4998