
This queue is for tickets about the MARC-Charset CPAN distribution.

Report information
The Basics
Id: 38912
Status: resolved
Priority: 0/
Queue: MARC-Charset

People
Owner: GMCHARLT [...] cpan.org
Requestors: tventimi [...] princeton.edu
Cc:
AdminCc:

Bug Information
Severity: Normal
Broken in: 1.33
Fixed in: 1.34



Subject: double diacritics
Date: Tue, 2 Sep 2008 09:00:16 -0400
To: bug-MARC-Charset [...] rt.cpan.org
From: "Thomas P. Ventimiglia" <tventimi [...] princeton.edu>
Greetings:

MARC-Charset is a great package, but I recently noticed a small problem regarding the conversion of diacritics that span two characters. In MARC8, there are two of these, the ligature and the double tilde. Each of these is implemented as a pair of combining diacritics, a "left half" and a "right half" (0xEB and 0xEC for the ligature, 0xFA and 0xFB for the tilde). There are two different ways of converting these to Unicode. They may be converted directly to the combining half marks 0xFE20...0xFE23, or the two half marks may be replaced with one of the "double" diacritics, which is placed between the two characters it spans (0x0361 for ligature, 0x0360 for tilde). However, MARC-Charset does not do either of these. Instead, it replaces the left half with the double diacritic mark, and the right half with the Unicode right half mark.

I am using version 1.0 of the module with Perl 5.8.8 on Red Hat Enterprise Linux 2.6.18-92.1.10.el5.

Thank you for your help.

---
Thomas Ventimiglia
Computer Systems Specialist
Princeton University East Asian Library
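For readers following the report, the two valid conversions and the reported behavior can be illustrated with a minimal sketch. This is not MARC-Charset's code (the module is written in Perl); the function name `convert` and the strategy labels are hypothetical, only ASCII base characters and the four half marks are handled, and the sketch relies on the MARC-8 convention (not stated in the ticket) that a combining mark precedes its base character, whereas in Unicode it follows.

```python
# Illustrative sketch only; names and structure are assumptions, not MARC-Charset's API.

# MARC-8 half mark -> Unicode combining half mark (U+FE20..U+FE23).
HALF_MARKS = {0xEB: "\uFE20", 0xEC: "\uFE21", 0xFA: "\uFE22", 0xFB: "\uFE23"}

# MARC-8 left half -> Unicode double diacritic (ligature tie, double tilde).
DOUBLE_MARKS = {0xEB: "\u0361", 0xFA: "\u0360"}


def convert(marc8: bytes, strategy: str = "half") -> str:
    """Convert ASCII text plus MARC-8 double-diacritic halves to Unicode.

    strategy "half":     each half becomes the matching Unicode half mark.
    strategy "double":   the left half becomes the double diacritic and the
                         right half is dropped.
    strategy "observed": the mixed output described in the report.
    """
    if strategy not in ("half", "double", "observed"):
        raise ValueError(f"unknown strategy: {strategy}")

    out = []
    pending = []  # marks seen before their base character (MARC-8 order)
    for byte in marc8:
        if byte in HALF_MARKS:
            if strategy == "half":
                pending.append(HALF_MARKS[byte])
            elif strategy == "double":
                if byte in DOUBLE_MARKS:
                    pending.append(DOUBLE_MARKS[byte])
                # The right half is subsumed by the double mark.
            else:
                # Left half -> double mark, right half -> half mark.
                pending.append(DOUBLE_MARKS.get(byte, HALF_MARKS[byte]))
        elif byte < 0x80:
            # Unicode order: base character first, then its marks.
            out.append(chr(byte))
            out.extend(pending)
            pending.clear()
        else:
            raise ValueError(f"byte 0x{byte:02X} is outside this sketch")
    out.extend(pending)  # marks with no following base character
    return "".join(out)


ligature_ts = b"\xebt\xecs"  # "ts" joined by a ligature in MARC-8
half_form = convert(ligature_ts, "half")
double_form = convert(ligature_ts, "double")
observed_form = convert(ligature_ts, "observed")
```

With the "double" strategy the single mark ends up after the first base character, that is, between the two characters it spans. The "observed" form carries both a double mark and a stray right half mark, which is the inconsistency the report describes.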
Subject: [rt.cpan.org #38912] double diacritics
Date: Tue, 2 Sep 2008 09:06:30 -0400
To: bug-MARC-Charset [...] rt.cpan.org
From: "Thomas P. Ventimiglia" <tventimi [...] princeton.edu>
Please see the attached files. doublediacritics.txt is a MARC8-encoded file containing the two diacritics in question. doubleresult-marccharset.txt is the UTF8-conversion produced by MARC-Charset, and doubleresult-correct.txt is the correct UTF8-conversion.

Tom

Message body is not shown because sender requested not to inline it.


Thank you for the bug report. This is fixed in the rt38912 branch of the MARC/Perl Git repository (clone git://marcpm.git.sourceforge.net/gitroot/marcpm/marcpm, gitweb http://marcpm.git.sourceforge.net/git/gitweb.cgi?p=marcpm/marcpm;a=shortlog;h=refs/heads/rt38912) if you care to test. I will be making a new MARC::Charset release in the next day or two.
Subject: Re: [rt.cpan.org #38912] double diacritics
Date: Mon, 8 Aug 2011 08:32:19 -0400
To: bug-MARC-Charset [...] rt.cpan.org
From: "Thomas P. Ventimiglia" <tventimi [...] princeton.edu>
Thank you.

Tom
On Sat Aug 06 16:22:04 2011, GMCHARLT wrote:
> Thank you for the bug report. This is fixed in the rt38912 branch of the
> MARC/Perl Git repository (clone
> git://marcpm.git.sourceforge.net/gitroot/marcpm/marcpm, gitweb
> http://marcpm.git.sourceforge.net/git/gitweb.cgi?p=marcpm/marcpm;a=shortlog;h=refs/heads/rt38912)
> if you care to test. I will be making a new MARC::Charset release in the
> next day or two.
The fix was released in version 1.34.