CC: dbw2 [...] calvin.edu
Subject: LC Callnumber normalization for sorting incorrect
Date: Wed, 7 Jan 2015 10:22:42 -0500
To: bug-Library-CallNumber-LC [...] rt.cpan.org
From: Kyle Hall <kyle [...] bywatersolutions.com>
It appears that LC Callnumber sort normalization is not quite correct.
The issue is that cutter numbers are sorted decimally, and as such we need
to pad out the numbers with 0's to make them sort correctly.
For example, take these two callnumbers that have been normalized:
PS3561.I4 A3 => PS3561 I4 A3
PS3561.I48 O5 => PS3561 I48 O5
These will sort incorrectly, as the second callnumber will be first for an
alphanumeric sort, but should be last when sorted by LCC rules.
They should be:
PS3561.I4 A3 => PS3561 I40 A3
PS3561.I48 O5 => PS3561 I48 O5
in order to preserve the correct sort order. In fact, we will need to pad
out both cutters since they are actually decimal numbers (.4 and .48
respectively). Is there any reasonable limit to the length of a cutter
number? That is, what's the longest one you've ever seen? Here are the same
callnumbers with the cutters padded to 10 digits:
PS3561.I4 A3 => PS3561 I4000000000 A3
PS3561.I48 O5 => PS3561 I4800000000 O5
I cannot imagine anything going beyond 10 digits.
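To make the padding idea concrete, here is a minimal standalone sketch. The helper name pad_cutter_digits and the sample cutter digits are made up for illustration and are not part of Library::CallNumber::LC; the width of 10 is the same guess as above. It only shows that once the digits are right-padded with zeros to a fixed width, a plain string comparison orders them the same way as the decimal fractions they stand for.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Library::CallNumber::LC): right-pad the
# digit part of a cutter with zeros up to a fixed width. The default width
# of 10 is an assumption taken from the discussion above.
sub pad_cutter_digits {
    my ($digits, $width) = @_;
    $width = 10 unless defined $width;
    # Leave the digits alone if they already fill (or exceed) the width.
    return $digits if length($digits) >= $width;
    return $digits . ('0' x ($width - length($digits)));
}

# Sample cutter digits, read as the decimals .48, .4, .5 and .392.
my @cutters = ('48', '4', '5', '392');

# Compare the padded forms as strings.
my @sorted = sort { pad_cutter_digits($a) cmp pad_cutter_digits($b) } @cutters;

print join(' ', map { ".$_" } @sorted), "\n";   # prints ".392 .4 .48 .5"
```

Since every padded key has the same length, the string comparison cannot be affected by whatever character happens to follow the cutter in the normalized form.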
The following diff fixes the issue ( I believe ):
--- /usr/local/share/perl/5.10.1/Library/CallNumber/LC.pm 2015-01-07 10:21:20.000000000 -0500
+++ /usr/local/share/perl/5.10.1/Library/CallNumber/LC.pm.new 2015-01-07 10:21:06.000000000 -0500
@@ -275,6 +275,11 @@
my ($alpha, $num, $dec, $othernum, $c1dec, $c1alpha, $c1num, $c2alpha, $c2num, $c3alpha, $c3num, $extra) = ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12);
+ # cutter numbers are decimal sorted, pad out to 10 digits
+ $c1num .= '0' x ( 10 - length $c1num ) if $c1num;
+ $c2num .= '0' x ( 10 - length $c2num ) if $c2num;
+ $c3num .= '0' x ( 10 - length $c3num ) if $c3num;
+
no warnings;
my $class = $alpha;
$class .= sprintf('%04s', $num) if $num;
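For anyone reviewing the patch, the padding expression can be tried on its own. The sketch below wraps the same expression in a helper named pad_like_patch, which is a name invented here and does not exist in the module. It is only meant to show how the expression treats a few edge cases: an empty cutter, a cutter of "0" (which the "if $c1num" style guard treats as false), and a cutter already longer than 10 digits.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the expression used in the patch; the real
# module applies it inline to $c1num, $c2num and $c3num.
sub pad_like_patch {
    my ($num) = @_;
    # A negative repeat count yields an empty string; newer perls warn
    # about it under the 'numeric' category, so silence that here.
    no warnings 'numeric';
    $num .= '0' x ( 10 - length $num ) if $num;
    return $num;
}

for my $in ('4', '48', '', '0', '12345678901') {
    printf "%-13s => %s\n", "'$in'", "'" . pad_like_patch($in) . "'";
}
```

Here '4' and '48' are padded to 10 digits, while '', '0' and the 11-digit value come back unchanged. Whether a cutter of "0" can occur in practice is not something this sketch answers.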