This queue is for tickets about the Library-CallNumber-LC CPAN distribution.

Report information
The Basics
Id: 101376
Status: rejected
Priority: 0/
Queue: Library-CallNumber-LC

People
Owner: Nobody in particular
Requestors: kyle [...] bywatersolutions.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



CC: dbw2 [...] calvin.edu
Subject: LC Callnumber normalization for sorting incorrect
Date: Wed, 7 Jan 2015 10:22:42 -0500
To: bug-Library-CallNumber-LC [...] rt.cpan.org
From: Kyle Hall <kyle [...] bywatersolutions.com>
It appears that LC call number sort normalization is not quite correct.

The issue is that cutter numbers are sorted decimally, so we need to pad the numbers out with zeros to make them sort correctly.

For example, take these two call numbers that have been normalized:

    PS3561.I4 A3  => PS3561 I4 A3
    PS3561.I48 O5 => PS3561 I48 O5

These will sort incorrectly: the second call number will come first in an alphanumeric sort, but it should come last when sorted by LCC rules.

They should be:

    PS3561.I4 A3  => PS3561 I40 A3
    PS3561.I48 O5 => PS3561 I48 O5

in order to preserve the correct sort order. In fact, we will need to pad out both cutters, since they are actually decimal numbers (.4 and .48 respectively). Is there any reasonable limit to the length of a cutter number? That is, what's the longest one you've ever seen? Here are the same call numbers with the cutters padded to 10 digits:

    PS3561.I4 A3  => PS3561 I4000000000 A3
    PS3561.I48 O5 => PS3561 I4800000000 O5

I cannot imagine anything going beyond 10 digits.

The following diff fixes the issue (I believe):

--- /usr/local/share/perl/5.10.1/Library/CallNumber/LC.pm	2015-01-07 10:21:20.000000000 -0500
+++ /usr/local/share/perl/5.10.1/Library/CallNumber/LC.pm.new	2015-01-07 10:21:06.000000000 -0500
@@ -275,6 +275,11 @@
 
 my ($alpha, $num, $dec, $othernum, $c1dec, $c1alpha, $c1num, $c2alpha, $c2num, $c3alpha, $c3num, $extra) = ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12);
 
+ # cutter numbers are decimal sorted, pad out to 10 digits
+ $c1num .= '0' x ( 10 - length $c1num ) if $c1num;
+ $c2num .= '0' x ( 10 - length $c2num ) if $c2num;
+ $c3num .= '0' x ( 10 - length $c3num ) if $c3num;
+
 no warnings;
 my $class = $alpha;
 $class .= sprintf('%04s', $num) if $num;
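As a rough illustration of what the proposed patch does, here is a standalone Perl sketch that applies the same right-padding to cutter digits outside the module. The `pad_cutter` helper and its regex are hypothetical (they are not part of Library::CallNumber::LC), and the default width of 10 is the reporter's assumption, not a documented limit.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Library::CallNumber::LC: right-pads the
# digits of a cutter (e.g. "I4") with zeros to a fixed width, mirroring the
# proposed diff. The default width of 10 is the reporter's assumption.
sub pad_cutter {
    my ($cutter, $width) = @_;
    $width //= 10;
    my ($letters, $digits) = $cutter =~ /^([A-Z]+)(\d*)$/
        or die "unexpected cutter: $cutter\n";
    # A cutter number is a decimal fraction (.4, .48), so trailing zeros
    # do not change its value; they only equalize the string lengths.
    $digits .= '0' x ($width - length($digits)) if length($digits);
    return $letters . $digits;
}

my @keys = (
    join(' ', 'PS3561', pad_cutter('I48'), 'O5'),
    join(' ', 'PS3561', pad_cutter('I4'),  'A3'),
);
print "$_\n" for sort { $a cmp $b } @keys;
# Prints:
# PS3561 I4000000000 A3
# PS3561 I4800000000 O5
```

One small difference from the diff: the diff guards each pad with `if $c1num`, which is false for the string "0", while the sketch tests `length` instead.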
Hello Kyle,

Thanks for the bug report. The sort keys are designed to sort using standard ASCII rules, and if you try it, you will see that the currently generated keys actually sort fine, because the space comes before the "8":

    PS3561 I4 A3
    PS3561 I48 O5

The module design takes advantage of the fact that, when sorting, "the first difference is all that matters". Each component of the sort key is separated by a space, and in an ASCII sort the space comes before any letter or digit. For this reason, there is never a need to right-pad decimal numbers: once you hit a space in the string comparison, that is equivalent to an infinite number of zeros (for sorting purposes), since it always goes to the top.

If for some strange reason you are on a system which doesn't treat spaces as "less than" letters and numbers, the module lets you set a different global "$topper" variable, which will then be used to separate the call number components.

Hope this helps,
Dan
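To check the reply's argument, the following Perl sketch sorts the unpadded keys with a plain string comparison and then repeats the sort with a separator that sorts after the digits. The "~" separator is an arbitrary example chosen only to show why the separator (the "$topper" setting mentioned in the reply) has to sort below letters and digits; the module's actual variable and its default are not exercised here.

```perl
use strict;
use warnings;

# Unpadded call number components from the report.
my @calls = (
    [ 'PS3561', 'I48', 'O5' ],
    [ 'PS3561', 'I4',  'A3' ],
);

# "cmp" compares strings character by character (codepoint order when
# "use locale" is not in effect). ' ' is 0x20 and '8' is 0x38, so the first
# difference, at offset 9, puts "I4 ..." before "I48 ..." with no padding.
my @space_keys = map { join ' ', @$_ } @calls;
my @with_space = sort { $a cmp $b } @space_keys;
print "$_\n" for @with_space;
# PS3561 I4 A3
# PS3561 I48 O5

# With a separator that sorts after the digits ('~' is 0x7E), the first
# difference becomes '8' vs '~' and the order flips, which is wrong for LC.
my @tilde_keys = map { join '~', @$_ } @calls;
my @with_tilde = sort { $a cmp $b } @tilde_keys;
print "$_\n" for @with_tilde;
# PS3561~I48~O5
# PS3561~I4~A3
```

This only exercises the ASCII comparison itself, not the module's normalization of real call numbers.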