Skip Menu |

This queue is for tickets about the Sort-Maker CPAN distribution.

Report information
The Basics
Id: 53005
Status: new
Priority: 0/
Queue: Sort-Maker

People
Owner: Nobody in particular
Requestors: NCLEATON [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Wishlist
Broken in: 0.06
Fixed in: (no value)



Subject: You could allow descending GRT sorts without "fixed" or "varying"
The pod says: NOTE: You cannot sort strings in descending order in the GRT unless the key has either the ’fixed’ or ’varying’ attributes set. That seems wrong. Mathematically there is a symmetry between 0s and 1s and ascending and descending sorts, so there's no algorthmic reason why NULL-free variable length strings shouldn't be sorted descending without pre-characterizing the input to find the max length. Specifically, all you do is append a NULL and then bit flip the whole string (including the NULL) and append that to the sortkey.
On Wed Dec 23 08:47:13 2009, NCLEATON wrote: Show quoted text
> > Specifically, all you do is append a NULL and then bit flip the whole > string (including the NULL) and append that to the sortkey.
This assumes that descending means the reverse order of an ascending sort. If descending means that the alphabet is reversed but a prefix of S should still sort higher than S, you would: Subtract 1 from each byte value (bytes now in range 0..254) Flip bits (bytes now in range 1..255) Append NULL
Patch to implement this.
Subject: patch.txt
diff -Nurd Sort-Maker-0.06.orig/Sort/Maker.pm Sort-Maker-0.06/Sort/Maker.pm --- Sort-Maker-0.06.orig/Sort/Maker.pm 2006-12-29 07:45:36.000000000 +0200 +++ Sort-Maker-0.06/Sort/Maker.pm 2009-12-26 07:55:37.000000000 +0200 @@ -823,6 +823,7 @@ my( $init_code, $pack_format ) ; + my $descend_xor_with = "\$_xor$key_ind"; if ( my $fix_len = $key->{fixed} ) { # create the xor string to invert the key for a descending sort. @@ -856,17 +857,19 @@ } else { -# we can't sort plain (null terminated) strings in descending order - - $@ = <<ERR, return if $key->{descending} ; -make_sorter: A GRT descending string needs to select either the -'fixed' or 'varying' attributes -ERR +# Variable length string - pack NULL terminated, or bit-flipped and \xFF +# terminated if the sort is descending. - $pack_format = 'Z*' ; + if ($key->{descending}) { + $descend_xor_with = '("\xFF"x(length($val)+1))'; + $pack_format = 'a*'; + } + else { + $pack_format = 'Z*' ; + } } - my $descend_code = $key->{descending} ? " . '' ^ \$_xor$key_ind" : '' ; + my $descend_code = $key->{descending} ? " . '' ^ $descend_xor_with" : '' ; my $grt_extract = <<CODE ; do{ my( \$val ) = EXTRACT ; uc( \$val )$descend_code } @@ -1383,14 +1386,13 @@ C<fixed> is a value attribute that marks this string key as always being this length. The extracted value will either be padded with null (0x0) bytes or truncated to the specified length (the value of -C<fixed>). The data in this key may have embedded null bytes (0x0) and -may be sorted in descending order. +C<fixed>). The data in this key may have embedded null bytes (0x0). -C<varying> is a Boolean attribute marks this string key as being of +C<varying> is a Boolean attribute that marks this string key as being of varying lengths. The GRT sorter will do a scan of all of this key's values to find the maximum string length and then it pads all the extracted values to that length. The data in this key may have -embedded null bytes (0x0) and may be sorted in descending order. +embedded null bytes (0x0). =head2 Key Extraction Code @@ -1654,16 +1656,14 @@ =head3 simple string. -If a string key is being sorted in ascending order with the GRT and it -doesn't have one of the GRT string attributes, it will be packed -without any munging and a null (0x0) byte will be appended to it. This -byte enables a shorter string to sort before longer ones that start -with the shorter string. +If a string key in the GRT doesn't have one of the GRT string attributes, +it will be packed without any munging and a null (0x0) byte will be +appended to it. This byte enables a shorter string to sort before longer +ones that start with the shorter string. -NOTE: You cannot sort strings in descending order in the GRT unless -the key has either the 'fixed' or 'varying' attributes set. Also, if a -string is being sorted in ascending order but has any null (0x0) bytes -in it, the key must have one of those attributes set. +NOTE: If your string keys may contain null (0x0) bytes then you must use +either the 'fixed' or 'varying' attribute to correctly sort them with the +GRT. =head3 C<fixed> diff -Nurd Sort-Maker-0.06.orig/t/descending_grt_string.t Sort-Maker-0.06/t/descending_grt_string.t --- Sort-Maker-0.06.orig/t/descending_grt_string.t 2006-12-28 09:28:17.000000000 +0200 +++ Sort-Maker-0.06/t/descending_grt_string.t 2009-12-25 21:36:46.000000000 +0200 @@ -21,6 +21,7 @@ fixed => [ qw( string descending fixed 3 ) ], varying => [ qw( string descending varying ) ], + varlen => [ qw( string descending ) ], }, }, ] ; diff -Nurd Sort-Maker-0.06.orig/t/errors.t Sort-Maker-0.06/t/errors.t --- Sort-Maker-0.06.orig/t/errors.t 2006-12-27 09:51:47.000000000 +0200 +++ Sort-Maker-0.06/t/errors.t 2009-12-25 20:32:34.000000000 +0200 @@ -89,17 +89,6 @@ }, { - name => 'GRT descending string', - styles => [ qw( GRT ) ], - args => { - GRT => [ - qw( string descending ) - ], - }, - error => qr/descending string/, - }, - - { name => 'array args - no value', styles => [ qw( ST ) ], args => { diff -Nurd Sort-Maker-0.06.orig/t/grt_asc_desc_asc.t Sort-Maker-0.06/t/grt_asc_desc_asc.t --- Sort-Maker-0.06.orig/t/grt_asc_desc_asc.t 1970-01-01 02:00:00.000000000 +0200 +++ Sort-Maker-0.06/t/grt_asc_desc_asc.t 2009-12-26 18:12:28.000000000 +0200 @@ -0,0 +1,91 @@ +#!/usr/local/bin/perl -sw + +use strict ; + +use lib 't' ; +use lib '..' ; +require 'common.pm' ; + +my @sort_styles = qw( GRT ) ; + +my @strings = ( + 1, 0, "a\xFF", "\xFFa", 'a', 'aa', 'ab', "\xFF", "\xFF\xFF", '' +); +my @data; +foreach my $i (@strings) { + foreach my $j (@strings) { + foreach my $k (@strings) { + push @data, [$i, $j, $k]; + } + } +} + +my $sort_tests = [ + { + skip => 0, + name => 'ascending descending ascending', + data => [@data], + gold => sub { $a->[0] cmp $b->[0] || + $b->[1] cmp $a->[1] || + $a->[2] cmp $b->[2] }, + args => [ + string => { + code => '$_->[0]', + }, + string => { + code => '$_->[1]', + descending => 1, + }, + string => { + code => '$_->[2]', + }, + ], + }, + { + skip => 0, + name => 'descending ascending descending', + data => [@data], + gold => sub { $b->[0] cmp $a->[0] || + $a->[1] cmp $b->[1] || + $b->[2] cmp $a->[2] }, + args => [ + string => { + code => '$_->[0]', + descending => 1, + }, + string => { + code => '$_->[1]', + }, + string => { + code => '$_->[2]', + descending => 1, + }, + ], + }, + { + skip => 0, + name => 'descending descending descending', + data => [@data], + gold => sub { $b->[0] cmp $a->[0] || + $b->[1] cmp $a->[1] || + $b->[2] cmp $a->[2] }, + args => [ + string => { + code => '$_->[0]', + descending => 1, + }, + string => { + code => '$_->[1]', + descending => 1, + }, + string => { + code => '$_->[2]', + descending => 1, + }, + ], + }, +] ; + +common_driver( $sort_tests, \@sort_styles ) ; + +exit ; diff -Nurd Sort-Maker-0.06.orig/t/GRT.t Sort-Maker-0.06/t/GRT.t --- Sort-Maker-0.06.orig/t/GRT.t 2006-12-28 09:28:17.000000000 +0200 +++ Sort-Maker-0.06/t/GRT.t 2009-12-25 21:39:34.000000000 +0200 @@ -144,6 +144,20 @@ gold => sub { uc $b cmp uc $a }, args => [ qw( string no_case descending varying ) ], }, + { + skip => 0, + name => 'string descending', + data => [ qw( bdhd BDhd wxj ayewwq rjjx ) ], + gold => sub { $b cmp $a }, + args => [ qw( string descending ) ], + }, + { + skip => 0, + name => 'string no_case descending', + data => [ qw( bdhd BDhd wxj ayewwq rjjx ) ], + gold => sub { uc $b cmp uc $a }, + args => [ qw( string no_case descending ) ], + }, ] ; common_driver( $sort_tests, \@sort_styles ) ; diff -Nurd Sort-Maker-0.06.orig/t/string_data.t Sort-Maker-0.06/t/string_data.t --- Sort-Maker-0.06.orig/t/string_data.t 2006-12-28 09:28:16.000000000 +0200 +++ Sort-Maker-0.06/t/string_data.t 2009-12-26 06:08:02.000000000 +0200 @@ -65,6 +65,31 @@ }, { skip => 0, + source => 0, + name => 'simple string descending novarying', + sizes => [100, 1000], + gen => sub { rand_choice( @string_keys ) }, + gold => sub { $b cmp $a }, + args => { + string => [ qw( string_data string + descending ) ], + index => [ qw( string descending ) ], + } + }, + { + skip => 0, + name => 'simple string no-case descending novarying', + sizes => [100, 1000], + gen => sub { rand_choice( @string_keys ) }, + gold => sub { uc($b) cmp uc($a) }, + args => { + string => [ qw( string_data string no_case + descending ) ], + index => [ qw( string no_case descending ) ], + } + }, + { + skip => 0, name => 'simple number', sizes => [100, 1000], gen => sub { rand_choice( @number_keys ) },