
This queue is for tickets about the File-SortedSeek CPAN distribution.

Report information
The Basics
Id: 36160
Status: resolved
Priority: 0/
Queue: File-SortedSeek

People
Owner: Nobody in particular
Requestors: cpan [...] pjedwards.co.uk
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: 0.012
Fixed in: (no value)



Subject: Bug when many lines have the same data value, 1st is not found.
Hello and thanks for File::SortedSeek.

Take a file with 100 lines: 10 of each of the numbers 0 to 9. Calling numeric or alphabetic leaves the file position pointer *not* at the beginning of the first matching line.

If you have log files with time granularity in seconds and more than 10 lines per second, you might run into this.

There is a test below that demonstrates this.

I'd suggest using Search::Dict to do the binary search, as it does not suffer from this problem; this is also suggested in:
http://rt.cpan.org/Ticket/Display.html?id=23874
Note that bug:
http://rt.cpan.org/Ticket/Display.html?id=30778
suggests allowing a stationary override. This might help with the behaviour I'm reporting if a suitable stationary override is provided, but it doesn't seem like a comprehensive solution.

Cheers,
Peter (Stig) Edwards

    use strict;
    use warnings;
    use Test::More tests => 2;
    use File::SortedSeek;

    # silence warnings to STDERR to avoid duplicates
    File::SortedSeek::set_silent;
    $|++;

    my $file = './test.file';
    open TESTOUT, '>', $file or die "Can't write test file $!\n";
    # write 10 of each of the numbers from 0 to 9
    foreach my $item (0..9) {
        for ( 0..9 ) {
            print TESTOUT "$item\n";
        }
    }
    close TESTOUT;

    open TESTIN, '<', $file or die "Can't read from test file $!\n";
    my $tell = File::SortedSeek::numeric( *TESTIN, '7' );
    my $num_of_sevens = 0;
    while ( my $line = <TESTIN> ) {
        if ( $line =~ m/7/mxo ) {
            $num_of_sevens++;
        } else {
            last;
        }
    }
    is( $num_of_sevens, 10, '10 sevens' );
    close TESTIN;

    # delete every version of the file (VMS can keep several)
    1 while unlink $file;
    ok( ! -e $file, 'Test file unlinked ok' );
    1;
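The ticket does not show File::SortedSeek's search loop, so the following only illustrates the general failure mode and is not the module's code. A binary search that returns as soon as it probes an equal element can land anywhere inside a run of duplicates, whereas a lower-bound search always returns the first element that is not less than the key. The sketch works on an in-memory array shaped like the test file; `naive_search` and `lower_bound` are hypothetical names.

```perl
use strict;
use warnings;

# Sorted data shaped like the test file: ten copies each of 0..9.
my @data = map { ($_) x 10 } 0 .. 9;

# Early-exit binary search: returns the first index it probes that matches,
# which can be anywhere inside a run of equal values.
sub naive_search {
    my ($aref, $key) = @_;
    my ($lo, $hi) = (0, $#$aref);
    while ($lo <= $hi) {
        my $mid = int( ($lo + $hi) / 2 );
        if    ($aref->[$mid] < $key) { $lo = $mid + 1 }
        elsif ($aref->[$mid] > $key) { $hi = $mid - 1 }
        else                         { return $mid }
    }
    return -1;    # not found
}

# Lower bound: returns the first index whose value is >= $key, so a run
# of duplicates is always entered at its first element.
sub lower_bound {
    my ($aref, $key) = @_;
    my ($lo, $hi) = (0, scalar @$aref);
    while ($lo < $hi) {
        my $mid = int( ($lo + $hi) / 2 );
        if ($aref->[$mid] < $key) { $lo = $mid + 1 }
        else                      { $hi = $mid }
    }
    return $lo;
}

my $naive = naive_search(\@data, 7);
my $first = lower_bound(\@data, 7);
print "naive: $naive, lower bound: $first\n";
```

With this data the early-exit search first hits a 7 at index 74 and stops, so reading forward from there would see only six 7s; the lower bound is index 70, the first of the ten.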
Subject: Re: [rt.cpan.org #36160] Bug when many lines have the same data value, 1st is not found.
Date: Sun, 25 May 2008 13:07:59 +1000
To: bug-File-SortedSeek [...] rt.cpan.org
From: "James Freeman" <airmedical [...] gmail.com>
Hi Peter,

Thanks for the report. I have not been involved with Perl for some years but recently have been doing a bit. I had a look at the code and concluded it is desperately in need of refactoring!!!!

Using [Search::Dict] makes good sense. At the time this was originally written Search::Dict did not exist (and was certainly not core as it is now). All it was was a little script to grab chunks of logfiles. It suffers badly from being a quick specific hack, later badly generalised.

Kind Regards
James

On 25/05/2008, Peter John Edwards via RT <bug-File-SortedSeek@rt.cpan.org> wrote:
From: cpan [...] pjedwards.co.uk
I don't mean to be presumptuous; it's been a rainy holiday weekend. Here's a version of SortedSeek.pm that uses Search::Dict and passes all the tests. I've not tested edge/boundary cases with cuddle.

Peter (Stig) Edwards

Message body is not shown because it is too large.
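Peter's patched SortedSeek.pm is not reproduced here, so the following is not that code. It is a minimal sketch of the approach the ticket recommends: re-running the original test against Search::Dict's `look` directly. It assumes the three-argument form whose third argument is a hash reference with a `comp` key; `look` calls `comp` with the chomped line first and the key second, and leaves the handle at the first line that compares greater than or equal to the key.

```perl
use strict;
use warnings;
use Search::Dict;
use File::Temp qw(tempfile);

# Build the same 100-line file as the ticket's test: ten copies of each digit.
my ($out, $file) = tempfile(UNLINK => 1);
for my $item (0 .. 9) {
    print {$out} "$item\n" for 1 .. 10;
}
close $out or die "Can't close $file: $!";

open my $in, '<', $file or die "Can't read $file: $!";

# look() leaves the handle at the first line that compares >= the key.
# comp receives (line, key); here the lines are compared numerically.
my $pos = look($in, 7, { comp => sub { $_[0] <=> $_[1] } });
die "look() failed" if $pos == -1;

my $sevens = 0;
while (my $line = <$in>) {
    chomp $line;
    last unless $line eq '7';
    $sevens++;
}
close $in;
print "position $pos, found $sevens sevens\n";
```

Because `look` stops at the first line that is not less than the key, all ten 7s are counted. `look` returns -1 if a seek fails, so a caller should check for that rather than for `undef`.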

Subject: Re: [rt.cpan.org #36160] Bug when many lines have the same data value, 1st is not found.
Date: Mon, 26 May 2008 22:06:48 +1000
To: bug-File-SortedSeek [...] rt.cpan.org
From: "James Freeman" <airmedical [...] gmail.com>
Hi Peter,

Thanks for that. No objections. It certainly cleans things up a bit.

It seems to me that if we are going to cut and paste from Search::Dict to get the fuzzy cuddle search to work, it would probably make sense to patch Search::Dict to support this as an option. It is (IMHO) an option that has some utility. In the interim, using this modified version internally might make sense, as it removes the dependency.

It's a rainy night here, so provided work does not get too busy, a new version should make its way to CPAN by tomorrow. I'll have a look at the test suite and see what needs tightening up to make sure it all works as expected.

Kind Regards
James

On 26/05/2008, Peter John Edwards via RT <bug-File-SortedSeek@rt.cpan.org> wrote:
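James suggests keeping a modified copy of the Search::Dict search internally to avoid the dependency. The released `_look` is not reproduced in this ticket, so this is only a sketch of the same lower-bound idea as a self-contained routine: bisect on byte offsets, resynchronise to a line boundary after each seek, then finish with a short linear scan. The name `first_at_or_after` and its `($fh, $key, $cmp)` interface are hypothetical; unlike Search::Dict, it bisects on single bytes rather than on filesystem blocks.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical routine: position $fh at the first line whose value compares
# >= $key under $cmp and return that byte offset (the file size if every
# line is smaller). Lines must be sorted consistently with $cmp.
sub first_at_or_after {
    my ($fh, $key, $cmp) = @_;
    my ($lo, $hi) = (0, -s $fh);

    # Bisect on byte offsets. Invariant: $lo is 0, or the target line
    # starts strictly after $lo.
    while ($hi - $lo > 1) {
        my $mid = int( ($lo + $hi) / 2 );
        seek($fh, $mid, 0) or die "seek failed: $!";
        <$fh>;                       # discard the (possibly partial) line at $mid
        my $line = <$fh>;            # first line starting after $mid
        if (defined $line) {
            chomp $line;
            if ($cmp->($line, $key) < 0) { $lo = $mid; next }
        }
        $hi = $mid;                  # EOF, or that line is >= key
    }

    # Linear scan: skip the line containing $lo (unless at the start of
    # the file), then stop at the first line that is not less than the key.
    seek($fh, $lo, 0) or die "seek failed: $!";
    <$fh> if $lo;
    my $pos;
    while (1) {
        $pos = tell $fh;
        my $line = <$fh>;
        last unless defined $line;   # key is past the last line
        chomp $line;
        last if $cmp->($line, $key) >= 0;
    }
    seek($fh, $pos, 0) or die "seek failed: $!";
    return $pos;
}

# Usage: the ticket's 100-line file, ten copies of each digit.
my ($out, $file) = tempfile(UNLINK => 1);
for my $item (0 .. 9) { print {$out} "$item\n" for 1 .. 10 }
close $out or die "close failed: $!";

open my $in, '<', $file or die "Can't read $file: $!";
my $numeric = sub { $_[0] <=> $_[1] };
my $pos     = first_at_or_after($in, 7, $numeric);
my @run;
while (my $line = <$in>) {
    chomp $line;
    last unless $line == 7;
    push @run, $line;
}
my $past_end = first_at_or_after($in, 10, $numeric);
close $in;
printf "offset %d, %d sevens, past-end offset %d\n", $pos, scalar @run, $past_end;
```

Returning the end-of-file offset when every line is smaller mirrors what `look` does; a caller wanting "not found" semantics would compare the result with the file size.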
Subject: Re: [rt.cpan.org #36160] Bug when many lines have the same data value, 1st is not found.
Date: Tue, 27 May 2008 09:11:07 +1000
To: bug-File-SortedSeek [...] rt.cpan.org
From: "James Freeman" <airmedical [...] gmail.com>
Hi Peter,

There is a new version up which has had some serious refactoring. I might get a chance to do some more work on it tonight; then again, it could be another 7 years!

Kind Regards
James

On 26/05/2008, Peter John Edwards via RT <bug-File-SortedSeek@rt.cpan.org> wrote:
Hello James,

Thank you. File::SortedSeek 0.013 is building and passing its tests for me on Linux. Some thoughts:

.) Could you rename the tests in t so that they do not have two periods/dots in the filename? Certain OSes/filesystems (including the VMS one I'd like to install this on) only support one period/dot in a filename. Names like:
     01-load.t
     02-numeric_ascending.t
   ...would be great.

.) I was thinking adding 'cuddle' as a param to _look would be good:

     sub _look{
         local *FILE = shift;
         my($key,$params) = @_;
         ...
         my $cuddle = $params->{cuddle};
         ....

   But then I noticed _look returns -1 (not undef), doesn't have a default comp sub and also sets $exact_match, so I now don't think adding 'cuddle' as a param would be good. (I was thinking about how to add cuddle to Search::Dict::look.)

.) Makefile.PL lists Search::Dict as a PREREQ_PM; I think this can be removed.

.) As you've not been involved in Perl for a while, I thought I'd point you to the CPAN testing service:
     http://cpants.perl.org/dist/overview/File-SortedSeek
     http://cpants.perl.org/dist/kwalitee/File-SortedSeek

     make metafile

   will produce META.yml.

Hope this is useful.
Peter (Stig) Edwards
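Renaming the test files is a one-off job, but a small helper makes the single-dot rule concrete: keep the final `.t` extension and replace any earlier dots with hyphens. The function name `single_dot_name` and the sample file names below are hypothetical; the ticket does not list the original test names.

```perl
use strict;
use warnings;

# Hypothetical helper: keep the last dot (the ".t" extension) and turn every
# earlier dot into a hyphen, so the name has at most one period.
sub single_dot_name {
    my ($name) = @_;
    my ($stem, $ext) = $name =~ /\A(.*)(\.[^.]*)\z/s
        or return $name;    # no dot at all: nothing to change
    $stem =~ tr/./-/;
    return $stem . $ext;
}

# Hypothetical current names; only those with more than one dot need renaming.
for my $old ('01.load.t', '02.numeric_ascending.t', '03-alphabetic.t') {
    my $dots = () = $old =~ /\./g;
    next if $dots <= 1;
    printf "%s -> %s\n", $old, single_dot_name($old);
}
```

Applying it with Perl's built-in `rename` over `glob 't/*.t'` would perform the actual renames in a distribution's `t` directory.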
Subject: Re: [rt.cpan.org #36160] Bug when many lines have the same data value, 1st is not found.
Date: Wed, 28 May 2008 01:04:06 +1000
To: bug-File-SortedSeek [...] rt.cpan.org
From: "James Freeman" <airmedical [...] gmail.com>
Hi Peter,

In theory it would be quite possible to include the cuddle option in Search::Dict, as it only requires about 6 extra lines to deal with it. However, this would then require getting that implemented before you could use it, and insisting on the new version as a prerequisite. Until that happens you still need to implement it internally. To save fewer than 30 lines of code it hardly seems worth the effort or the dependency. I left the param hash in place with that in mind.....

I'll change the test names. I was not aware of that issue.

Oops, I forgot I had added it to the Makefile.PL as a prereq.

Ah, so that's what those META.yml files are all about. I have been ignoring them. Interesting that h2xs does not generate one, as it has obviously been updated since I last used it.

Thanks very much for your feedback.

Kind Regards
Dr James Freeman (tachyon)

On 27/05/2008, Peter John Edwards via RT <bug-File-SortedSeek@rt.cpan.org> wrote:
Fixed in 0.013 onwards - as per your patch ;-)