Skip Menu |

Preferred bug tracker

Please visit the preferred bug tracker to report your issue.

This queue is for tickets about the List-BinarySearch CPAN distribution.

Report information
The Basics
Id: 99382
Status: rejected
Worked: 20 min
Priority: 0/
Queue: List-BinarySearch

People
Owner: davido [...] cpan.org
Requestors: meir [...] guttman.co.il
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: 0.20



Subject: Enhacement suggestion: A binsearch_pos method with an exact/best_position indication
Date: Wed, 08 Oct 2014 17:05:16 +0300
To: bug-List-BinarySearch [...] rt.cpan.org
From: Meir Guttman <meir [...] guttman.co.il>
Dear David, First thank you for a very useful, easy to use and exceptionally documented module. I would like to have a forth method, "binsearch_best_effort" (other name suggestions are more than welcome) that report 1) Whatever 'binsearch_pos' returns, but also ... 2) ... an indication whether it is an exact match or is it the best insertion point. The second item, the indication, could be either a second scalar or a negative index. Running first binsearch and then binsearch_pos if the first returns 'undef' seems ... ugly. Distribution: List::BinarySearch v.0.20 Perl version: All (although I am now on v.5.18.2) OS: All (although I am on Win-7 Pro 64 bit) Regards, Meir
On Wed Oct 08 07:05:31 2014, meir@guttman.co.il wrote: Show quoted text
> I would like to have a forth method, "binsearch_best_effort" (other name > suggestions are more than welcome) that report > 1) Whatever 'binsearch_pos' returns, but also ... > 2) ... an indication whether it is an exact match or is it the best > insertion point. > The second item, the indication, could be either a second scalar or a > negative index. > > Running first binsearch and then binsearch_pos if the first returns 'undef' > seems ... ugly.
Thanks for your interest in, and support of this module. I'm glad you find it useful. I considered an approach similar to what you are suggesting while designing the module, but ultimately decided against it. I'll try to explain my rationale. First, it's unnecessary to do two searches. One simply needs to check that $needle is a match for $haystack[$pos]. For example: my $pos = binsearch_pos { $a cmp $b } $needle, @haystack; if( $needle eq $haystack[$pos] ) { print "Match at $pos.\n"; } else { print "Insert at $pos.\n"; } As you can see here, we're only doing one search. It's trivial to detect whether the element at $pos is a match or not. Now consider the following code: use strict; use warnings; use feature 'say'; use List::BinarySearch 'binsearch_pos'; # Draft implementation: sub binsearch_pos_exact (&$\@) { my( $comparator, $needle, $haystack_aref ) = @_; my $res = binsearch_pos {$comparator->()} $needle, @{$haystack_aref}; local( $a, $b ) = ( $needle, $haystack_aref->[$res] ); return( $res, !( $comparator->() ) ); } my @haystack = ( 'a' .. 'm', 'o' .. 'z' ); # 'n' is missing. my $needle_ok = 'y'; my $needle_nok = 'n'; find_pos( $needle_ok, \@haystack ); find_pos_exact( $needle_ok, \@haystack ); find_pos( $needle_nok, \@haystack ); find_pos_exact( $needle_nok, \@haystack ); sub find_pos { my( $needle, $haystack_aref ) = @_; my $res = binsearch_pos { $a cmp $b } $needle, @{$haystack_aref}; if( $needle eq $haystack_aref->[$res] ) { say "$needle found at $res"; } else { say "$needle not found; insert at $res"; } } sub find_pos_exact { my( $needle, $haystack_aref ) = @_; my( $res, $exact ) = binsearch_pos_exact { $a cmp $b } $needle, @$haystack_aref; if( $exact ) { say "$needle found at $res"; } else { say "$needle not found; insert at $res"; } } As you can see, the new implementation requires a Boolean test inside the binsearch_pos_exact, *and* in the user's code: if( $exact )... So the amount of work being done is actually greater than the user simply detecting a match himself. Additionally, to make the solution generally useful, the final comparison inside of binsearch_pos_exact has to call out to the user-supplied comparator function. So that's one more function call, whereas if the user were doing the comparison, it might be acceptable to simply do a simple 'eq' or '==' comparison. Within the XS version of the module, we could make the final comparator call before tearing down the MULTICALL call-stack, so it wouldn't be terribly inefficient, but there is still a cost, and that cost is still probably greater than the user doing his own match detection. A real-world implementation would inline the binsearch_pos code into binsearch_pos_exact, but it would still result in an extra Boolean comparison, and an extra function call to the user-supplied comparator. So to summarize: * The feature request is based on the incorrect assumption that one must search twice; in reality the user only needs to detect a match using a simple comparison, as demonstrated above. ( if( $needle == $haystack[$pos] ) {... ) * The feature request would not reduce the number of Boolean comparisons being done. The user will still need to do an if($exists){... test, as demonstrated in my second example above. * The feature request actually *adds* a Boolean comparison internally as well as a subroutine call to the user-supplied comparator. So efficiency actually suffers. This internal comparison and subroutine call is demonstrated in my second example, above. * I believe that if( $needle eq $haystack[$pos] ) ... is more self-documenting than if( $exists )... * The feature request adds more complexity to the module without removing significant complexity from the user's code. I'm really happy when people provide suggestions because they do help me to refine my work, and sometimes make it more useful. But in this case I'm going to reject the request for the reasons summarized above. However, I'm still open to discussing it if you can show in code a use case that becomes simplified by the addition of binsearch_pos_exact.
Subject: Re: [rt.cpan.org #99382] Enhacement suggestion: A binsearch_pos method with an exact/best_position indication
Date: Wed, 8 Oct 2014 21:47:05 +0300
To: bug-List-BinarySearch [...] rt.cpan.org
From: Meir Guttman <mguttman4 [...] gmail.com>
Dear David, Wow, what a detailed and convincing dissertation! Thank you! Well, I shouldn't be surprised, having seen your module's documentation. OK, yes, you convinced me. No reason to do this suggestion of mine. So, happy searching, binary or otherwise Meir On Oct 8, 2014 7:24 PM, "David J. Oswald via RT" < bug-List-BinarySearch@rt.cpan.org> wrote: Show quoted text
> <URL: https://rt.cpan.org/Ticket/Display.html?id=99382 > > > On Wed Oct 08 07:05:31 2014, meir@guttman.co.il wrote: >
> > I would like to have a forth method, "binsearch_best_effort" (other name > > suggestions are more than welcome) that report > > 1) Whatever 'binsearch_pos' returns, but also ... > > 2) ... an indication whether it is an exact match or is it the best > > insertion point. > > The second item, the indication, could be either a second scalar or a > > negative index. > > > > Running first binsearch and then binsearch_pos if the first returns
> 'undef'
> > seems ... ugly.
> > Thanks for your interest in, and support of this module. I'm glad you > find it useful. > > I considered an approach similar to what you are suggesting while > designing the module, but ultimately decided against it. I'll try to > explain my rationale. > > First, it's unnecessary to do two searches. One simply needs to check > that $needle is a match for $haystack[$pos]. > > For example: > > my $pos = binsearch_pos { $a cmp $b } $needle, @haystack; > if( $needle eq $haystack[$pos] ) { > print "Match at $pos.\n"; > } > else { > print "Insert at $pos.\n"; > } > > As you can see here, we're only doing one search. It's trivial to detect > whether the element at $pos is a match or not. > > Now consider the following code: > > use strict; > use warnings; > use feature 'say'; > > use List::BinarySearch 'binsearch_pos'; > > # Draft implementation: > sub binsearch_pos_exact (&$\@) { > my( $comparator, $needle, $haystack_aref ) = @_; > my $res = binsearch_pos {$comparator->()} $needle, @{$haystack_aref}; > local( $a, $b ) = ( $needle, $haystack_aref->[$res] ); > return( $res, !( $comparator->() ) ); > } > > > > > my @haystack = ( 'a' .. 'm', 'o' .. 'z' ); # 'n' is missing. > my $needle_ok = 'y'; > my $needle_nok = 'n'; > > find_pos( $needle_ok, \@haystack ); > find_pos_exact( $needle_ok, \@haystack ); > find_pos( $needle_nok, \@haystack ); > find_pos_exact( $needle_nok, \@haystack ); > > > sub find_pos { > my( $needle, $haystack_aref ) = @_; > > my $res = binsearch_pos { $a cmp $b } $needle, @{$haystack_aref}; > > if( $needle eq $haystack_aref->[$res] ) { > say "$needle found at $res"; > } > else { > say "$needle not found; insert at $res"; > } > } > > > sub find_pos_exact { > my( $needle, $haystack_aref ) = @_; > my( $res, $exact ) > = binsearch_pos_exact { $a cmp $b } $needle, @$haystack_aref; > > if( $exact ) { > say "$needle found at $res"; > } > else { > say "$needle not found; insert at $res"; > } > } > > As you can see, the new implementation requires a Boolean test inside the > binsearch_pos_exact, *and* in the user's code: if( $exact )... > > So the amount of work being done is actually greater than the user simply > detecting a match himself. > > Additionally, to make the solution generally useful, the final comparison > inside of binsearch_pos_exact has to call out to the user-supplied > comparator function. So that's one more function call, whereas if the > user were doing the comparison, it might be acceptable to simply do a > simple 'eq' or '==' comparison. > > Within the XS version of the module, we could make the final comparator > call before tearing down the MULTICALL call-stack, so it wouldn't be > terribly inefficient, but there is still a cost, and that cost is still > probably greater than the user doing his own match detection. > > A real-world implementation would inline the binsearch_pos code into > binsearch_pos_exact, but it would still result in an extra Boolean > comparison, and an extra function call to the user-supplied comparator. > > So to summarize: > > * The feature request is based on the incorrect assumption that one must > search twice; in reality the user only needs to detect a match using a > simple comparison, as demonstrated above. ( if( $needle == $haystack[$pos] > ) {... ) > > * The feature request would not reduce the number of Boolean comparisons > being done. The user will still need to do an if($exists){... test, as > demonstrated in my second example above. > > * The feature request actually *adds* a Boolean comparison internally as > well as a subroutine call to the user-supplied comparator. So efficiency > actually suffers. This internal comparison and subroutine call is > demonstrated in my second example, above. > > * I believe that if( $needle eq $haystack[$pos] ) ... is more > self-documenting than if( $exists )... > > * The feature request adds more complexity to the module without removing > significant complexity from the user's code. > > I'm really happy when people provide suggestions because they do help me > to refine my work, and sometimes make it more useful. But in this case I'm > going to reject the request for the reasons summarized above. However, I'm > still open to discussing it if you can show in code a use case that becomes > simplified by the addition of binsearch_pos_exact. >