On Wed Oct 08 07:05:31 2014, meir@guttman.co.il wrote:
Show quoted text> I would like to have a forth method, "binsearch_best_effort" (other name
> suggestions are more than welcome) that report
> 1) Whatever 'binsearch_pos' returns, but also ...
> 2) ... an indication whether it is an exact match or is it the best
> insertion point.
> The second item, the indication, could be either a second scalar or a
> negative index.
>
> Running first binsearch and then binsearch_pos if the first returns 'undef'
> seems ... ugly.
Thanks for your interest in, and support of this module. I'm glad you find it useful.
I considered an approach similar to what you are suggesting while designing the module, but ultimately decided against it. I'll try to explain my rationale.
First, it's unnecessary to do two searches. One simply needs to check that $needle is a match for $haystack[$pos].
For example:
my $pos = binsearch_pos { $a cmp $b } $needle, @haystack;
if( $needle eq $haystack[$pos] ) {
print "Match at $pos.\n";
}
else {
print "Insert at $pos.\n";
}
As you can see here, we're only doing one search. It's trivial to detect whether the element at $pos is a match or not.
Now consider the following code:
use strict;
use warnings;
use feature 'say';
use List::BinarySearch 'binsearch_pos';
# Draft implementation:
sub binsearch_pos_exact (&$\@) {
my( $comparator, $needle, $haystack_aref ) = @_;
my $res = binsearch_pos {$comparator->()} $needle, @{$haystack_aref};
local( $a, $b ) = ( $needle, $haystack_aref->[$res] );
return( $res, !( $comparator->() ) );
}
my @haystack = ( 'a' .. 'm', 'o' .. 'z' ); # 'n' is missing.
my $needle_ok = 'y';
my $needle_nok = 'n';
find_pos( $needle_ok, \@haystack );
find_pos_exact( $needle_ok, \@haystack );
find_pos( $needle_nok, \@haystack );
find_pos_exact( $needle_nok, \@haystack );
sub find_pos {
my( $needle, $haystack_aref ) = @_;
my $res = binsearch_pos { $a cmp $b } $needle, @{$haystack_aref};
if( $needle eq $haystack_aref->[$res] ) {
say "$needle found at $res";
}
else {
say "$needle not found; insert at $res";
}
}
sub find_pos_exact {
my( $needle, $haystack_aref ) = @_;
my( $res, $exact )
= binsearch_pos_exact { $a cmp $b } $needle, @$haystack_aref;
if( $exact ) {
say "$needle found at $res";
}
else {
say "$needle not found; insert at $res";
}
}
As you can see, the new implementation requires a Boolean test inside the binsearch_pos_exact, *and* in the user's code: if( $exact )...
So the amount of work being done is actually greater than the user simply detecting a match himself.
Additionally, to make the solution generally useful, the final comparison inside of binsearch_pos_exact has to call out to the user-supplied comparator function. So that's one more function call, whereas if the
user were doing the comparison, it might be acceptable to simply do a simple 'eq' or '==' comparison.
Within the XS version of the module, we could make the final comparator call before tearing down the MULTICALL call-stack, so it wouldn't be terribly inefficient, but there is still a cost, and that cost is still probably greater than the user doing his own match detection.
A real-world implementation would inline the binsearch_pos code into binsearch_pos_exact, but it would still result in an extra Boolean comparison, and an extra function call to the user-supplied comparator.
So to summarize:
* The feature request is based on the incorrect assumption that one must search twice; in reality the user only needs to detect a match using a simple comparison, as demonstrated above. ( if( $needle == $haystack[$pos] ) {... )
* The feature request would not reduce the number of Boolean comparisons being done. The user will still need to do an if($exists){... test, as demonstrated in my second example above.
* The feature request actually *adds* a Boolean comparison internally as well as a subroutine call to the user-supplied comparator. So efficiency actually suffers. This internal comparison and subroutine call is demonstrated in my second example, above.
* I believe that if( $needle eq $haystack[$pos] ) ... is more self-documenting than if( $exists )...
* The feature request adds more complexity to the module without removing significant complexity from the user's code.
I'm really happy when people provide suggestions because they do help me to refine my work, and sometimes make it more useful. But in this case I'm going to reject the request for the reasons summarized above. However, I'm still open to discussing it if you can show in code a use case that becomes simplified by the addition of binsearch_pos_exact.