Yes, counting each overlap once is by design. In a case like
my dog is a happy dog (A)
your dog runs fast (B)
the first occurrence of dog in A matches the occurrence of dog in B. The
second occurrence in A does not match anything in B, since each occurrence
can take part in only one match.
However, in a case like this
my dog is a happy dog (A)
your dog and his dog are friends (B)
then we have two matches for dog - the first occurrence in A matches the
first occurrence in B, and then the second occurrence in A matches the
second occurrence in B.
So, I think that is why we still need to count. The same word can overlap
more than once, even though each individual occurrence is matched only once.
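
To make the two cases concrete, here is a minimal sketch in Python. It is not the Perl code from Text::OverlapFinder: it handles single words only, it assumes whitespace tokenization, and the names count_word_overlaps and overlap_score are made up for illustration. The score function only mirrors the shape of the multiplication quoted below (words in the overlap times its count).

```python
from collections import Counter


def count_word_overlaps(a: str, b: str) -> Counter:
    """Count one-to-one matches of single words between two lines."""
    remaining = Counter(b.split())  # occurrences in B still free to match
    overlaps = Counter()
    for word in a.split():
        if remaining[word] > 0:
            remaining[word] -= 1    # consume it, much like replacing it with a marker
            overlaps[word] += 1     # incrementing lets a count grow past 1
    return overlaps


def overlap_score(overlaps: Counter) -> int:
    """Sum of (number of words in the overlap) * (times it was matched)."""
    return sum(len(key.split()) * count for key, count in overlaps.items())


A = "my dog is a happy dog"
print(count_word_overlaps(A, "your dog runs fast"))
print(count_word_overlaps(A, "your dog and his dog are friends"))
```

In this sketch the first pair gives a count of 1 for dog and the second pair gives 2, so assigning a plain 1 instead of incrementing would lose the second match and lower the score.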
On Thu, Oct 8, 2015 at 4:17 AM, Alexander Becker via RT <
bug-Text-Similarity@rt.cpan.org> wrote:
> Thu Oct 08 05:17:29 2015: Request 107624 was acted upon.
> Transaction: Ticket created by ASB
> Queue: Text-Similarity
> Subject: Inquiry: Is the count of overlaps omitted / set to 1 on
> purpose?
> Broken in: 0.11
> Severity: Unimportant
> Owner: Nobody
> Requestors: ASB@cpan.org
> Status: new
> Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=107624 >
>
>
> The method getOverlaps() in Text::OverlapFinder returns a hashref of
> overlaps.
>
> Each overlap is counted exactly once. The count cannot be greater than 1,
> because when counting, the method containsReplace is used. The overlap is
> replaced by MARKERs and cannot be found again.
>
> Is this on purpose?
> Because, in the method getSimilarityStrings() in
> Text::Similarity::Overlaps, the hashref value is used as operand in a
> multiplication: $score += scalar @words * $overlaps->{$key};
>
> I was getting suspicious when I saw that in getOverlaps() in
> Text::OverlapFinder, the overlaps seem to be counted using the ++
> operator:
>
> https://metacpan.org/source/TPEDERSE/Text-Similarity-0.11/lib/Text/OverlapFinder.pm#L166
>
> If we only wanted to get a 1, we could just as well use a simple
> assignment, e.g.:
> $overlapsHash{$temp} = 1; # remember that we found the overlap $temp
>