
This queue is for tickets about the Text-Similarity CPAN distribution.

Report information
The Basics
Id: 107624
Status: resolved
Priority: 0/
Queue: Text-Similarity

People
Owner: Nobody in particular
Requestors: ASB [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Unimportant
Broken in: 0.11
Fixed in: (no value)



Subject: Inquiry: Is the count of overlaps omitted / set to 1 on purpose?
The method getOverlaps() in Text::OverlapFinder returns a hashref of overlaps.

Each overlap is counted exactly once. The count cannot be greater than 1, because the method containsReplace is used when counting. The overlap is replaced by MARKERs and cannot be found again.

Is this on purpose? I ask because in the method getSimilarityStrings() in Text::Similarity::Overlaps, the hashref value is used as an operand in a multiplication:
$score += scalar @words * $overlaps->{$key};

I became suspicious when I saw that in getOverlaps() in Text::OverlapFinder, the overlaps seem to be counted using the ++ operator:
https://metacpan.org/source/TPEDERSE/Text-Similarity-0.11/lib/Text/OverlapFinder.pm#L166

If we only wanted to get a 1, we could just as well use a simple assignment, e.g.:
$overlapsHash{$temp} = 1; # remember that we found the overlap $temp
Subject: Re: [rt.cpan.org #107624] Inquiry: Is the count of overlaps omitted / set to 1 on purpose?
Date: Thu, 8 Oct 2015 05:38:58 -0500
To: bug-Text-Similarity [...] rt.cpan.org
From: Ted Pedersen <duluthted [...] gmail.com>
Yes, counting each overlap once is by design. In a case like

my dog is a happy dog (A)
your dog runs fast (B)

the first occurrence of dog in line A matches the occurrence of dog in B. The second occurrence in A does not match the occurrence in B, since each occurrence can only be matched one time.

However, in a case like this

my dog is a happy dog (A)
your dog and his dog are friends (B)

we have two matches for dog: the first occurrence in A matches the first occurrence in B, and the second occurrence in A matches the second occurrence in B.

So I think that is why we still need to count. We could have multiple overlaps of the same word, even if we only count each overlap once.
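The two dog examples can be reproduced with a small sketch. This is not the code from Text::OverlapFinder: it handles single words only (the module works on multi-word overlaps), and the helper name count_word_overlaps is hypothetical. It only illustrates the idea that each occurrence may be matched once, while the same word can still be counted more than once, which is why ++ and a plain assignment of 1 would differ.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of Text::OverlapFinder: counts single-word
# overlaps, letting each occurrence in B be matched at most once.
sub count_word_overlaps {
    my ($text_a, $text_b) = @_;

    # Tally how many unmatched occurrences of each word remain in B.
    my %available;
    $available{$_}++ for split ' ', $text_b;

    my %overlaps;
    for my $word (split ' ', $text_a) {
        next unless $available{$word};   # no unmatched occurrence left in B
        $available{$word}--;             # consume it, similar in spirit to a marker
        $overlaps{$word}++;              # ++ lets repeated matches accumulate
    }
    return \%overlaps;
}

my $line_a = 'my dog is a happy dog';
my $one = count_word_overlaps($line_a, 'your dog runs fast');
my $two = count_word_overlaps($line_a, 'your dog and his dog are friends');

# Score in the style of the line quoted in the ticket:
# number of words in the overlap times how often the overlap was counted.
my $score = 0;
for my $key (keys %$two) {
    my @words = split ' ', $key;
    $score += scalar(@words) * $two->{$key};
}

print "first pair:  dog => $one->{dog}\n";               # dog => 1
print "second pair: dog => $two->{dog}, score $score\n"; # dog => 2, score 2
```

With a plain assignment of 1 instead of ++, the second pair would also report a count of 1 and a score of 1 in this sketch.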
I understand, thank you for the example!