Skip Menu |

This queue is for tickets about the Text-Levenshtein CPAN distribution.

Report information
The Basics
Id: 76008
Status: new
Priority: 0/
Queue: Text-Levenshtein

People
Owner: Nobody in particular
Requestors: swapnil.tailor [...] gmail.com
Cc:
AdminCc:

Bug Information
Severity: Wishlist
Broken in: 0.05
Fixed in: (no value)



As mentioned on wiki: http://en.wikipedia.org/wiki/Levenshtein_distance#Possible_improvements "If we are only interested in the distance if it is smaller than a threshold k, then it suffices to compute a diagonal stripe of width 2k+1 in the matrix. In this way, the algorithm can be run in O(kl) time, where l is the length of the shortest string." I have implemented the functionality and would like to be added in the module. Attaching the file with source code and how to use it. Let me know if you can include that as part of your module. Thanks, Swapnil
Subject: LevenshteinDistance.pm
#Call should be fastdistancewiththreshold("word1" , "word2", $threshold) sub fastdistancewiththreshold { my ($word1, $word2, $threshold) = @_; return 0 if $word1 eq $word2; my $i; my $j; my $len1 = length($word1); my $len2 = length($word2); if($len1 > $len2){ $tmp = $word1; $word1 = $word2; $word2 = $tmp; $tmp = $len1; $len1 = $len2; $len2 = $tmp; } my $limit = 2*$threshold+1; $d[0][0] = 0; for (1 ... $limit) { $d[$_][0] = $_; if($_ != $m && substr($word1, $_) eq substr($word2, $_)){ return $_; } } for (1 ... $limit) { $d[0][$_] = $_; if($_ != $n && substr($word1, $_) eq substr($word2, $_)){ return $_; } } my $l = 1; for my $i (1 ... $m) { my $w = substr($word1, $i-1, 1); for ($j = $l-$threshold; $j <= $l + $threshold && $j <= $n; $j++){ if($w eq substr($word2, $j-1, 1)){ $d[$i][$j] = $d[$i-1][$j-1]; } else { if($i == $j) { $d[$i][$j] = minimum($d[$i-1][$j], $d[$i][$j-1], $d[$i-1][$j-1]) + 1; } else { if($i < $j) { $d[$i][$j] = min($d[$i][$j-1] , $d[$i-1][$j-1]) + 1; } else { $d[$i][$j] = min($d[$i-1][$j] , $d[$i-1][$j-1]) + 1; } } } } $l++; } return $d[$m][$n]; }