Subject: Optionally use Unicode::Collate::eq()
Hi Neil!
Consider this example:
use strict;
use warnings;
use utf8;
use feature qw( say );
use Text::Levenshtein qw( distance );
use Unicode::Collate;
binmode STDOUT, ':encoding(UTF-8)';
my @cities = ( 'Swidnica', 'Świdnica' );
my $collator = Unicode::Collate->new( normalization => undef, level => 1 );
say $collator->eq( @cities ) ? 'exact match' : 'no match';
say 'edit distance ' . distance( @cities );
###
Output is the following:
exact match
edit distance 1
What I'm proposing is the ability to override the eq used in this module, so that words the collator treats as equal (for example, words that differ only in diacritics) return an edit distance of 0.
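To make the idea concrete, here is a rough sketch of what I have in mind. It is not how Text::Levenshtein is implemented: distance_with_eq() is a hypothetical helper of my own, a plain Wagner-Fischer loop that takes the character comparison as a callback. The only real API it relies on is Unicode::Collate's new() and eq().

```perl
use strict;
use warnings;
use utf8;
use feature qw( say );
use List::Util qw( min );
use Unicode::Collate;

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper, NOT part of Text::Levenshtein: a textbook
# Wagner-Fischer edit distance where the caller supplies the
# character comparison as a code ref.
sub distance_with_eq {
    my ( $eq, $s, $t ) = @_;
    my @s = split //, $s;
    my @t = split //, $t;

    # Row for the empty prefix of $s: j insertions to reach $t[0..j-1].
    my @prev = ( 0 .. scalar @t );

    for my $i ( 1 .. scalar @s ) {
        my @curr = ($i);    # i deletions to reach the empty prefix of $t
        for my $j ( 1 .. scalar @t ) {
            # Substitution is free when the comparator says "equal".
            my $cost = $eq->( $s[ $i - 1 ], $t[ $j - 1 ] ) ? 0 : 1;
            push @curr, min(
                $prev[$j] + 1,              # deletion
                $curr[ $j - 1 ] + 1,        # insertion
                $prev[ $j - 1 ] + $cost,    # substitution or match
            );
        }
        @prev = @curr;
    }
    return $prev[-1];
}

my @cities   = ( 'Swidnica', 'Świdnica' );
my $collator = Unicode::Collate->new( normalization => undef, level => 1 );

# Default behaviour: plain string equality per character.
my $plain = sub { $_[0] eq $_[1] };

# Proposed behaviour: let the collator decide what counts as equal.
my $collated = sub { $collator->eq( $_[0], $_[1] ) };

say 'plain eq:    ' . distance_with_eq( $plain,    @cities );
say 'collator eq: ' . distance_with_eq( $collated, @cities );
```

With the plain comparator I'd expect a distance of 1, and with the collator 0. Two caveats with this sketch: at level 1 the collator also ignores case, and comparing one character at a time will not treat a precomposed letter and its decomposed form (base letter plus combining mark) as a single match, so the input would need normalizing first.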
My use case is that I've got a pile of geographical data and I'm trying to see how accurate it is: do the city names provided match the city names in the database, and so on. It would be great if something like this could be accounted for when calculating edit distance. I have no idea whether that violates the philosophy behind this code, but I figured it was worth asking.
Thanks,
Olaf