Subject: Implement full casefold: normalize
You have taken only the implementation of the perl fc operator, which is broken with regard to normalization, i.e. composed vs. decomposed strings.
A full fold-case operation obviously has to normalize the string, otherwise
strings which are equal under the definition of "
Casefolding is the process of mapping strings to a form where case
differences are erased; comparing two strings in their casefolded
form is effectively a way of asking if two strings are equal,
regardless of case."
will not match.
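
To make the failure concrete, here is a minimal sketch in Python. Its `str.casefold()` applies full case folding without any normalization, so it is used here only as a stand-in for a plain fc; the sample strings are my own illustration.

```python
# The same text twice: once with precomposed accents, once decomposed.
composed = "\u00C9t\u00E9"        # "Été", precomposed
decomposed = "E\u0301te\u0301"    # base letters + COMBINING ACUTE ACCENT

# Case folding alone maps É to é and E to e, but leaves the
# composed/decomposed difference in place.
plain_match = composed.casefold() == decomposed.casefold()
print(plain_match)  # False
```

Both strings render identically and differ only in case handling and composition, yet the folded forms do not compare equal.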
cperl will implement full fc ([cperl #332]), and safeclib does so as well with wcsfc_s().
Most others are missing this, and hence will not find equal Unicode strings.
Note that Unicode case-fold is something different from foldcase. foldcase guarantees equality for matching strings; case-fold does so only with regard to folding. Full case-fold will expand strings (2-4), NFD even more (NFKD up to a maximum of 18 for certain Arabic letters), and NFC will contract them again.
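
The expansion and contraction can be checked with a few single characters. This is only a sketch: the characters are examples I picked, lengths are counted in code points, and Python's `unicodedata` stands in for whatever normalizer the module would use.

```python
import unicodedata

sharp_s = "\u00DF"    # LATIN SMALL LETTER SHARP S
iota = "\u0390"       # GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS
ligature = "\uFDFA"   # ARABIC LIGATURE SALLALLAHOU ALAYHE WASALLAM

# Full case folding can turn one code point into several.
fold_len_s = len(sharp_s.casefold())
fold_len_iota = len(iota.casefold())

# Compatibility decomposition of a single ligature.
nfkd_len = len(unicodedata.normalize("NFKD", ligature))

# NFC contracts a decomposed sequence back into one code point.
nfc_len = len(unicodedata.normalize("NFC", "e\u0301"))

print(fold_len_s, fold_len_iota, nfkd_len, nfc_len)  # 2 3 18 1
```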
I chose NFD for performance reasons, but for your simple module NFC would be the best. This adds two more steps: ordering and composition. Unicode::Normalize has all these.
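
A minimal sketch of fold-then-normalize, again in Python as a stand-in. `foldcase` is a hypothetical helper name, not an existing API, and it is not the cperl or safeclib code. The Unicode Standard's definition of canonical caseless matching also normalizes to NFD before folding; that extra pass is left out here for brevity.

```python
import unicodedata

def foldcase(s, form="NFC"):
    """Fold case, then normalize to `form` (hypothetical helper)."""
    return unicodedata.normalize(form, s.casefold())

composed = "\u00C9t\u00E9"        # "Été", precomposed
decomposed = "E\u0301te\u0301"    # same text, decomposed

# With normalization applied after folding, both spellings agree,
# whichever form is chosen.
same_nfc = foldcase(composed, "NFC") == foldcase(decomposed, "NFC")
same_nfd = foldcase(composed, "NFD") == foldcase(decomposed, "NFD")
print(same_nfc, same_nfd)  # True True
```

NFC gives the shorter result, since it runs the decomposition, reordering and composition steps; NFD stops after reordering.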
--
Reini Urban