
This queue is for tickets about the Unicode-CaseFold CPAN distribution.

Report information
The Basics
Id: 123059
Status: new
Priority: 0/
Queue: Unicode-CaseFold

People
Owner: Nobody in particular
Requestors: RURBAN [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: Implement full casefold: normalize
You have taken just the implementation of Perl's fc operator, which is broken with regard to normalization, i.e. composed and decomposed strings. A full fold-case operation obviously has to normalize the string. Otherwise, strings which are equal under the definition "Casefolding is the process of mapping strings to a form where case differences are erased; comparing two strings in their casefolded form is effectively a way of asking if two strings are equal, regardless of case." will not match.

cperl will implement full fc ([cperl #332]), and safeclib does so too with wcsfc_s(). Most others are missing this, and hence will not find equal Unicode strings.

Note that Unicode case-fold is something different from foldcase. foldcase guarantees equality for matching strings; case-fold only guarantees it with regard to folding.

Full case-fold will expand strings (2-4), NFD even more (with NFKD even up to a maximum of 18 for certain Arabic letters), and NFC will contract them again. I chose NFD for performance reasons, but for your simple module NFC would be the best. This adds two more steps: ordering and composition. Unicode::Normalize has all of these.

-- Reini Urban
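The mismatch described in this report can be illustrated with a minimal sketch. It is written in Python only to keep it self-contained: `str.casefold()` stands in for Perl's `fc`, and `unicodedata.normalize` stands in for the NFD/NFC functions of Unicode::Normalize. The helper name `fold_nfc` is hypothetical and is not part of Unicode-CaseFold, cperl, or safeclib. It assumes the requested behaviour is roughly "decompose, fold, then recompose with NFC".

```python
import unicodedata


def fold_nfc(s: str) -> str:
    """Hypothetical helper: full case folding followed by NFC.

    Assumption: decompose first (NFD) so folding sees canonical input,
    then fold, then reorder and recompose (NFC).
    """
    decomposed = unicodedata.normalize("NFD", s)
    folded = decomposed.casefold()
    return unicodedata.normalize("NFC", folded)


composed = "\u00C9"      # LATIN CAPITAL LETTER E WITH ACUTE
decomposed = "E\u0301"   # LATIN CAPITAL LETTER E + COMBINING ACUTE ACCENT

# Folding alone keeps the two canonically equivalent spellings distinct.
plain_equal = composed.casefold() == decomposed.casefold()

# Folding plus normalization maps both spellings to the same key.
normalized_equal = fold_nfc(composed) == fold_nfc(decomposed)

# Full case folding can also lengthen a string, e.g. sharp s becomes "ss".
expanded_length = len("\u00DF".casefold())
```

Here `plain_equal` comes out false while `normalized_equal` comes out true, which is the difference between folding only and folding with normalization that the report asks for.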