
This queue is for tickets about the Net-IDN-Encode CPAN distribution.

Report information
The Basics
Id: 103368
Status: open
Priority: 0/
Queue: Net-IDN-Encode

People
Owner: CFAERBER [...] cpan.org
Requestors: matthew.unwin [...] returnpath.com
Cc:
AdminCc:

Bug Information
Severity: Wishlist
Broken in: (no value)
Fixed in: (no value)



Subject: Issue converting a unicode domain to ascii and that same domain from ascii to unicode
Date: Tue, 7 Apr 2015 13:35:38 -0600
To: bug-Net-IDN-Encode [...] rt.cpan.org
From: Matthew Unwin <matthew.unwin [...] returnpath.com>
We have a client who has registered the following two domains (these are in the Tamil language):

ெசாசியதெ-ெஜனரால.com (xn----oweaj2b6a1bms6ihf1ggb.com)
ெசாசியதெ-ெஜனரால.net (xn----oweaj2b6a1bms6ihf1ggb.net)

These domains fail to convert when using Net::IDN::Encode version 2.201 and Perl 5.18 on CentOS 6.5. When I try to convert the two domains above using domain_to_ascii(), I get the following error:

begins with General_Category=Mark [V5] at .../lib/perl5/x86_64-linux/Net/IDN/Encode.pm line 46.

The reverse, domain_to_unicode(), also fails when tested with the converted values noted above. I have tried all combinations of the optional parameters AllowUnassigned, UseSTD3ASCIIRules and TransitionalProcessing, without success. I have also tried:

- uts46_to_ascii() / uts46_to_unicode() -- fails
- idna2003_to_ascii() -- succeeds, results in: xn----oweaj2b6a1bms6ihf1ggb
- encode_punycode() [tested without the .com and .net] -- succeeds, results in: --oweaj2b6a1bms6ihf1ggb

I have tried a variety of online tools to check whether the domain names are valid:

- http://mct.verisign-grs.com/ -- fails
- http://㯙㯜㯙㯟.net/ <http://xn--domain.net/> -- succeeds (works in both idna2003 and idna2008 modes and prints out code points)
- http://punycode.phlymail.de/ -- succeeds (works in both idna2003 and idna2008 modes)
- http://www.motobit.com/util/punycode-decoder-encoder.asp -- succeeds (used "To IDN")
- https://iwantmyname.com/domain-tools/idns/idn-punycode-converter -- succeeds
- http://www.punycoder.com/ -- succeeds
- https://mothereff.in/punycode -- succeeds
- http://idn-encoding.online-domain-tools.com/ -- succeeds
- http://www.idnconverter.se/ -- succeeds

So, other than Verisign's online tool, I haven't found any Unicode to IDN/Punycode converter that has problems converting the two domains above. This leads me to believe there is a bug somewhere in Net::IDN::Encode.

Thanks!
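The reporter's IDNA 2003 result can be reproduced outside Perl. The following is a minimal Python sketch. It assumes that Python's built-in `idna` codec is a fair stand-in for idna2003_to_ascii(). That codec implements IDNA 2003 (RFC 3490 ToASCII/ToUnicode with Nameprep), not UTS #46 or IDNA 2008. This is not Net::IDN::Encode's own code.

```python
# Python's built-in "idna" codec implements IDNA 2003 (RFC 3490),
# not UTS #46 or IDNA 2008, so it does not apply rule V5.
domain = "ெசாசியதெ-ெஜனரால.com"

# IDNA 2003 ToASCII, applied label by label; raises UnicodeError on failure.
ascii_form = domain.encode("idna")
print(ascii_form.decode("ascii"))

# IDNA 2003 ToUnicode on the ACE form; this should give back the original domain.
unicode_form = ascii_form.decode("idna")
print(unicode_form == domain)
```

If the code points match the reporter's input, the first label should come out as the xn----oweaj2b6a1bms6ihf1ggb reported above.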
I think that Net::IDN::Encode is correct here: the label starts with U+0BC6 (TAMIL VOWEL SIGN E), which is a combining mark (General Category Mc, Mark, Spacing Combining). In IDNA 2008, labels must not start with a combining mark, and IDNA 2008 and UTS #46 agree on this:

RFC 5891, section 4.2.3.2:
  The Unicode string MUST NOT begin with a combining mark or combining character (see The Unicode Standard, Section 2.11 [Unicode] for an exact definition).

UTS #46, section 4.1:
  5. The label must not begin with a combining mark, that is: General_Category=Mark.

It also makes no sense to START a label with a character that, being a combining mark, has to FOLLOW another character.
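Rule V5 itself is simple to express. The following Python sketch checks only that one criterion with `unicodedata.category()`; it is not a full UTS #46 or IDNA 2008 validator. The helper names `leading_mark` and `v5_violations` are hypothetical. The sketch normalizes each label to NFC first, as UTS #46 processing does before its validity checks. Splitting on ASCII "." only is a simplifying assumption (UTS #46 also maps some other full stops, such as U+3002, to ".").

```python
import unicodedata


def leading_mark(label):
    """Return the General_Category of label[0] if it is a Mark (Mn, Mc, Me), else None.

    Checks only UTS #46 validity criterion V5, nothing else.
    """
    if not label:
        return None
    category = unicodedata.category(label[0])
    return category if category.startswith("M") else None


def v5_violations(domain):
    """List (label, category) pairs for labels that begin with a combining mark."""
    violations = []
    for label in domain.split("."):  # assumption: ASCII "." is the only separator
        category = leading_mark(unicodedata.normalize("NFC", label))
        if category is not None:
            violations.append((label, category))
    return violations


first = "ெசாசியதெ-ெஜனரால"[0]
print(f"U+{ord(first):04X}", unicodedata.name(first), unicodedata.category(first))
# expected: U+0BC6 TAMIL VOWEL SIGN E Mc
print(v5_violations("ெசாசியதெ-ெஜனரால.com"))
print(v5_violations("example.com"))  # []
```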
It is strange, however, that IDNA 2003 allowed the string to be registered as a domain name. The intention behind Net::IDN::Encode is that strings allowed by any IDNA version (IDNA 2003, UTS #46, IDNA 2008) are also allowed by Net::IDN::Encode. So far, my impression was that UTS #46 would serve that purpose. So I'm considering adding an option (or an opt-out) to ignore rule V5 for strings that were valid IDNA 2003 strings.
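The option considered above could gate rule V5 roughly as in the following Python sketch. It is hypothetical throughout. `passes_v5` and the `allow_idna2003_leading_mark` flag are illustrative names, not an existing Net::IDN::Encode option. Python's built-in `idna` codec stands in for the "valid IDNA 2003 string" check. The sketch shows only the gating logic: a label that begins with a combining mark is accepted only when the flag is set and the label is also valid under IDNA 2003.

```python
import unicodedata


def begins_with_mark(label):
    """Rule V5 test: does the label start with a General_Category=Mark character?"""
    return bool(label) and unicodedata.category(label[0]).startswith("M")


def is_valid_idna2003(label):
    """Stand-in IDNA 2003 check: Python's "idna" codec raises UnicodeError on failure."""
    try:
        label.encode("idna")
    except UnicodeError:
        return False
    return True


def passes_v5(label, allow_idna2003_leading_mark=False):
    """Hypothetical V5 gate with the opt-out discussed in this ticket.

    A label that does not begin with a combining mark always passes. One that
    does passes only if the flag is set AND the label is valid IDNA 2003.
    """
    label = unicodedata.normalize("NFC", label)
    if not begins_with_mark(label):
        return True
    return allow_idna2003_leading_mark and is_valid_idna2003(label)


tamil = "ெசாசியதெ-ெஜனரால"
print(passes_v5(tamil))  # False: rejected by rule V5
print(passes_v5(tamil, allow_idna2003_leading_mark=True))  # True: valid IDNA 2003
# False: U+E000 (private use) is prohibited by Nameprep, so IDNA 2003 rejects it too
print(passes_v5("\u0bc6\ue000", allow_idna2003_leading_mark=True))
```

Making the relaxation depend on IDNA 2003 validity keeps it narrow. A leading combining mark is tolerated only where an older standard already allowed it, not for arbitrary strings.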