CC: | Shin Kojima <shin [...] kojima.org> |
Subject: | [PATCH 1/2] Fix domain_to_ascii AllowUnassigned param |
Date: | Fri, 7 Sep 2018 11:58:02 +0900 |
To: | bug-Net-IDN-Encode [...] rt.cpan.org |
From: | Shin Kojima <shin [...] kojima.org> |
While using old perl (e.g. v5.8.8), `Net::IDN::Encode::domain_to_ascii`
can't handle Emoji domains even if `AllowUnassigned` param is on.
print domain_to_ascii("\x{1f985}.example", AllowUnassigned => 1);
Above code gave me an error as following:
disallowed character U+1F985
But when I add some ascii characters along with the emoji, it convert as
I expected:
print domain_to_ascii("AAAAA\x{1f985}.example", AllowUnassigned => 1);
Result:
xn--aaaaa-ki74d.example
The original code has an extra closing bracket, and it seems something
is going very wrong here for me.
m/^(\P{IsDisallowed}}|\P{Assigned})*$/
^
here
https://en.wikipedia.org/wiki/Emoji_domain
Signed-off-by: Shin Kojima <shin@kojima.org>
---
lib/Net/IDN/UTS46.pm | 7 +++----
t/domain_to_ascii.t | 4 +++-
2 files changed, 6 insertions(+), 5 deletions(-)
diff --git a/lib/Net/IDN/UTS46.pm b/lib/Net/IDN/UTS46.pm
index 23a363e..11f5702 100644
--- a/lib/Net/IDN/UTS46.pm
+++ b/lib/Net/IDN/UTS46.pm
@@ -62,10 +62,9 @@ sub _process {
# 1. Map
# - disallowed
#
- if($param{'AllowUnassigned'}) {
- $label =~ m/^(\P{IsDisallowed}}|\P{Assigned})*$/ and croak sprintf('disallowed character U+%04X', ord($1));
- } else {
- $label =~ m/(\p{IsDisallowed})/ and croak sprintf('disallowed character U+%04X', ord($1));
+ $label =~ m/(\p{IsDisallowed})/ and croak sprintf('disallowed character U+%04X', ord($1));
+
+ unless($param{'AllowUnassigned'}) {
$label =~ m/(\P{Assigned})/ and croak sprintf('unassigned character U+%04X (in this version of perl)', ord($1));
}
diff --git a/t/domain_to_ascii.t b/t/domain_to_ascii.t
index 90a025f..747bddd 100755
--- a/t/domain_to_ascii.t
+++ b/t/domain_to_ascii.t
@@ -5,7 +5,7 @@ BEGIN { binmode STDOUT, ':utf8'; binmode STDERR, ':utf8'; }
use Net::IDN::Encode qw(:all);
-use Test::More tests => 1 + 12;
+use Test::More tests => 1 + 13;
use Test::NoWarnings;
use Net::IDN::Encode qw(:all);
@@ -16,6 +16,8 @@ is(eval{domain_to_ascii('XN--MLLER-KVA')} || $@, 'XN--MLLER-KVA', 'single upperc
is(eval{domain_to_ascii('www.jürg.xn--mller-kva.com', )} || $@, 'www.xn--jrg-hoa.xn--mller-kva.com', 'mixed utf8/ace/ascii');
is(eval{domain_to_ascii('www.a.b。c.d。com', )} || $@, 'www.a.b.c.d.com', 'mixed dots');
+is(eval{domain_to_ascii("www.\x{1F985}.example", AllowUnassigned => 1)} || $@, 'www.xn--4s9h.example', 'Unicode 9.0 emoji');
+
is(eval{domain_to_ascii('www.ä ö ü ß.example', 'UseSTD3ASCIIRules' => 0)}, 'www.xn-- -7kav3ivb.example', 'blank (without STD3 rules) (to_unicode)') or diag $@;
is(eval{domain_to_ascii('www.ä ö ü ß.example', 'UseSTD3ASCIIRules' => 1)}, undef, 'blank (with STD3 rules) (to_unicode)') or diag $@;
is(eval{domain_to_ascii('www.xn-- -7kav3ivb.example', 'UseSTD3ASCIIRules' => 0)}, 'www.xn-- -7kav3ivb.example', 'blank (without STD3 rules) (to_unicode pass-through)') or diag $@;
--
2.18.0