
This queue is for tickets about the Net-IDN-Encode CPAN distribution.

Report information
The Basics
Id: 103205
Status: rejected
Priority: 0/
Queue: Net-IDN-Encode

People
Owner: CFAERBER [...] cpan.org
Requestors: ab [...] lixutec.net
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: 2.201
Fixed in: (no value)



Subject: conversion of domain name xn--zcaa.de
The conversion of the IDN domain name "xn--zcaa.de" to Unicode is broken. The expected result would be "ßß.de", but domain_to_unicode() returns a wrongly encoded string. The issue only appears if the domain name contains nothing but "ß" characters. If the domain name also contains other characters, the returned Unicode string is correct.
Thank you for reporting this bug. I cannot reproduce it on perl 5.16 with the latest version of Net-IDN-Encode. Which version of perl and Net-IDN-Encode are you using?

However, I also have an idea what might be the cause. Could you please test the developer release at https://metacpan.org/release/CFAERBER/Net-IDN-Encode-2.201_20150330 and check whether the problem persists?
From: ab [...] lixutec.net
> I cannot reproduce it on perl 5.16 with the latest version of Net-IDN-Encode.
> Which version of perl and Net-IDN-Encode are you using?
I'm using two perl versions on different servers: v5.10.1 and v5.14.2. The version of Net-IDN-Encode is 2.201. I will try the development version soon...
From: ab [...] lixutec.net
> However, I also have an idea what might be the cause. Could you please
> test the developer release at
> https://metacpan.org/release/CFAERBER/Net-IDN-Encode-2.201_20150330
> and check whether the problem persists?
The problem still exists with the latest developer release.
From: ab [...] lixutec.net
I wrote a short test script to isolate the issue:

    #!/usr/bin/perl
    use Net::IDN::Encode;
    print STDERR "VERSION: " . $Net::IDN::Encode::VERSION . "\n";
    use Net::IDN::Encode ':all';
    my $u = domain_to_unicode('xn--zcaa.de');
    utf8::encode($u) if utf8::is_utf8($u);
    print STDERR "UNICODE: " . $u . "\n";
    $u = domain_to_unicode('xn--m-qfaaa.de');
    utf8::encode($u) if utf8::is_utf8($u);
    print STDERR "UNICODE: " . $u . "\n";

the output is:

    VERSION: 2.2012015033
    UNICODE: ��.de
    UNICODE: mßßß.de
From: ab [...] lixutec.net
> the output is:
> VERSION: 2.2012015033
> UNICODE: ��.de
> UNICODE: mßßß.de

BTW: using an older version works:

    VERSION: 2.003
    UNICODE: ßß.de
    UNICODE: mßßß.de
Your problem is this line:

    utf8::encode($u) if utf8::is_utf8($u);

This is wrong. If your display requires UTF-8, you need to encode every string before output, regardless of whether it is stored internally as UTF-X (is_utf8 is true) or as bytes (is_utf8 is false). For output, you can also use e.g.

    binmode STDERR, ':utf8';

I don't think that Net::IDN::Encode should guarantee the status of the utf8 flag on the result of *_to_unicode. Normally, this flag is handled seamlessly by perl.
Please also note the following from perlunifaq:

-------------------------------------------------------------------------------
What is "the UTF8 flag"?

Please, unless you're hacking the internals, or debugging weirdness, don't think about the UTF8 flag at all. That means that you very probably shouldn't use is_utf8, _utf8_on or _utf8_off at all.
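
To illustrate the point about the utf8 flag, here is a minimal sketch that uses only core modules and does not call Net::IDN::Encode at all. It builds the same character string "ßß.de" twice, once stored internally as bytes and once stored internally as UTF-8, and compares unconditional encoding with the flag-dependent encoding from the test script. The variable names and the helper flag_dependent() are made up for this illustration. That different Net::IDN::Encode versions return these two different internal representations is an assumption, not something verified here.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

# The same character string "ßß.de" in two internal representations.
my $downgraded = "\x{DF}\x{DF}.de";
utf8::downgrade($downgraded);   # stored as bytes, utf8 flag off
my $upgraded = "\x{DF}\x{DF}.de";
utf8::upgrade($upgraded);       # stored as UTF-8, utf8 flag on

# Unconditional encoding: the result does not depend on the flag.
my $bytes_a = encode('UTF-8', $downgraded);
my $bytes_b = encode('UTF-8', $upgraded);

# Hypothetical helper reproducing the pattern from the test script:
# it encodes only when the utf8 flag happens to be set.
sub flag_dependent {
    my ($s) = @_;
    utf8::encode($s) if utf8::is_utf8($s);
    return $s;
}
my $cond_a = flag_dependent($downgraded);   # left as Latin-1 bytes
my $cond_b = flag_dependent($upgraded);     # converted to UTF-8 bytes

print "unconditional: ", ($bytes_a eq $bytes_b ? "identical" : "different"), "\n";
print "flag-dependent: ", ($cond_a eq $cond_b ? "identical" : "different"), "\n";
# prints:
# unconditional: identical
# flag-dependent: different
```

In the flag-dependent case the string without the flag is printed as raw Latin-1 bytes, which a UTF-8 terminal cannot display. This would match the broken output for 'xn--zcaa.de', where every character fits in Latin-1. Encoding unconditionally, or setting an encoding layer on the output handle with binmode, avoids the dependency on the flag.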