
This queue is for tickets about the Unicode-Normalize CPAN distribution.

Report information
The Basics
Id: 53197
Status: resolved
Priority: 0/
Queue: Unicode-Normalize

People
Owner: Nobody in particular
Requestors: CFAERBER [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: 1.03
Fixed in: (no value)



Subject: NFKC("\x{2000}") produces "\x20\x05" on some perls >= 5.11.2
Hi. Sometimes, NFKC("\x{2000}") produces an extra "\x05" in the output.

The problem seems to be isolated to some platforms. It has been observed with U::N 1.03
on an amd64 operating system; I'm not sure whether it also occurs with U::N 1.05.

Please find a test case attached.

Subject: perl-5.11.2.t
use strict;
use utf8;
no warnings 'utf8';

use Test::More tests => 1;
use Unicode::Normalize ();

is( Unicode::Normalize::NFKC("\x{2000}"), " ", 'NFKC of U+2000' );

I discovered the problem through the test vectors for Net::IDN::Encode/Unicode::Stringprep, so some CPAN Testers results are available here:

http://matrix.cpantesters.org/?dist=Unicode-Stringprep%201.09_70091230
http://matrix.cpantesters.org/?dist=Unicode-Stringprep%201.02

(These two are the most interesting versions; please ignore the experimental versions 1.09_2009????.)

In these tests, the problem shows up as follows (N.B. the ^E is not visible):
#   Failed test 'Non-ASCII multibyte space character U+2000'
#   at t/nameprep_st.t line 258.
#          got: ' '
#     expected: ' '

#   Failed test 'Larger test (shrinking)'
#   at t/nameprep_st.t line 258.
#          got: 'xssi̇telǰ aΰ '
#     expected: 'xssi̇telǰ aΰ '
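
Because the stray ^E does not render in the output above, it can help to compare code points instead of raw strings. A minimal sketch, assuming a helper named codepoints (a hypothetical name, not part of Unicode::Normalize or Test::More):

```perl
use strict;
use warnings;

# Hypothetical helper: render a string as space-separated U+XXXX code points
# so that control characters such as U+0005 become visible in diagnostics.
sub codepoints {
    my ($str) = @_;
    return join ' ', map { sprintf 'U+%04X', ord($_) } split //, $str;
}

print codepoints(" \x05"), "\n";   # prints "U+0020 U+0005"
print codepoints(" "), "\n";       # prints "U+0020"
```

In a test, something like is( codepoints(Unicode::Normalize::NFKC("\x{2000}")), 'U+0020', 'NFKC of U+2000' ) would then show the extra character in the "got" line on an affected perl.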

The prime suspect is now the generated file lib/unicore/Decomposition.pl in bleadperl:

2000		2002
2001		2003
2002	2006	<compat> 0020 # [5]
2007		<noBreak> 0020
2008	200A	<compat> 0020 # [3]
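
For illustration only, here is a sketch of reading lines in the format quoted above while discarding a trailing "# [n]" comment. The function name parse_decomp_line is hypothetical, and it is an assumption on my part (not something this ticket confirms) that the trailing comment is what a consumer of the file trips over:

```perl
use strict;
use warnings;

# Hypothetical parser for one line in the format quoted above:
#   START <tab> [END] <tab> [<tag>] HEX HEX ... [# comment]
sub parse_decomp_line {
    my ($line) = @_;
    $line =~ s/\s*#.*//;                        # drop a trailing "# [n]" comment
    my ($start, $end, $mapping) = split /\t/, $line, 3;
    $end     = $start unless defined $end && length $end;   # single code point
    $mapping = ''     unless defined $mapping;
    $mapping =~ s/^\s*<\w+>//;                  # drop a compatibility tag, if any
    my @to = map { hex $_ } grep { length $_ } split ' ', $mapping;
    return ( hex $start, hex $end, \@to );
}

my ($s, $e, $to) = parse_decomp_line("2002\t2006\t 0020 # [5]");
printf "U+%04X..U+%04X => %s\n", $s, $e, join ' ', map { sprintf '%04X', $_ } @$to;
# prints "U+2002..U+2006 => 0020"
```

With the comment stripped first, the "5" in "# [5]" can never be read as part of the mapping.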
Probably there's no fix required for Unicode::Normalize. I'll write a patch for perl, then.
It's fixed in bleadperl/5.11.4