
This queue is for tickets about the Unicode-Normalize CPAN distribution.

Report information
The Basics
Id: 63393
Status: resolved
Priority: 0/
Queue: Unicode-Normalize

People
Owner: Nobody in particular
Requestors: public [...] khwilliamson.com
Cc:
AdminCc:

Bug Information
Severity: Critical
Broken in: (no value)
Fixed in: 1.10



Subject: Support new API in utf8n_to_uvuni()
The Perl 5.14 core is changing to handle the Unicode non-characters properly. These are legal everywhere except for interchange between applications, but Perl has treated them as illegal. Because of this, and because Perl has it wrong as to which code points are non-characters, the API is changing. As a result, Normalize needs to change to use the new API. The attached patch does that, while preserving compatibility with older Perls. Note that the behavior of Normalize doesn't change, as it explicitly allowed these characters.

FYI, also, utf8n_to_uvuni() has always allowed code points that are higher than the Unicode maximum of U+10FFFF. The new API will add a flag to disallow them if desired.

So as not to break anything, I am waiting until this patch is applied and pushed to blead before continuing with the core Perl changes.

Thank you for applying this, or using it as a basis for your own patch, and for supporting Normalize in general.

Karl Williamson
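For reference, the Unicode Standard defines exactly 66 non-characters: U+FDD0 through U+FDEF, plus the last two code points of each of the 17 planes (U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, ..., U+10FFFE, U+10FFFF). The C sketch below classifies code points along the two axes the message mentions: non-characters, and values above U+10FFFF. The helper names are hypothetical and are not part of Perl's API; this is not perl's own implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Largest code point defined by Unicode. */
#define UNICODE_MAX 0x10FFFFu

/* True for the 66 Unicode non-characters: U+FDD0..U+FDEF, plus the last
 * two code points of every plane (U+xxFFFE and U+xxFFFF). */
static bool is_noncharacter(uint32_t cp)
{
    if (cp >= 0xFDD0u && cp <= 0xFDEFu)
        return true;
    /* (cp & 0xFFFE) == 0xFFFE holds exactly when the low 16 bits are
     * FFFE or FFFF; restrict it to planes 0 through 16. */
    return cp <= UNICODE_MAX && (cp & 0xFFFEu) == 0xFFFEu;
}

/* True for values above U+10FFFF, which are not Unicode code points
 * even though utf8n_to_uvuni() has historically accepted them. */
static bool is_above_unicode(uint32_t cp)
{
    return cp > UNICODE_MAX;
}

/* Scans the whole Unicode code space and counts the non-characters. */
static unsigned count_noncharacters(void)
{
    unsigned n = 0;
    for (uint32_t cp = 0; cp <= UNICODE_MAX; cp++)
        if (is_noncharacter(cp))
            n++;
    return n;
}
```

count_noncharacters() returns 66 (32 in the U+FDD0..U+FDEF block plus 2 in each of the 17 planes), which makes it a quick sanity check on the bitmask test.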
Subject: 0001-Normalize.xs-Support-new-utf8n_to_uvuni-API.patch
From 32655e71829481e8d0a2665015a313e88371bd12 Mon Sep 17 00:00:00 2001
From: Karl Williamson <public@khwilliamson.com>
Date: Sat, 27 Nov 2010 19:55:26 -0700
Subject: [PATCH] Normalize.xs: Support new utf8n_to_uvuni() API

---
 cpan/Unicode-Normalize/Normalize.xs |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/cpan/Unicode-Normalize/Normalize.xs b/cpan/Unicode-Normalize/Normalize.xs
index f4bbca7..2115095 100644
--- a/cpan/Unicode-Normalize/Normalize.xs
+++ b/cpan/Unicode-Normalize/Normalize.xs
@@ -20,6 +20,13 @@
 #define utf8n_to_uvuni utf8_to_uv
 #endif /* utf8n_to_uvuni */
 
+/* Starting in Perl 5.14, non-character code points are changed from disallow
+ * to allow by default, and the #define name is changed. So this turns or-ing
+ * it into a no-op */
+#ifdef UTF8_DISALLOW_NONCHAR
+#define UTF8_ALLOW_FFFF 0
+#endif
+
 /* UTF8_ALLOW_BOM is used before Perl 5.8.0 */
 #ifdef UTF8_ALLOW_BOM
 #define AllowAnyUTF (UTF8_ALLOW_SURROGATE|UTF8_ALLOW_BOM|UTF8_ALLOW_FFFF)
-- 
1.5.6.3
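The shim in this patch relies on the fact that OR-ing 0 into a bitmask is a no-op: on a perl whose headers define UTF8_DISALLOW_NONCHAR, UTF8_ALLOW_FFFF becomes 0, so the existing AllowAnyUTF expression compiles unchanged and simply stops contributing that bit. The self-contained C sketch below simulates both header states. The numeric flag values and the SIMULATE_PERL_514 switch are hypothetical stand-ins, not the real contents of perl's utf8.h, and AllowAnyUTF is defined unconditionally here for brevity.

```c
#include <assert.h>

/* Hypothetical flag values standing in for perl's utf8.h; the real
 * values differ between Perl releases. */
#define UTF8_ALLOW_SURROGATE 0x0100
#define UTF8_ALLOW_BOM       0x0200

/* Define SIMULATE_PERL_514 (hypothetical switch) to mimic a newer header
 * that drops UTF8_ALLOW_FFFF and defines UTF8_DISALLOW_NONCHAR instead. */
#ifdef SIMULATE_PERL_514
#define UTF8_DISALLOW_NONCHAR 0x1000
#else
#define UTF8_ALLOW_FFFF      0x0400
#endif

/* The shim from the patch: on the newer API, UTF8_ALLOW_FFFF is 0,
 * so OR-ing it into a flag set changes nothing. */
#ifdef UTF8_DISALLOW_NONCHAR
#define UTF8_ALLOW_FFFF 0
#endif

#define AllowAnyUTF (UTF8_ALLOW_SURROGATE|UTF8_ALLOW_BOM|UTF8_ALLOW_FFFF)

#ifdef UTF8_DISALLOW_NONCHAR
/* Compile-time check that the shim really removes the FFFF bit. */
static_assert(AllowAnyUTF == (UTF8_ALLOW_SURROGATE|UTF8_ALLOW_BOM),
              "with the shim, UTF8_ALLOW_FFFF contributes no bits");
#endif

/* Returns the combined mask that a call site would pass as flags. */
static unsigned allow_any_utf(void)
{
    return AllowAnyUTF;
}
```

Compiled as is, allow_any_utf() returns 0x0700; compiled with -DSIMULATE_PERL_514, it returns 0x0300, and the call site that passes AllowAnyUTF needs no change in either case.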
Subject: Re: [rt.cpan.org #63393] Support new API in utf8n_to_uvuni()
Date: Sun, 28 Nov 2010 20:34:48 +0900
To: bug-Unicode-Normalize [...] rt.cpan.org
From: SADAHIRO Tomoyuki <bqw10602 [...] nifty.com>
As far as I understand, "first-come" for a dual-life module means that a patch can be applied to blead first, without waiting for a new CPAN release of the module that incorporates it.

Once a newer version of perl (say 5.13.8 or later) with your API change has been released, I will consider the patch for the CPAN release. I don't think it is necessary to revise the dual-life module on CPAN until the core API change is confirmed.

Please discuss the core API change with the perl 5 porters first.

Regards,
SADAHIRO Tomoyuki
Subject: Re: [rt.cpan.org #63393] Support new API in utf8n_to_uvuni()
Date: Sun, 28 Nov 2010 12:20:14 -0700
To: bug-Unicode-Normalize [...] rt.cpan.org
From: karl williamson <public [...] khwilliamson.com>
I'm sorry; I didn't check that this module is first-come before submitting the patch. I will make the changes myself. The proposal has already been vetted on p5p, so please close this ticket.
RT-Send-CC: bqw10602 [...] nifty.com
I added the following preprocessor directives to the XS so that it can still be built even if some UTF8_ALLOW_* macros become undefined in the future.

 /* UTF8_ALLOW_BOM is used before Perl 5.8.0 */
+#ifndef UTF8_ALLOW_BOM
+#define UTF8_ALLOW_BOM (0)
+#endif /* UTF8_ALLOW_BOM */
+
+#ifndef UTF8_ALLOW_SURROGATE
+#define UTF8_ALLOW_SURROGATE (0)
+#endif /* UTF8_ALLOW_SURROGATE */
+
+#ifndef UTF8_ALLOW_FE_FF
+#define UTF8_ALLOW_FE_FF (0)
+#endif /* UTF8_ALLOW_FE_FF */
+
+#ifndef UTF8_ALLOW_FFFF
+#define UTF8_ALLOW_FFFF (0)
+#endif /* UTF8_ALLOW_FFFF */
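These #ifndef fallbacks generalize the idea behind the earlier UTF8_DISALLOW_NONCHAR shim: any flag the perl headers do not provide is defined as (0), so a mask built by OR-ing the flags still compiles and simply lacks the missing bits. In the sketch below, the header state (only UTF8_ALLOW_SURROGATE defined, with a made-up value) and the combined mask ALLOW_ALL_LISTED are hypothetical; the real flag values, and the masks Normalize.xs actually builds, may differ.

```c
#include <assert.h>

/* Hypothetical header state: only this flag is provided, with a
 * made-up value. */
#define UTF8_ALLOW_SURROGATE 0x0100

/* Fallbacks in the style of Normalize.xs: a flag missing from the
 * headers becomes 0 instead of causing a compile error. */
#ifndef UTF8_ALLOW_BOM
#define UTF8_ALLOW_BOM (0)
#endif /* UTF8_ALLOW_BOM */

#ifndef UTF8_ALLOW_SURROGATE
#define UTF8_ALLOW_SURROGATE (0)
#endif /* UTF8_ALLOW_SURROGATE */

#ifndef UTF8_ALLOW_FE_FF
#define UTF8_ALLOW_FE_FF (0)
#endif /* UTF8_ALLOW_FE_FF */

#ifndef UTF8_ALLOW_FFFF
#define UTF8_ALLOW_FFFF (0)
#endif /* UTF8_ALLOW_FFFF */

/* Hypothetical combined mask, for illustration only. */
#define ALLOW_ALL_LISTED \
    (UTF8_ALLOW_SURROGATE|UTF8_ALLOW_BOM|UTF8_ALLOW_FE_FF|UTF8_ALLOW_FFFF)

/* Missing flags contribute nothing, so only the provided bit remains. */
static_assert(ALLOW_ALL_LISTED == UTF8_ALLOW_SURROGATE,
              "flags missing from the headers contribute no bits");

static unsigned allow_all_listed(void)
{
    return ALLOW_ALL_LISTED;
}
```

The design choice is that a removed flag degrades to "no extra permission" rather than to a build failure, which matches the stated goal of keeping the XS buildable if perl drops any of these macros.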