
This queue is for tickets about the Text-Unidecode CPAN distribution.

Report information
The Basics
Id: 97456
Status: resolved
Worked: 20 min
Priority: 0/
Queue: Text-Unidecode

People
Owner: sburke [...] cpan.org
Requestors: ilmari+cpan [...] ilmari.org
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: 1.01
Fixed in: 1.23



Subject: Fatal warning for UTF-16 surrogates on old perls
On perls 5.8.8 through 5.12.x, regex matches against UTF-16 surrogate characters emit a fatal "Malformed UTF-8 character" warning if warnings are enabled. ExtUtils::MakeMaker prior to 6.78 runs the test suite with -w, so the installation fails. The attached patch disables utf8 warnings around the regex substitution in the module, and around the conversion of the character number to a character in the test.
Subject: unidecode-no-utf8-warnings.patch
--- a/lib/Text/Unidecode.pm	2014-06-30 08:54:22.000000000 +0100
+++ b/lib/Text/Unidecode.pm	2014-07-24 11:36:21.555068805 +0100
@@ -44,6 +44,9 @@
 
   foreach my $n (@_) {
     next unless defined $n;
+    # Shut up potentially fatal warnings about UTF-16 surrogate
+    # characters when running under perl -w
+    no warnings 'utf8';
     $n =~ s~([^\x00-\x7f])~${$Char[ord($1)>>8]||t($1)}[ord($1)&255]~egs;
   }
   # Replace character 0xABCD with $Char[0xAB][0xCD], loading
--- a/t/02000_uniform_table_sizes.t	2014-06-18 01:42:57.000000000 +0100
+++ b/t/02000_uniform_table_sizes.t	2014-07-24 11:35:36.034984852 +0100
@@ -24,7 +24,8 @@
 Bank:
 foreach my $banknum ( @Bank_Numbers ) {
   my $charnum = $banknum << 8;
-  my $char = chr( $charnum );
+  # Shut up warnings about UTF-16 surrogate characters
+  my $char = do { no warnings 'utf8'; chr( $charnum ) };
   print "# About to test banknum $banknum via charnum $charnum\:\n";
Yup, this one just got me too. -- rjbs
And me.
Dang! I had a feeling that surrogates would eventually bite me somehow. I guess this is how. I'll try to bundle a new fixy-dist in the next four days or so. (Applying the patch to the relevant file is easy. Figuring out "now, where did I leave off in new versioning for Unicode.pm?", to *find* that relevant file -- that's a dispiriting idea, and it gets worse the more time I let pass, as the details of it fade from memory. I have only *just recently* read /Getting Things Done/, where I saw the importance of not just breaking things into steps (not a revolutionary idea for me), but also, when you have to stop the project at some step, of making it clear how to *pick up the next step when you get back to it*! Otherwise you end up in just the situation I described: "all I remember is I was in the middle of something with that, but I don't know where or in what files, and I don't have eleventy-thousand hours to spend on that just *today*!", and you know, waaaaaaaaambulance. But now, hopefully, I will do better with my personal "GTD_RESUME.txt" files in my Unidecode makedisty directory. Also: a sensible dose of anti-anxiety meds.) And thank you very much for the patch, ilmari. I apologize to everyone that it's taken me this long to even notice the bug reports -- I must have bungled setting up a mail filter, so thank you to Tim Bunce for pinging me about this today.
Fixed in 1.23, but not mentioned here until now. Thanks, folks, I never would have caught this one.