
This queue is for tickets about the Unicode-Char CPAN distribution.

Report information
The Basics
Id: 132471
Status: resolved
Priority: 0/
Queue: Unicode-Char

People
Owner: Nobody in particular
Requestors: khw [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: Will not work in 5.32 going forward
This module reads lib/unicore/Name.pl. That file is a core-perl-internal file, and its format has changed, so the module now fails. Since Perl v5.16, there has been an alternative method for getting this information, and the recipe to do so is already written out for you:

    perldoc Unicode::UCD

Look for the "Getting every available name" description. Here it is for your convenience:

    my (%name, %cp, %cps, $n);
    # All codepoints
    foreach my $cat (qw( Name Name_Alias )) {
        my ($codepoints, $names, $format, $default) = prop_invmap($cat);
        # $format => "n", $default => ""
        foreach my $i (0 .. @$codepoints - 2) {
            my ($cp, $n) = ($codepoints->[$i], $names->[$i]);
            # If $n is a ref, the same codepoint has multiple names
            foreach my $name (ref $n ? @$n : $n) {
                $name{$cp} //= $name;
                $cp{$name} //= $cp;
            }
        }
    }
    # Named sequences
    {   my %ns = namedseq();
        foreach my $name (sort { $ns{$a} cmp $ns{$b} } keys %ns) {
            $cp{$name} //= [ map { ord } split "" => $ns{$name} ];
        }
    }

It may be there are bugs in the earlier versions of Unicode::UCD. Or there might not be. But you could do the Name.pl thing on versions 5.30 and earlier, and the above recipe for later.
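The version split suggested above could be sketched as a small dispatcher. This is only an illustration under stated assumptions: `name_source` is a hypothetical helper, not part of Unicode-Char, and the 5.032 cut-off is taken from the ticket subject rather than from the module's code.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Unicode-Char): choose where character
# names should come from for a given perl version.
# The argument uses the same numeric form as $], e.g. 5.032 for v5.32.0.
sub name_source {
    my ($perl_version) = @_;
    $perl_version = $] unless defined $perl_version;

    # Assumption from the ticket: reading Name.pl directly still works
    # through 5.30, and the Unicode::UCD recipe is needed from 5.32 on.
    return $perl_version >= 5.032 ? 'Unicode::UCD' : 'Name.pl';
}

printf "perl %s would use %s\n", $], name_source();
```

Since the ticket says the Unicode::UCD method has existed since v5.16, a module could also lower the cut-off and drop the Name.pl path entirely once it has checked the older releases it supports.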
Subject: Re: [rt.cpan.org #132471]
Date: Fri, 1 May 2020 10:28:44 -0600
To: bug-Unicode-Char [...] rt.cpan.org
From: Karl Williamson <khw [...] cpan.org>
Grepping CPAN, it appears Encode also reads Name.pl
I took a look at the repository and found that reading lib/unicore/Name.pl is already useless. Whoa!

* It is only used by bin/enc2xs.
* In there it is used to init @uname (by &char_names), where $uname[$xxxx] returns the name of U+xxxx.
* And it adds the name to the ucm **COMMENT** if available.

As a matter of fact, char_names() is already a no-op! Run the code below and see what happens.

    use strict;
    use warnings;
    use Data::Dumper;
    my @uname;
    sub char_names {
        my $s = do "unicore/Name.pl";
        die "char_names: unicore/Name.pl: $!\n" unless defined $s;
        pos($s) = 0;
        while ($s =~ /\G([0-9a-f]+)\t([0-9a-f]*)\t(.*?)\s*\n/igc) {
            my $name = $3;
            my $s = hex($1);
            last if $s >= 0x10000;
            my $e = length($2) ? hex($2) : $s;
            for (my $i = $s; $i <= $e; $i++) {
                $uname[$i] = $name;
                #print sprintf("U%04X $name\n",$i);
            }
        }
    }
    char_names();
    warn Dumper \@uname;

So it is safe to remove the definition and usage of &char_names.

Dan the Maintainer Thereof

On Fri May 01 11:42:24 2020, khw wrote:
> This module reads lib/unicore/Name.pl. That file is an core-perl-
> internal file, and its format has changed, so the module now fails.
> Since Perl v5.16, there has been an alternative method for getting
> this information. And the recipe to do so is already written out for
> you.,
>
> perldoc Unicode::UCD
>
> look for the "Getting every available name" description. Here it is
> for your convenience
>
> my (%name, %cp, %cps, $n);
> # All codepoints
> foreach my $cat (qw( Name Name_Alias )) {
>     my ($codepoints, $names, $format, $default) = prop_invmap($cat);
>     # $format => "n", $default => ""
>     foreach my $i (0 .. @$codepoints - 2) {
>         my ($cp, $n) = ($codepoints->[$i], $names->[$i]);
>         # If $n is a ref, the same codepoint has multiple names
>         foreach my $name (ref $n ? @$n : $n) {
>             $name{$cp} //= $name;
>             $cp{$name} //= $cp;
>         }
>     }
> }
> # Named sequences
> {   my %ns = namedseq();
>     foreach my $name (sort { $ns{$a} cmp $ns{$b} } keys %ns) {
>         $cp{$name} //= [ map { ord } split "" => $ns{$name} ];
>     }
> }
>
> It may be there are bugs in the earlier versions of Unicode::UCD. Or
> there might not be. But you could do the Name.pl thing on versions
> 5.30 and earlier, and the above recipe for later.
I have made &char_names of bin/enc2xs officially a no-op.

    sub char_names{} # cf. https://rt.cpan.org/Ticket/Display.html?id=132471

and VERSION++'ed.

Dan the Maintainer Thereof
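The change described here simply drops the names. If per-code-point names were still wanted for the ucm comments, one possible approach (an assumption for illustration, not what Encode actually does) would be to fill @uname through the documented charnames::viacode() interface instead of parsing a core-internal file. The helper name `char_names_via_api` is hypothetical.

```perl
use strict;
use warnings;
use charnames ();    # load the module without enabling \N{...} handling

my @uname;

# Hypothetical replacement for a Name.pl-parsing char_names():
# store the name of each code point in the inclusive range
# [$first, $last], skipping code points that have no name.
sub char_names_via_api {
    my ($first, $last) = @_;
    for my $cp ($first .. $last) {
        my $name = charnames::viacode($cp);
        $uname[$cp] = $name if defined $name;
    }
}

# Printable ASCII only, to keep the example quick.
char_names_via_api(0x20, 0x7E);
printf "U+%04X %s\n", $_, $uname[$_] for 0x20, 0x41;
```

Looking names up one code point at a time is slower than reading a prebuilt table, so for the full range below 0x10000 the prop_invmap recipe quoted earlier in the ticket may be the better fit.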
On Fri May 01 22:38:07 2020, DANKOGAI wrote:
> I have made &char_names of bin/enc2xs officially a no-op.
>
> sub char_names{} # cf. https://rt.cpan.org/Ticket/Display.html?id=132471
>
> and VERSION++'ed.
>
> Dan the Maintainer Thereof
Thanks. Encode v3.06 merged into perl 5 blead 8ced1423dbb2a874f2d95e9c5c4c46960c2bf318.