Subject: | Makefile.PL generates invalid EastAsianWidth table |
Dear maintainer,
Makefile.PL has two bugs that cause the bundled Unicode::EastAsianWidth to
contain invalid East Asian Width tables.
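1) The single-quoted heredoc <<'END' does not interpret \t as a tab.
The heredoc terminator was changed from <<END to <<'END', so the generated
tables contain a literal "\t" instead of a tab, and \p{...} does not get
the correct character ranges.
2) The parsing regexes lack a line-start anchor (^).
Without it, the single-code-point pattern matches the tail of a range line
(e.g. the "9FBB;..." part of a 4E00..9FBB line), so character ranges are
not recognized correctly.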
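To illustrate point 2, here is a minimal sketch that runs the two pairs of patterns over two sample lines. The classify helper and the sample lines are made up for this illustration and are not part of Makefile.PL; only the four patterns are taken from the patch.

```perl
use strict;
use warnings;

# Two sample lines in the "code;category" / "start..end;category" layout.
# These are illustrative inputs, not an excerpt of the real data file.
my @lines = ("3000;F", "4E00..9FBB;W");

# Hypothetical helper: try the single-code-point pattern first and the
# range pattern second, in the same if/elsif order as Makefile.PL.
sub classify {
    my ($line, $single, $range) = @_;
    if ($line =~ $single) {
        return "single $1 $2";
    }
    elsif ($line =~ $range) {
        return "range $1..$2 $3";
    }
    return "no match";
}

# Patterns before the patch (no anchor) and after the patch (anchored).
my @unanchored = (qr/(\w+);(\w+)/,  qr/(\w+)\.\.(\w+);(\w+)/);
my @anchored   = (qr/^(\w+);(\w+)/, qr/^(\w+)\.\.(\w+);(\w+)/);

for my $line (@lines) {
    print "$line\n";
    print "  unanchored: ", classify($line, @unanchored), "\n";
    print "  anchored:   ", classify($line, @anchored), "\n";
}
```

For the range line, the unanchored pair reports a single code point 9FBB because the first pattern matches further along the line, so the elsif branch is never tried. The anchored pair reports the range 4E00..9FBB.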
The attached patch works for me. After applying it, you will need to
regenerate lib/Unicode/EastAsianWidth.pm.
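Regenerating matters because the heredoc quoting from point 1 is baked into the generated module. As a rough sketch of the effect in isolation, the two subs below return the same table line under each quoting style. The sub names and the single table line are invented for this illustration; as far as I understand perlunicode, a user-defined property expects each range as two hexadecimal numbers separated by horizontal whitespace.

```perl
use strict;
use warnings;

# Makefile.PL writes each table line with "\\t", i.e. a literal backslash
# followed by "t", into the heredoc body of the generated module.
# Both heredocs below contain that same two-character sequence.

# Single-quoted heredoc: no escape processing, "\t" stays two characters.
sub table_single_quoted {
    return <<'END';
4E00\t9FBB
END
}

# Double-quoted heredoc: "\t" is interpreted as a tab character.
sub table_double_quoted {
    return <<"END";
4E00\t9FBB
END
}

for my $case (
    [ "<<'END'", \&table_single_quoted ],
    [ '<<"END"', \&table_double_quoted ],
) {
    my ($label, $sub) = @$case;
    my $table = $sub->();
    # Check for "hex TAB hex NEWLINE", the shape a range line should have.
    my $ok = $table =~ /\A[0-9A-F]+\t[0-9A-F]+\n\z/ ? "yes" : "no";
    print "$label tab-separated range: $ok\n";
}
```

Only the double-quoted variant yields a real tab between the two code points, which is why the last hunk of the patch switches back to double quotes.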
A test script is also attached.
(I am not familiar with all East Asian characters, so the East Asian
characters in the test script are all Japanese.)
Regards.
--
BANB: ITO Nobuaki
Subject: | UnicodeEAW.patch |
--- Makefile.PL.orig 2007-10-14 17:02:39.000000000 +0900
+++ Makefile.PL 2007-12-25 11:47:35.000000000 +0900
@@ -66,7 +66,7 @@
my %categ;
while (<EAW>) {
- if (/(\w+);(\w+)/) {
+ if (/^(\w+);(\w+)/) {
my ($code, $categ) = ($1, $2);
if ($prev_categ ne $categ) {
$categ{$ToFullName{$prev_categ}} .= "$prev_code\\t$prev_code_end\n" if $prev_categ;
@@ -75,7 +75,7 @@
}
$prev_code_end = $code;
}
- elsif (/(\w+)\.\.(\w+);(\w+)/) {
+ elsif (/^(\w+)\.\.(\w+);(\w+)/) {
$categ{$ToFullName{$prev_categ}} .= "$prev_code\\t$prev_code_end\n" if $prev_categ;
$categ{$ToFullName{$3}} .= "$1\\t$2\n";
$prev_categ = '';
@@ -97,7 +97,7 @@
for my $name (sort values %ToFullName) {
$out .= << ".";
sub $name {
- return <<'END';
+ return <<"END";
$categ{$name}END
}
Subject: | 2-chars.t |
#!/usr/bin/perl -w
use strict;
use warnings;
use Test::Simple tests => 7;
use Unicode::EastAsianWidth;
# LATIN CAPITAL LETTER B
ok("B" =~ m/\p{InEastAsianNarrow}/, "East Asian Narrow");
# FULLWIDTH LATIN CAPITAL LETTER B
ok("\x{ff22}" =~ m/\p{InEastAsianFullwidth}/, "East Asian Full-width");
# HALFWIDTH KATAKANA LETTER I
ok("\x{ff72}" =~ m/\p{InEastAsianHalfwidth}/, "East Asian Half-width");
# KATAKANA LETTER I
ok("\x{30a4}" =~ m/\p{InEastAsianWide}/, "East Asian Wide (katakana)");
# CJK UNIFIED IDEOGRAPH-6C38 (kanji "ei")
ok("\x{6c38}" =~ m/\p{InEastAsianWide}/, "East Asian Wide (kanji)");
# ROMAN NUMERAL FOUR
ok("\x{2163}" =~ m/\p{InEastAsianAmbiguous}/, "East Asian Ambiguous");
# THAI CHARACTER PHO SAMPHAO
ok("\x{0e20}" !~ m/\p{InEastAsianHalfwidth}/, "Not East Asian");
__END__