Subject: mbwidth broken for many Unicode characters
Demo program:
#!perl
use warnings FATAL => 'all';
use Data::Dump qw(pp);
use Text::CharWidth qw(mbwidth);
use Unicode::GCString qw();
use Unicode::UCD qw(charinfo);
for (0 .. 0x1ffff) {
    # Build the one-character string for this code point.
    my $c = eval sprintf '"\\x{%x}"', $_;
    my $u = eval { Unicode::GCString->new($c) };
    my $info = charinfo($_);
    next unless defined($u) && $info; # skip non-character codepoints
    printf "%s <http://codepoints.net/U+%s> \"%s\" mbw %d col %d charinfo %s\n",
        (mbwidth($c) == $u->columns) ? 'ok' : 'nok',
        sprintf('%x', $_),
        $info->{name}, # hash key lost in the original post; {name} is assumed
        mbwidth($c),
        $u->columns,
        pp($info);
}
__END__
Invoked as
perl width.pl | ack ^nok | wc -l
shows 7623 disagreements on my installation of Perl 5.16.3 with glibc 2.17. Because mbwidth returns -1 for many letter characters, I trust Unicode::GCString->columns to give the correct answer. The documentation should at least mention this large disagreement, though in the long term it is probably better to deprecate Text::CharWidth altogether and direct users to Unicode::GCString instead.
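
For anyone who wants to try the suggested replacement, here is a minimal sketch of measuring width with Unicode::GCString instead of mbwidth. The helper name gc_width is hypothetical and not part of either module. It assumes the input is already a decoded Perl character string, not bytes in the locale encoding.

```perl
use strict;
use warnings;
use Unicode::GCString;

# Hypothetical helper (not part of Text::CharWidth or Unicode::GCString):
# returns the number of display columns of a decoded character string,
# as computed by Unicode::GCString->columns.
sub gc_width {
    my ($str) = @_;
    return Unicode::GCString->new($str)->columns;
}

printf "%d\n", gc_width("a");         # ASCII letter
printf "%d\n", gc_width("\x{4e2d}");  # CJK ideograph (East Asian Wide)
```

Unlike mbwidth, this does not consult the C library's locale tables, so the result should not vary with the glibc version.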