Subject: | Support for UTF-8 |
With Gentoo Linux, ncurses-5.6, this Curses-1.15 and perl v5.8.8 for
i686-linux-thread-multi, I encountered some problems with a simple
curses application under a UTF-8 locale.
The attached program is a strippd down example; an editor for one line
(72 characters) of text. When I enter some German umlauts in a locale
with LC_CTYPE=de_DE.utf8, the blue background changes in size, cursor
movement breaks, and I see some strange characters on the screen. Some
of these effects might not appear immediately, but with some
experimenting (combinations of typing and cursor movement) I saw them all.
I found that exporting LD_PRELOAD=/lib/libncursesw.so made the program
work correctly. So I thought it would be nice to have Curses for perl
link against ncursesw instead of simply ncurses, so that it can use a
wide character aware version of the libraries. See also the
--enable-widec configuration flag for ncurses. I have never had anything
to do with MakeMaker, but I would hope that detecting the presence of
ncursesw and modifying the linker argument can be done automagically.
For most of what I wrote so far I also filed a feature request with the
Gentoo people: http://bugs.gentoo.org/show_bug.cgi?id=180257
However, with perls inbuild unicode support in mind, it might be
interesting to make functions like getch indeed character-oriented
instead of byte oriented. To move the idea of my u8getch in the attached
example into the Curses module. This would make developing portable
applications a lot easier, and should only brake exotic apps imho.
Subject: | test.pl |
#!/usr/bin/perl
use locale;
use Curses;
use POSIX qw(setlocale LC_CTYPE);
use I18N::Langinfo qw(langinfo CODESET);
use Encode;
sub u8getch() {
my $chr = getch();
my $cpt = ord($chr);
if ($chr =~ /\A\d+\z/ || $cpt <= 127 ||
langinfo(CODESET) !~ /utf[-._ ]?8/i) {
return $chr;
}
my $len=1;
if (($cpt & 0xe0) == 0xc0) { $len = 2; }
elsif (($cpt & 0xf0) == 0xe0) { $len = 3; }
elsif (($cpt & 0xf8) == 0xf0) { $len = 4; }
elsif (($cpt & 0xfc) == 0xf8) { $len = 5; }
elsif (($cpt & 0xfe) == 0xfc) { $len = 6; }
else {
return $chr;
}
for my $i (2..$len) {
my $chri = getch();
return $chri if ((ord($chri) & 0xc0) != 0x80);
$chr .= $chri;
}
return decode_utf8($chr);
}
sub show() {
attrset(COLOR_PAIR(0));
erase;
addstr(0, 0, $str);
attrset(COLOR_PAIR(1));
addstr(" " x (72 - length($str)));
move(0, $pos);
refresh;
}
sub ctrl($) {
return chr(ord(shift)+1-ord("A"))
}
sub inputLine() {
my $oldlocale = setlocale(LC_CTYPE, "");
initscr;
cbreak;
noecho;
keypad(1);
$str = "";
$pos = 0;
if (has_colors) {
start_color;
init_pair(1, COLOR_BLUE, COLOR_BLUE);
}
show;
for (;;) {
$chr=u8getch();
if ($chr eq "\n" || $chr eq "\r") { # commit with Enter
last;
}
if ($chr =~ /\A[\n[:print:]]\z/) { # append printable characters
substr ($str, $pos++, 0, $chr) if length($str) < 160;
}
elsif ($chr eq ctrl("C")) { # abort with ^C
$str = "";
last;
}
elsif ($chr eq KEY_LEFT) {
--$pos if $pos > 0;
}
elsif ($chr eq KEY_RIGHT) {
++$pos if $pos < length($str);
}
elsif ($chr eq KEY_BACKSPACE) {
substr ($str, --$pos, 1, "") if $pos > 0;
}
elsif ($chr eq KEY_DC) {
substr ($str, $pos, 1, "") if $pos < length($str);
}
show;
}
keypad(0);
echo;
nocbreak;
endwin;
setlocale(LC_CTYPE, $oldlocale);
return $str;
}
$_ = inputLine();
print "$_\n";