On Thu May 20 22:58:54 2010, bryanh@giraffe-data.com wrote:
> The W libraries are ones that understand wide characters (i.e. character
> sets with more than 128 characters, which use more than one byte to
> represent a character).
>
> The W libraries are fairly recent, and Curses.pm interoperability with
> them even newer. AFAIR, the only changes that have been made to Perl Curses to
> work with the W libraries were in the building.
>
> How sure are you that the splitting happens at a line boundary? What if
> there are 3 lines? I'm wondering if it's a length problem, e.g. there
> are 26 one-byte characters and something thinks there are 2 bytes per
> character and thus concludes the string length is 13.
>
> Note that it's not hard to build Curses.so with the more primitive (and more
> tested) non-W versions. It's just not what the automatic configurator
> prefers.
>
Greetings Bryan - thank you also for taking the time to reply. I know
this is not a Perl Curses issue, and I'm sure you have many other
demands on your time! It is very kind of you to reply here to help me
and anybody else who may encounter this issue.

Yes, after making my last post, I went off in search of bugs in the
ncurses libraries and did notice in the distribution notes that the W
libraries are for multi-byte characters. It now makes perfect sense why
the problem is seen there and not in the original
libraries. Although there have been other bugs relating to
field_buffer, it seems most of these had to do with "wide" characters,
and I could find none that seemed to have issues with a row boundary.
So I've emailed a report to the ncurses maintainer, and perhaps he may
be able to quickly find the problem.

Once I realized that the W libraries were for "wide" characters and had
nothing to do with 32- vs 64-bit, I investigated build options on Gentoo
for the Curses Perl module. Fortunately, the Gentoo package maintainer
has conveniently included a "unicode" build option, so I rebuilt the
package without unicode support for this specific module. On Gentoo
this is easily done as follows:

    USE=-unicode emerge ncurses

The USE=-unicode prefix sets the environment variable USE to "-unicode"
(i.e. no unicode) for that one command only, and "emerge ncurses" then
rebuilds the package. The result is built without unicode support; in
other words, it links to the "narrow" libraries, which gives me a
workaround.

While this is an effective workaround in my specific case, it will, of
course, not work for somebody who really needs "wide" character
support. In that case, it seems there is no other option but to wait
for a bug fix in the ncurses library.

As for my certainty that this always occurs at a row boundary, I did
fairly extensive "testing" trying to coax a multi-row value out of a
multi-row field. My application is a database app, and I ran into the
problem when trying to support 255-character database fields by using
4x64 (four rows of 64 columns) fields. My database was already populated
with data of various sizes, but I also tried various combinations of
options, such as changing the number of rows, the number of off-screen
rows, and the column width, and then examined the number and type of
return values, and even the additional (user-defined) buffers in the
field. The only thing that holds constant is the length of the returned
value: it is always the number of columns (the width), when it should be
width * (rows + off-screen rows).
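A quick Python model of that arithmetic makes the gap concrete. It does not call ncurses, and the function name `expected_buffer_len` is made up for illustration. In the C API the off-screen row count is the `offscreen` argument to `new_field()`. The sketch also assumes my 4x64 fields have no off-screen rows, which is an assumption for the example only.

```python
def expected_buffer_len(rows: int, offscreen: int, cols: int) -> int:
    """Characters field_buffer() should return: every row, on- or off-screen."""
    return cols * (rows + offscreen)


# My database case: 4 rows x 64 columns for a 255-character column.
# Assumption for this sketch: no off-screen rows (offscreen = 0).
rows, offscreen, cols = 4, 0, 64
expected = expected_buffer_len(rows, offscreen, cols)
observed = cols  # what the W build hands back: exactly one row's worth

assert expected >= 255  # 256 cells, enough for the 255-character column
print(f"expected {expected}, observed {observed}")
```

This prints `expected 256, observed 64`. Adding off-screen rows raises the expected figure, but in my tests it never changed the observed one.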

As for my test case, I chose the letters of the alphabet and two
13-character rows simply because that breaks conveniently at the halfway
point and doesn't raise any question of trailing-space parsing being the
cause of the problem. However, your question points out that a better
test case would use 3 or 4 rows, so that it is clear that only the first
row of a multi-row field is returned and that this is not a halving or
doubling issue associated with multi-byte characters.
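The two explanations can be compared with a small Python model. It does not touch ncurses. `row_boundary()` and `halved()` are hypothetical stand-ins for the two suspected failure modes: only the first row comes back, versus a length miscounted at two bytes per one-byte character. Both are applied to a field of 13-column rows filled with letters.

```python
import string

COLS = 13  # width used in my original two-row test


def fill(rows: int, cols: int = COLS) -> str:
    """Buffer contents for a rows x cols field, filled with A..Z repeated."""
    alphabet = string.ascii_uppercase
    repeats = rows * cols // len(alphabet) + 1
    return (alphabet * repeats)[: rows * cols]


def row_boundary(text: str, cols: int = COLS) -> str:
    """Hypothesis 1: field_buffer() stops after the first row."""
    return text[:cols]


def halved(text: str) -> str:
    """Hypothesis 2: length computed as 2 bytes per char, so half survives."""
    return text[: len(text) // 2]


for rows in (2, 3, 4):
    text = fill(rows)
    a, b = row_boundary(text), halved(text)
    verdict = "ambiguous" if a == b else "distinguishes"
    print(f"{rows} rows: row-boundary -> {len(a)}, halving -> {len(b)} ({verdict})")
```

With two rows both models return the same 13 letters, so that test cannot tell them apart. With three rows the predictions are 13 vs 19, and with four rows they are 13 vs 26.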