Bug #118483 for Text-ASCIITable: t/03_options.t fails

Sun Oct 23 15:10:54 2016 emmanuel [...] seyman.fr - Ticket created

Subject:	t/03_options.t fails
Date:	Sun, 23 Oct 2016 21:10:37 +0200
To:	bug-Text-ASCIITable [...] rt.cpan.org
From:	Emmanuel Seyman <emmanuel [...] seyman.fr>

On Fedora 25 and 26, I now get a failure on t/03_options.t.

[manu@kala Text-ASCIITable-0.20]$ prove -Ilib -v t/03_options.t | tee foo
t/03_options.t ..
1..6
ok 1
ok 2
ok 3
ok 4
not ok 5
ok 6
Failed 1/6 subtests

The failure happens on the comparison between length($arr[0]) and $t->getTableWidth(). The first one is equal to 39 while the second is equal to 37, so the comparison fails and the test dies. On Fedora 24, both are equal to 37 and the tests pass.

Changing the file encoding to UTF-8 doesn't improve things.

Emmanuel

Mon Nov 21 14:54:54 2016 SREZIC [...] cpan.org - Correspondence added

RT-Send-CC:	DANKOGAI [...] cpan.org


It seems that the failure happens with newer Encode.pm (2.87). Statistical analysis suggests so (theta=-1.0 means: only fail reports):

****************************************************************
Regression 'mod:Encode'
****************************************************************
Name                   Theta    StdErr                  T-stat
[0='const']           1.0000    0.0000    20771296082507892.00
[1='eq_2.39']        -0.0000    0.0000                   -3.21
[2='eq_2.42_01']     -0.0000    0.0000                   -3.21
[3='eq_2.44_01']     -0.0000    0.0000                   -1.97
[4='eq_2.49']        -0.0000    0.0000                   -4.05
[5='eq_2.55']        -0.0000    0.0000                   -3.95
[6='eq_2.56']        -0.0000    0.0000                   -3.50
[7='eq_2.57']        -0.0000    0.0000                   -3.96
[8='eq_2.60']        -0.0000    0.0000                   -4.29
[9='eq_2.62']        -0.0000    0.0000                   -4.37
[10='eq_2.63']       -0.0000    0.0000                   -4.03
[11='eq_2.64']       -0.0000    0.0000                   -3.65
[12='eq_2.67']       -0.0000    0.0000                   -4.33
[13='eq_2.68']       -0.0000    0.0000                   -3.16
[14='eq_2.69']       -0.0000    0.0000                   -3.59
[15='eq_2.70']       -0.0000    0.0000                   -3.96
[16='eq_2.72']       -0.0000    0.0000                   -4.26
[17='eq_2.72_01']    -0.0000    0.0000                   -3.93
[18='eq_2.73']       -0.0000    0.0000                   -3.01
[19='eq_2.75']       -0.0000    0.0000                   -3.16
[20='eq_2.76']       -0.0000    0.0000                   -4.08
[21='eq_2.78']       -0.0000    0.0000                   -4.28
[22='eq_2.79']       -0.0000    0.0000                   -3.11
[23='eq_2.80']       -0.0000    0.0000                   -4.33
[24='eq_2.80_01']    -0.0000    0.0000                   -4.12
[25='eq_2.84']       -0.0000    0.0000                   -4.25
[26='eq_2.86']       -0.0000    0.0000                   -4.49
[27='eq_2.87']       -1.0000    0.0000   -16959692232805172.00

R^2= 1.000, N= 226, K= 28
****************************************************************

Mon Nov 21 14:54:54 2016 The RT System itself - Status changed from 'new' to 'open'

Tue Nov 22 02:28:15 2016 haakon [...] _NOSPAM_loopback.no - Correspondence added


Thank you for the very thorough research! Could you show me a diff between $arr[0] from your two different test setups?

Tue Nov 22 02:28:15 2016 haakon [...] _NOSPAM_loopback.no - Taken

Sun Nov 27 05:13:07 2016 emmanuel [...] seyman.fr - Correspondence added

Subject:	Re: [rt.cpan.org #118483] t/03_options.t fails
Date:	Sun, 27 Nov 2016 11:12:52 +0100
To:	Hakon Nessjoen via RT <bug-Text-ASCIITable [...] rt.cpan.org>
From:	Emmanuel Seyman <emmanuel [...] seyman.fr>

* Hakon Nessjoen via RT [22/11/2016 02:28] :

> Thank you for the very thorough research! Could you show me a diff
> between $arr[0] from your two different test setups?

Thu Dec 22 07:07:17 2016 ppisar [...] redhat.com - Correspondence added

From:	ppisar [...] redhat.com

On Sun 27 Nov 2016 05:13:07, emmanuel@seyman.fr wrote:

> On Fedora 26 (perl 5.24, perl-Encode 2.87), I get:

[...]

The length() value is correct, the getTableWidth() value is wrong. The "Håkon Nessjøen" text is wrongly padded on the right side; there are two additional spaces, as can be seen when printing the whole table:

| 1  | Lunatic-| | Håkon Nessjøen     |
| 2  | tesepe    | William Viker    |
| 3  | espen     | Espen Ursin-Holm |
| 4  | bonde     | Martin Mikkelsen |

I think the bug is in Text::ASCIITable::count(), which counts the number of characters and ends with these two commands:

  $str = decode("utf8", $str) if $self->{options}{utf8};
  return length($str);

In other words, it decodes an already decoded Unicode string as UTF-8 a second time, because $self->{options}{utf8} is true by default. The double UTF-8 interpretation is clearly visible if I change the "Håkon Nessjøen" addRow() argument to "žHåkon Nessjøen"; then the test dies:

  perl -Ilib t/03_options.t
  1..6
  ok 1
  ok 2
  Wide character at /usr/lib64/perl5/vendor_perl/Encode.pm line 212.

Encode.pm:212 is a line in Encode::decode() from Encode-2.88 with this code:

  $string = $enc->decode( $octets, $check );

Now, why does the "ž" character matter? I don't know. The only difference is that the "ž" code point is bigger than 255 while "å" and "ø" are less than 256.
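The difference between counting characters and counting octets, and the failure on a code point above 255, can be reproduced outside the module. The following is only a minimal sketch using the core Encode module; the variable names are illustrative and are not taken from Text::ASCIITable, and the strings are written with escapes so that the result does not depend on the encoding of the source file.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# "Håkon Nessjøen" as a character (Unicode) string.
my $chars = "H\x{E5}kon Nessj\x{F8}en";

# The same text as UTF-8 octets: "å" and "ø" take two octets each.
my $octets = encode("UTF-8", $chars);

printf "characters: %d\n", length($chars);                      # 14
printf "octets:     %d\n", length($octets);                     # 16
printf "decoded:    %d\n", length(decode("UTF-8", $octets));    # 14

# A character string containing a code point above 255 ("ž" is U+017E)
# cannot be turned back into octets, so decode() refuses it.
my $wide = "\x{17E}" . $chars;
my $ok   = eval { decode("UTF-8", $wide); 1 };
print "decode of wide string ", ($ok ? "succeeded\n" : "died: $@");
```

A string whose code points are all below 256 can still be treated as a sequence of octets, so decode() accepts it and reinterprets it; a string with a code point above 255 has no octet form, which would explain why only the "ž" variant dies. What decode() returns for the reinterpreted "å" and "ø" depends on the Encode version, so the sketch does not print it.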

Thu Dec 22 07:39:27 2016 ppisar [...] redhat.com - Correspondence added

From:	ppisar [...] redhat.com

On Thu 22 Dec 2016 07:07:17, ppisar wrote:

> I think the bug is in Text::ASCIITable::count() that counts number of
> characters and it does these last two commands:
>
> $str = decode("utf8", $str) if $self->{options}{utf8};
> return length($str);

There is a t/13_utf8.t test that expects exactly the opposite results. It looks like the Text::ASCIITable interface expects octet strings on input. I think this is a terribly bad idea, especially these days.

The attached patch removes the undocumented utf8 feature and changes the tests to expect that all inputs are Unicode strings.
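As an illustration of what "all inputs are Unicode strings" means for a caller, here is a minimal sketch that decodes once on input, measures and pads in characters, and encodes once on output. It deliberately does not use Text::ASCIITable; the padding with sprintf only stands in for what a table module would do, and the variable names are hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Octets as they might arrive from a file or a terminal (UTF-8 encoded).
my @raw = ("H\xC3\xA5kon Nessj\xC3\xB8en", "Martin Mikkelsen");

# Decode exactly once, at the input boundary.
my @names = map { decode("UTF-8", $_) } @raw;

# Column width measured in characters, not octets.
my $width = 0;
for my $name (@names) {
    $width = length($name) if length($name) > $width;
}

# sprintf pads decoded strings by characters, so the rows line up.
my @rows = map { sprintf("| %-*s |", $width, $_) } @names;

# Encode exactly once, at the output boundary.
print encode("UTF-8", "$_\n") for @rows;
```

With this split, length() on any row agrees with the computed width, and no code in the middle needs to guess whether it was handed octets or characters.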

Subject:	0001-Remove-utf8-option.patch

From dcd0d8011eeaf7f74a8f03e3600235303e64f7d7 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Petr=20P=C3=ADsa=C5=99?= <ppisar@redhat.com>
Date: Thu, 22 Dec 2016 13:12:48 +0100
Subject: [PATCH] Remove utf8 option
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This feature caused test failures since Encode-2.87 because it tried to
reinterpret input strings as UTF-8 octet stream. It's application's
responsibility to provide input as Unicode strings. This how Perl works.

CPAN RT#118483

Signed-off-by: Petr Písař <ppisar@redhat.com>
---
 lib/Text/ASCIITable.pm | 2 --
 t/03_options.t         | 3 ++-
 t/13_utf8.t            | 5 +++--
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/lib/Text/ASCIITable.pm b/lib/Text/ASCIITable.pm
index b630910..c4449e0 100644
--- a/lib/Text/ASCIITable.pm
+++ b/lib/Text/ASCIITable.pm
@@ -88,7 +88,6 @@ sub new {
   $self->{options}{alignHeadRow} = $self->{options}{alignHeadRow} || 'auto'; # default setting
   $self->{options}{undef_as} = $self->{options}{undef_as} || ''; # default setting
   $self->{options}{chaining} = $self->{options}{chaining} || 0; # default setting
-  $self->{options}{utf8} = defined($self->{options}{utf8}) ? $self->{options}{utf8} : 1; # default setting
 
   bless $self;
 
@@ -884,7 +883,6 @@ sub count {
   $str =~ s/<.+?>//g if $self->{options}{allowHTML};
   $str =~ s/\33\[(\d+(;\d+)?)?[musfwhojBCDHRJK]//g if $self->{options}{allowANSI}; # maybe i should only have allowed ESC[#;#m and not things not related to
   $str =~ s/\33\([0B]//g if $self->{options}{allowANSI}; # color/bold/underline.. But I want to give people as much room as they need.
-  $str = decode("utf8", $str) if $self->{options}{utf8};
 
   return length($str);
 }
diff --git a/t/03_options.t b/t/03_options.t
index d8ef530..162c286 100644
--- a/t/03_options.t
+++ b/t/03_options.t
@@ -2,6 +2,7 @@
 
 BEGIN { $| = 1; print "1..6\n"; }
 END {print "not ok 1\n" unless $loaded;}
+use utf8;
 use Text::ASCIITable;
 $loaded = 1;
 print "ok 1\n";
@@ -9,7 +10,7 @@
 $i=2;
 $t = new Text::ASCIITable({ hide_LastLine => 1, hide_HeadLine => 1 });
 ok($t->setCols(['id','nick','name']));
-ok($t->addRow('1','Lunatic-|','Håkon Nessjøen'));
+ok($t->addRow('1','Lunatic-|','HÃ¥kon NessjÃ¸en'));
 $t->addRow('2','tesepe','William Viker');
 $t->addRow('3','espen','Espen Ursin-Holm');
 $t->addRow('4','bonde','Martin Mikkelsen');
diff --git a/t/13_utf8.t b/t/13_utf8.t
index d0ef858..59649f2 100644
--- a/t/13_utf8.t
+++ b/t/13_utf8.t
@@ -2,6 +2,7 @@
 
 BEGIN { $| = 1; print "1..6\n"; }
 END {print "not ok 1\n" unless $loaded;}
+use utf8;
 use Text::ASCIITable;
 $loaded = 1;
 print "ok 1\n";
@@ -18,9 +19,9 @@ eval {
 };
 if (!$@) {ok(undef)} else {ok(1)}
 @arr = split(/\n/,$content);
-ok(length($arr[3]) < length($arr[4])?undef:1);
+ok(length($arr[3]) == length($arr[4])?undef:1);
 ok(length($arr[3]) == $t->getTableWidth()?undef:1);
-ok(length($arr[6]) > $t->getTableWidth()?undef:1);
+ok(length($arr[6]) == $t->getTableWidth()?undef:1);
 if (scalar(@arr) == 8) {ok(undef);} else {ok(1);}
 
 sub ok{print(defined(shift)?"not ok $i\n":"ok $i\n");$i++;}
-- 
2.7.4

Thu Dec 29 12:08:48 2016 haakon [...] _NOSPAM_loopback.no - Correspondence added

Fixed in 0.22

Thu Dec 29 12:08:59 2016 haakon [...] _NOSPAM_loopback.no - Status changed from 'open' to 'resolved'
