This queue is for tickets about the Text-ASCIITable CPAN distribution.

Report information
The Basics
Id: 118483
Status: resolved
Priority: 0/
Queue: Text-ASCIITable

People
Owner: haakon [...] _NOSPAM_loopback.no
Requestors: emmanuel [...] seyman.fr
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: t/03_options.t fails
Date: Sun, 23 Oct 2016 21:10:37 +0200
To: bug-Text-ASCIITable [...] rt.cpan.org
From: Emmanuel Seyman <emmanuel [...] seyman.fr>
On Fedora 25 and 26, I now get a failure on t/03_options.t .

[manu@kala Text-ASCIITable-0.20]$ prove -Ilib -v t/03_options.t | tee foo
t/03_options.t ..
1..6
ok 1
ok 2
ok 3
ok 4
not ok 5
ok 6
Failed 1/6 subtests

The failure happens on the comparison between length($arr[0]) and $t->getTableWidth(). The first one is equal to 39 while the second is equal to 37, so the comparison fails and the test dies. On Fedora 24, both are equal to 37 and the tests pass.

Changing the file encoding to UTF-8 doesn't improve things.

Emmanuel
RT-Send-CC: DANKOGAI [...] cpan.org
On 2016-10-23 15:10:54, emmanuel@seyman.fr wrote:
> On Fedora 25 and 26, I now get a failure on t/03_options.t .
> [...]
It seems that the failure happens with newer Encode.pm (2.87). Statistical analysis suggests so (theta=-1.0 means: only fail reports):

****************************************************************
Regression 'mod:Encode'
****************************************************************
Name                  Theta  StdErr                 T-stat
[0='const']          1.0000  0.0000   20771296082507892.00
[1='eq_2.39']       -0.0000  0.0000                  -3.21
[2='eq_2.42_01']    -0.0000  0.0000                  -3.21
[3='eq_2.44_01']    -0.0000  0.0000                  -1.97
[4='eq_2.49']       -0.0000  0.0000                  -4.05
[5='eq_2.55']       -0.0000  0.0000                  -3.95
[6='eq_2.56']       -0.0000  0.0000                  -3.50
[7='eq_2.57']       -0.0000  0.0000                  -3.96
[8='eq_2.60']       -0.0000  0.0000                  -4.29
[9='eq_2.62']       -0.0000  0.0000                  -4.37
[10='eq_2.63']      -0.0000  0.0000                  -4.03
[11='eq_2.64']      -0.0000  0.0000                  -3.65
[12='eq_2.67']      -0.0000  0.0000                  -4.33
[13='eq_2.68']      -0.0000  0.0000                  -3.16
[14='eq_2.69']      -0.0000  0.0000                  -3.59
[15='eq_2.70']      -0.0000  0.0000                  -3.96
[16='eq_2.72']      -0.0000  0.0000                  -4.26
[17='eq_2.72_01']   -0.0000  0.0000                  -3.93
[18='eq_2.73']      -0.0000  0.0000                  -3.01
[19='eq_2.75']      -0.0000  0.0000                  -3.16
[20='eq_2.76']      -0.0000  0.0000                  -4.08
[21='eq_2.78']      -0.0000  0.0000                  -4.28
[22='eq_2.79']      -0.0000  0.0000                  -3.11
[23='eq_2.80']      -0.0000  0.0000                  -4.33
[24='eq_2.80_01']   -0.0000  0.0000                  -4.12
[25='eq_2.84']      -0.0000  0.0000                  -4.25
[26='eq_2.86']      -0.0000  0.0000                  -4.49
[27='eq_2.87']      -1.0000  0.0000  -16959692232805172.00

R^2= 1.000, N= 226, K= 28
****************************************************************
On Mon, 21 Nov 2016 14:54:54, SREZIC wrote:
> It seems that the failure happens with newer Encode.pm (2.87).
> Statistical analysis suggests so (theta=-1.0 means: only fail
> reports):
> [...]
Thank you for the very thorough research! Could you show me a diff between $arr[0] from your two different test setups?
Subject: Re: [rt.cpan.org #118483] t/03_options.t fails
Date: Sun, 27 Nov 2016 11:12:52 +0100
To: Hakon Nessjoen via RT <bug-Text-ASCIITable [...] rt.cpan.org>
From: Emmanuel Seyman <emmanuel [...] seyman.fr>
* Hakon Nessjoen via RT [22/11/2016 02:28] :
> Thank you for the very thorough research! Could you show me a diff
> between $arr[0] from your two different test setups?
On a Fedora 24 system with updates (perl 5.22, perl-Encode 2.84):

$arr[0] = '| 1 | Lunatic-| | H�kon Nessj�en |';
length($arr[0]) = 37;
$t->getTableWidth() = 37;

If I convert t/03_options.t to UTF-8, I get:

$arr[0] = '| 1 | Lunatic-| | Håkon Nessjøen |';
length($arr[0]) = 39;
$t->getTableWidth() = 37;

On Fedora 26 (perl 5.24, perl-Encode 2.87), I get:

$arr[0] = '| 1 | Lunatic-| | H�kon Nessj�en |';
length($arr[0]) = 39;
$t->getTableWidth() = 37;

Converting t/03_options.t to UTF-8, I get:

$arr[0] = '| 1 | Lunatic-| | Håkon Nessjøen |';
length($arr[0]) = 39;
$t->getTableWidth() = 37;

Emmanuel
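The 37 versus 39 difference reported above is consistent with the two non-ASCII letters in the name taking one octet each in Latin-1 but two octets each in UTF-8. A minimal sketch of that arithmetic, assuming only the core Encode module; the variable names are illustrative and not taken from the test file:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Character string built from explicit code points, so the result does not
# depend on how this source file itself is encoded.
my $name = "H\x{e5}kon Nessj\x{f8}en";    # "Håkon Nessjøen"

my $chars  = length($name);                          # number of characters
my $latin1 = length(encode('ISO-8859-1', $name));    # octets in Latin-1
my $utf8   = length(encode('UTF-8', $name));         # octets in UTF-8

print "characters:    $chars\n";
print "Latin-1 bytes: $latin1\n";
print "UTF-8 bytes:   $utf8\n";
```

The UTF-8 form is two octets longer than the character count, which matches the gap between length($arr[0]) and $t->getTableWidth() when the row is measured in octets.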
From: ppisar [...] redhat.com
On Sun, 27 Nov 2016 05:13:07, emmanuel@seyman.fr wrote:
> On Fedora 26 (perl 5.24, perl-Encode 2.87), I get:
[...]
> Converting t/03_options.t to UTF-8, I get:
>
> $arr[0] = '| 1 | Lunatic-| | Håkon Nessjøen |';
> length($arr[0]) = 39;
> $t->getTableWidth() = 37;
The length() value is correct; the getTableWidth() value is wrong. The "Håkon Nessjøen" text is wrongly padded on the right side: there are two additional spaces, as can be seen when printing the whole table:

| 1 | Lunatic-| | Håkon Nessjøen |
| 2 | tesepe | William Viker |
| 3 | espen | Espen Ursin-Holm |
| 4 | bonde | Martin Mikkelsen |

I think the bug is in Text::ASCIITable::count(), which counts the number of characters and ends with these two statements:

$str = decode("utf8", $str) if $self->{options}{utf8};
return length($str);

In other words, it decodes an already decoded Unicode string as UTF-8 a second time, because $self->{options}{utf8} is true by default. The double UTF-8 interpretation becomes clearly visible if I change the "Håkon Nessjøen" addRow() argument to "žHåkon Nessjøen"; then the test dies:

perl -Ilib t/03_options.t
1..6
ok 1
ok 2
Wide character at /usr/lib64/perl5/vendor_perl/Encode.pm line 212.

Encode.pm:212 is a line in Encode::decode() from Encode-2.88 with this code:

$string = $enc->decode( $octets, $check );

Now, why does the "ž" character matter? I don't know. The only difference is that the "ž" code point is bigger than 255 while "å" and "ø" are less than 256.
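The difference between decoding octets and decoding an already decoded string can be reproduced outside the module. This is a minimal sketch, assuming only the core Encode module; the variable names are illustrative and the exact error text varies between Encode versions. One plausible explanation, stated here as an assumption rather than something the thread confirms, is that a string whose code points are all below 256 can still be downgraded to octets and reinterpreted, while a string containing a code point above 255 cannot, so decode() dies:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# UTF-8 octets, as a script would see them when reading a UTF-8 source
# file without "use utf8".
my $octets = encode('UTF-8', "H\x{e5}kon");

# Decoding once turns the 6 octets into 5 characters.
my $decoded_len = length(decode('utf8', $octets));

# A character string that contains a code point above 255 (U+017E).
my $wide = "\x{17e}H\x{e5}kon";

# Decoding a character string like this one is expected to die, because
# it cannot be represented as octets in the first place.
my $ok    = eval { decode('utf8', $wide); 1 };
my $error = $ok ? '' : $@;

print "decoded length: $decoded_len\n";
print "second decode ", ($ok ? "succeeded\n" : "died: $error");
```

The sketch deliberately does not print the length obtained by re-decoding a string with only "å" and "ø", since that result depends on how a given Encode version treats malformed sequences.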
From: ppisar [...] redhat.com
On Thu, 22 Dec 2016 07:07:17, ppisar wrote:
> I think the bug is in Text::ASCIITable::count() that counts number of
> characters and it does these last two commands:
>
> $str = decode("utf8", $str) if $self->{options}{utf8};
> return length($str);
There is a t/13_utf8.t test that expects exactly the opposite results. It looks like the Text::ASCIITable interface expects octet strings on input. I think this is a terribly bad idea, especially these days. The attached patch removes the undocumented utf8 feature and changes the expectations so that all inputs are Unicode strings.
Subject: 0001-Remove-utf8-option.patch
From dcd0d8011eeaf7f74a8f03e3600235303e64f7d7 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Petr=20P=C3=ADsa=C5=99?= <ppisar@redhat.com>
Date: Thu, 22 Dec 2016 13:12:48 +0100
Subject: [PATCH] Remove utf8 option
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This feature caused test failures since Encode-2.87 because it tried to
reinterpret input strings as UTF-8 octet stream. It's application's
responsibility to provide input as Unicode strings. This how Perl works.

CPAN RT#118483

Signed-off-by: Petr Písař <ppisar@redhat.com>
---
 lib/Text/ASCIITable.pm | 2 --
 t/03_options.t         | 3 ++-
 t/13_utf8.t            | 5 +++--
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/lib/Text/ASCIITable.pm b/lib/Text/ASCIITable.pm
index b630910..c4449e0 100644
--- a/lib/Text/ASCIITable.pm
+++ b/lib/Text/ASCIITable.pm
@@ -88,7 +88,6 @@ sub new {
   $self->{options}{alignHeadRow} = $self->{options}{alignHeadRow} || 'auto'; # default setting
   $self->{options}{undef_as} = $self->{options}{undef_as} || ''; # default setting
   $self->{options}{chaining} = $self->{options}{chaining} || 0; # default setting
-  $self->{options}{utf8} = defined($self->{options}{utf8}) ? $self->{options}{utf8} : 1; # default setting
 
   bless $self;
 
@@ -884,7 +883,6 @@ sub count {
   $str =~ s/<.+?>//g if $self->{options}{allowHTML};
   $str =~ s/\33\[(\d+(;\d+)?)?[musfwhojBCDHRJK]//g if $self->{options}{allowANSI}; # maybe i should only have allowed ESC[#;#m and not things not related to
   $str =~ s/\33\([0B]//g if $self->{options}{allowANSI}; # color/bold/underline.. But I want to give people as much room as they need.
-  $str = decode("utf8", $str) if $self->{options}{utf8};
   return length($str);
 }
 
diff --git a/t/03_options.t b/t/03_options.t
index d8ef530..162c286 100644
--- a/t/03_options.t
+++ b/t/03_options.t
@@ -2,6 +2,7 @@
 
 BEGIN { $| = 1; print "1..6\n"; }
 END {print "not ok 1\n" unless $loaded;}
+use utf8;
 use Text::ASCIITable;
 $loaded = 1;
 print "ok 1\n";
@@ -9,7 +10,7 @@ $i=2;
 
 $t = new Text::ASCIITable({ hide_LastLine => 1, hide_HeadLine => 1 });
 ok($t->setCols(['id','nick','name']));
-ok($t->addRow('1','Lunatic-|','Håkon Nessjøen'));
+ok($t->addRow('1','Lunatic-|','Håkon Nessjøen'));
 $t->addRow('2','tesepe','William Viker');
 $t->addRow('3','espen','Espen Ursin-Holm');
 $t->addRow('4','bonde','Martin Mikkelsen');
diff --git a/t/13_utf8.t b/t/13_utf8.t
index d0ef858..59649f2 100644
--- a/t/13_utf8.t
+++ b/t/13_utf8.t
@@ -2,6 +2,7 @@
 
 BEGIN { $| = 1; print "1..6\n"; }
 END {print "not ok 1\n" unless $loaded;}
+use utf8;
 use Text::ASCIITable;
 $loaded = 1;
 print "ok 1\n";
@@ -18,9 +19,9 @@ eval {
 };
 if (!$@) {ok(undef)} else {ok(1)}
 @arr = split(/\n/,$content);
-ok(length($arr[3]) < length($arr[4])?undef:1);
+ok(length($arr[3]) == length($arr[4])?undef:1);
 ok(length($arr[3]) == $t->getTableWidth()?undef:1);
-ok(length($arr[6]) > $t->getTableWidth()?undef:1);
+ok(length($arr[6]) == $t->getTableWidth()?undef:1);
 if (scalar(@arr) == 8) {ok(undef);} else {ok(1);}
 
 sub ok{print(defined(shift)?"not ok $i\n":"ok $i\n");$i++;}
-- 
2.7.4
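With the utf8 option removed, decoding input and encoding output become the caller's job. The following is a minimal sketch of that pattern, assuming UTF-8 input and output; pad_cell is a hypothetical helper that stands in for the module's width arithmetic and is not part of Text::ASCIITable:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper (not part of Text::ASCIITable): pad a cell to a given
# width in characters. It assumes $text is already a decoded character
# string that is no longer than $width.
sub pad_cell {
    my ($text, $width) = @_;
    return $text . (' ' x ($width - length($text)));
}

# Octets as they might arrive from a UTF-8 file or terminal.
my $raw = "H\xc3\xa5kon Nessj\xc3\xb8en";

# Decode exactly once, at the input boundary.
my $name = decode('UTF-8', $raw);

# All width arithmetic now happens on characters, not octets.
my $cell = pad_cell($name, 16);

# Encode exactly once, at the output boundary.
my $out = encode('UTF-8', "| $cell |\n");
print $out;
```

Measured in characters the cell is exactly 16 wide, while the encoded line is two octets longer than its character count, so any width check has to be made before encoding.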
Fixed in 0.22