
This queue is for tickets about the File-BOM CPAN distribution.

Report information
The Basics
Id: 128334
Status: resolved
Priority: 0/
Queue: File-BOM

People
Owner: matt.lawrence [...] virgin.net
Requestors: ppisar [...] redhat.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: 0.15
Fixed in: 0.16



Subject: t/01..bom.t test fails with Encode 2.99
Encode-2.99 is better at recognizing invalid characters and the t/01..bom.t test fails like this:

$ perl -Ilib t/01..bom.t
1..115
ok 1 - use File::BOM;
ok 2 - utf-16be.txt: open_bom returned encoding
ok 3 - utf-16be.txt: test content returned OK
ok 4 - utf-16be.txt: defuse returns correct encoding (UTF-16BE)
ok 5 - utf-16be.txt: defused version content OK
ok 6 - utf-16be.txt: get_encoding_from_filehandle returned correct encoding
ok 7 - utf-16be.txt: get_encoding_from_bom also worked
ok 8 - utf-16be.txt: .. and offset worked with substr()
UTF-16BE:Partial character at lib/File/BOM.pm line 364, <FH> line 1.
# Looks like your test exited with 25 just after 8.

It dies in decode_from_bom() because FB_CROAK is requested and the to-be-decoded byte string indeed ends with a partial character. The partial character is the result of chomp() on an encoded newline:

$ hexdump -C t/data/utf-16be.txt
00000000  fe ff 00 db 00 f1 00 ed  00 e7 00 f4 01 11 00 e8  |................|
00000010  00 0a                                             |..|
00000012

The test reads the content of the file quoted above and performs chomp(), so that only the last byte is removed and the string is left with a dangling \x00 byte. This malformed string, stored in the $first_line variable, is then passed to decode_from_bom() using:

my $result = decode_from_bom($first_line, 'UTF-8', FB_CROAK);
is($result, $expect, "$file: decode_from_bom() scalar context");

As a result the test script dies.

I can see the test script already installs a __WARN__ handler to filter similar warnings. I believe a proper fix is to remove the whole "\n" representation in the given encoding.
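[Editorial note: for illustration, a minimal stand-alone Perl sketch of the failure mode described above. It is not part of the distribution's test suite and the string is made up; it only shows that, with the default $/ of "\x0A", readline() splits the UTF-16BE-encoded newline "\x00\x0A" in half, chomp() strips only the trailing "\x0A", and decoding the leftover bytes with FB_CROAK dies on the dangling "\x00".]

use strict;
use warnings;
use Encode qw( encode decode FB_CROAK );

# A UTF-16BE byte string whose encoded "\n" is "\x00\x0A".
my $bytes = encode('UTF-16BE', "hi\n", FB_CROAK);   # "\x00h\x00i\x00\x0A"

# Emulate readline() with the default $/ = "\x0A": everything up to and
# including the first 0x0A byte counts as "the first line".
my ($first_line) = $bytes =~ /\A(.*?\x0A)/s;

chomp $first_line;   # removes only the 0x0A byte, leaving a dangling 0x00

# Under Encode 2.99 this croaks with "UTF-16BE:Partial character ...",
# which is the same error the test run above shows.
my $text = eval { decode('UTF-16BE', $first_line, FB_CROAK) };
print $@ ? "decode died: $@" : "decoded: $text\n";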
On Wed, 23 Jan 2019 11:03:10, ppisar wrote:
> Encode-2.99 is better at recognizing invalid characters and
> t/01..bom.t test fails like this:
>
Attached patch fixes it.
Subject: File-BOM-0.15-Adapt-to-stricter-Encode-2.99.patch
From ee024445df54df26d1aff18f65a54df725ba452a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Petr=20P=C3=ADsa=C5=99?= <ppisar@redhat.com>
Date: Fri, 25 Jan 2019 15:08:36 +0100
Subject: [PATCH] Adapt to stricter Encode-2.99
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Encode-2.99 is better at recognizing invalid characters and the
t/01..bom.t test fails like this:

ok 8 - utf-32be.txt: .. and offset worked with substr()
UTF-32BE:Partial character at lib/File/BOM.pm line 364, <FH> line 1.
# Looks like your test exited with 25 just after 8.

The problem is how the test handles the end-of-line separator. It reads
utf-32be.txt as a byte string up to the first "\n" byte (not a
UTF-32BE-encoded "\n"). Then it performs a chomp() on the first line and
again removes a "\n" byte instead of the encoded representation. As a
result the first line is left with garbage at the end and Encode dies
when it encounters it.

This patch fixes both issues by setting $/ to the expected "\n"
representation.

Signed-off-by: Petr Písař <ppisar@redhat.com>
---
 t/01..bom.t             |  8 ++++++--
 t/lib/Test/Framework.pm | 12 +++++++++++-
 2 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/t/01..bom.t b/t/01..bom.t
index c41b060..6ecf221 100644
--- a/t/01..bom.t
+++ b/t/01..bom.t
@@ -56,8 +56,12 @@ for my $file (@test_files) {
     }
 
     open FH, '<', $file2path{$file};
-    my $first_line = <FH>;
-    chomp $first_line;
+    my $first_line;
+    {
+        local $/ = $fileeol{$file};
+        $first_line = <FH>;
+        chomp $first_line;
+    }
 
     seek(FH, 0, SEEK_SET);
 
diff --git a/t/lib/Test/Framework.pm b/t/lib/Test/Framework.pm
index b4b9f6b..800f9dd 100644
--- a/t/lib/Test/Framework.pm
+++ b/t/lib/Test/Framework.pm
@@ -4,6 +4,7 @@ package Test::Framework;
 # Common resources for tests
 #
 
+use Encode qw( encode :fallback_all );
 use File::Spec::Functions qw( catfile );
 use File::Temp qw( tmpnam );
 use POSIX qw( mkfifo );
@@ -13,7 +14,7 @@ use utf8;
 
 use base qw( Exporter );
 
-our(%file2path, %file2enc, %filecontent, @test_files, $fifo_supported);
+our(%file2path, %file2enc, %filecontent, %fileeol, @test_files, $fifo_supported);
 
 @EXPORT = qw(
     make_test_data
@@ -21,6 +22,7 @@ our(%file2path, %file2enc, %filecontent, @test_files, $fifo_supported);
     %file2path
     %file2enc
     %filecontent
+    %fileeol
     @test_files
     write_fifo
     $fifo_supported
@@ -45,6 +47,14 @@ our(%file2path, %file2enc, %filecontent, @test_files, $fifo_supported);
 );
 
 @test_files = keys %file2enc;
+for (@test_files) {
+    my $enc = $file2enc{$_};
+    $enc = 'ASCII' if $enc eq '';
+    my $eol = "\n";
+    $eol = encode($enc, $eol, FB_CROAK);
+    $fileeol{$_} = $eol;
+}
+
 $file2path{$_} = catfile(qw(t data), $_) for @test_files;
 
 # write data into files
-- 
2.17.2
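[Editorial note: the approach the patch takes boils down to computing the byte representation of "\n" for each test file's encoding and localizing $/ to it, so that both readline() and chomp() operate on the whole encoded newline. A hedged sketch of that pattern follows; the encoded_eol() helper name is invented for illustration, and only the file path comes from the ticket.]

use strict;
use warnings;
use Encode qw( encode FB_CROAK );

# Hypothetical helper mirroring the %fileeol setup in the patch:
# map an encoding name to the byte sequence of an encoded "\n".
sub encoded_eol {
    my ($enc) = @_;
    $enc = 'ASCII' if !defined $enc || $enc eq '';
    return encode($enc, "\n", FB_CROAK);
}

# Reading the first line of a UTF-16BE file without splitting a character:
my $path = 't/data/utf-16be.txt';            # path taken from the ticket
open my $fh, '<', $path or die "open $path: $!";
my $first_line;
{
    local $/ = encoded_eol('UTF-16BE');      # "\x00\x0A" instead of "\x0A"
    $first_line = <$fh>;
    chomp $first_line;                       # strips the whole encoded "\n"
}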
Subject: Re: [rt.cpan.org #128334] t/01..bom.t test fails with Encode 2.99
Date: Wed, 6 Feb 2019 10:00:47 +0000
To: bug-File-BOM [...] rt.cpan.org
From: Matt Lawrence <matt.lawrence [...] virgin.net>
On 25/01/2019 14:19, Petr Pisar via RT wrote:
> Queue: File-BOM
> Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=128334 >
>
> On Wed, 23 Jan 2019 11:03:10, ppisar wrote:
>> Encode-2.99 is better at recognizing invalid characters and
>> t/01..bom.t test fails like this:
>>
> Attached patch fixes it.
>
Thanks!

I'll try to get that rolled out today.

Matt
On 2019-02-06 05:02:17, MATTLAW wrote:
> On 25/01/2019 14:19, Petr Pisar via RT wrote:
> > Queue: File-BOM
> > Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=128334 >
> >
> > On Wed, 23 Jan 2019 11:03:10, ppisar wrote:
> >> Encode-2.99 is better at recognizing invalid characters and
> >> t/01..bom.t test fails like this:
> >>
> > Attached patch fixes it.
> >
> Thanks!
>
> I'll try to get that rolled out today.
>
0.16 looks better --- perhaps this ticket can be marked as resolved?

Regards,
Slaven