Subject: t/01..bom.t test fails with Encode 2.99
Encode 2.99 is stricter about recognizing invalid characters, and the t/01..bom.t test fails like this:
$ perl -Ilib t/01..bom.t
1..115
ok 1 - use File::BOM;
ok 2 - utf-16be.txt: open_bom returned encoding
ok 3 - utf-16be.txt: test content returned OK
ok 4 - utf-16be.txt: defuse returns correct encoding (UTF-16BE)
ok 5 - utf-16be.txt: defused version content OK
ok 6 - utf-16be.txt: get_encoding_from_filehandle returned correct encoding
ok 7 - utf-16be.txt: get_encoding_from_bom also worked
ok 8 - utf-16be.txt: .. and offset worked with substr()
UTF-16BE:Partial character at lib/File/BOM.pm line 364, <FH> line 1.
# Looks like your test exited with 25 just after 8.
It dies in decode_from_bom() because FB_CROAK is requested and the byte string to be decoded indeed ends with a partial character. The partial character is the result of calling chomp() on an encoded newline:
$ hexdump -C t/data/utf-16be.txt
00000000 fe ff 00 db 00 f1 00 ed 00 e7 00 f4 01 11 00 e8 |................|
00000010 00 0a |..|
00000012
The test reads the content of the file quoted above and calls chomp(), which removes only the last byte, leaving the string with a dangling \x00 byte. This malformed string, stored in the $first_line variable, is then passed to decode_from_bom():
my $result = decode_from_bom($first_line, 'UTF-8', FB_CROAK);
is($result, $expect, "$file: decode_from_bom() scalar context");
As a result, the test script dies.
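The failure can be reproduced outside File::BOM. A minimal sketch, assuming only Encode's standard exports: chomp() strips the trailing "\x0a" byte but not the "\x00" half of the UTF-16BE newline, and decode() with FB_CROAK then dies on the partial character.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode decode FB_CROAK);

# Encode "ab\n" as UTF-16BE: "\x00a\x00b\x00\x0a"
my $bytes = encode('UTF-16BE', "ab\n");

# chomp() works on the byte string, so it removes only the final
# "\x0a" and leaves a dangling "\x00" behind.
chomp $bytes;

# With FB_CROAK the dangling byte is fatal.
my $ok = eval { decode('UTF-16BE', $bytes, FB_CROAK); 1 };
print $ok ? "decoded\n" : "died: $@";
```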
I can see the test script already installs a __WARN__ handler to filter similar warnings. I believe the proper fix is to remove the whole representation of "\n" in the given encoding, rather than relying on chomp().
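A sketch of that fix (my suggestion, not existing File::BOM code; the helper name chomp_encoded is hypothetical): encode "\n" in the file's encoding and strip that full byte sequence from the end, so no partial character is left behind.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode decode FB_CROAK);

# Hypothetical helper: remove the encoded newline for a given encoding.
sub chomp_encoded {
    my ($enc, $bytes) = @_;
    my $nl = encode($enc, "\n");    # "\x00\x0a" for UTF-16BE
    $bytes =~ s/\Q$nl\E\z//;        # strip the whole encoded "\n"
    return $bytes;
}

my $line  = encode('UTF-16BE', "ab\n");
my $fixed = chomp_encoded('UTF-16BE', $line);

# The trimmed string now decodes cleanly even with FB_CROAK.
print decode('UTF-16BE', $fixed, FB_CROAK), "\n";
```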