Subject: Format handling cannot deal with Unicode formats [PATCH included]
The patch that fixes this is at the end.
pc09:/pro/3gl/CPAN/Spreadsheet-Read 140 > make test
PERL_DL_NONLAZY=1 /pro/bin/perl "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
t/csv....ok
t/sc.....ok
t/sxc....ok
t/xls....ok 57/217
Character in 'C' format wrapped in pack at /pro/lib/perl5/site_perl/5.8.6/Spreadsheet/ParseExcel/FmtDefault.pm line 68.
Character in 'C' format wrapped in pack at /pro/lib/perl5/site_perl/5.8.6/Spreadsheet/ParseExcel/FmtDefault.pm line 68.
Character in 'C' format wrapped in pack at /pro/lib/perl5/site_perl/5.8.6/Spreadsheet/ParseExcel/FmtDefault.pm line 68.
Character in 'C' format wrapped in pack at /pro/lib/perl5/site_perl/5.8.6/Spreadsheet/ParseExcel/FmtDefault.pm line 68.
Character in 'C' format wrapped in pack at /pro/lib/perl5/site_perl/5.8.6/Spreadsheet/ParseExcel/FmtDefault.pm line 68.
Character in 'C' format wrapped in pack at /pro/lib/perl5/site_perl/5.8.6/Spreadsheet/ParseExcel/FmtDefault.pm line 68.
Character in 'C' format wrapped in pack at /pro/lib/perl5/site_perl/5.8.6/Spreadsheet/ParseExcel/FmtDefault.pm line 68.
Character in 'C' format wrapped in pack at /pro/lib/perl5/site_perl/5.8.6/Spreadsheet/ParseExcel/FmtDefault.pm line 68.
Character in 'C' format wrapped in pack at /pro/lib/perl5/site_perl/5.8.6/Spreadsheet/ParseExcel/FmtDefault.pm line 68.
Character in 'C' format wrapped in pack at /pro/lib/perl5/site_perl/5.8.6/Spreadsheet/ParseExcel/FmtDefault.pm line 68.
Character in 'C' format wrapped in pack at /pro/lib/perl5/site_perl/5.8.6/Spreadsheet/ParseExcel/FmtDefault.pm line 68.
Character in 'C' format wrapped in pack at /pro/lib/perl5/site_perl/5.8.6/Spreadsheet/ParseExcel/FmtDefault.pm line 68.
Character in 'C' format wrapped in pack at /pro/lib/perl5/site_perl/5.8.6/Spreadsheet/ParseExcel/FmtDefault.pm line 68.
Character in 'C' format wrapped in pack at /pro/lib/perl5/site_perl/5.8.6/Spreadsheet/ParseExcel/FmtDefault.pm line 68.
t/xls....ok
All tests successful.
Files=4, Tests=645, 3 wallclock secs ( 2.63 cusr + 0.21 csys = 2.84 CPU)
pc09:/pro/3gl/CPAN/Spreadsheet-Read 141 > prove -b -v t/xls.t
t/xls....1..217
ok 1 - Nonexistent file
ok 2 - Empty file
:
:
ok 196 - Cell B6
ok 197 - Cell 2, 6
ok 198 - Cell B20
ok 199 - Cell 2, 20
ok 200 - Cell C26
ok 201 - Cell 3, 26
ok 202 - Cell D14
ok 203 - Cell 4, 14
ok 204 - True/False values
ok 205 - first sheet
ok 206 - unformatted plain text
ok 207 - unformatted space
ok 208 - unformatted empty
ok 209 - unformatted numeric 0
ok 210 - unformatted numeric 1
ok 211 - unformatted a single '
ok 212 - formatted plain text
ok 213 - formatted space
ok 214 - formatted empty
ok 215 - formatted numeric 0
ok 216 - formatted numeric 1
ok 217 - formatted a single '
ok
All tests successful.
Files=1, Tests=217, 4 wallclock secs ( 1.09 cusr + 0.05 csys = 1.14 CPU)
pc09:/pro/3gl/CPAN/Spreadsheet-Read 142 > echo $PERL5LIB
/pro/3gl/CPAN/Spreadsheet-Read/blib:/pro/3gl/CPAN/Spreadsheet-Read/blib/lib:/pro/3gl/CPAN/Spreadsheet-Read/blib/arch
pc09:/pro/3gl/CPAN/Spreadsheet-Read 142 > perl t/xls.t
1..217
ok 1 - Nonexistent file
ok 2 - Empty file
ok 3 - Read/Parse xls file
ok 4 - Base values
ok 5 - Return type
:
:
ok 203 - Cell 4, 14
ok 204 - True/False values
ok 205 - first sheet
ok 206 - unformatted plain text
ok 207 - unformatted space
ok 208 - unformatted empty
ok 209 - unformatted numeric 0
ok 210 - unformatted numeric 1
ok 211 - unformatted a single '
ok 212 - formatted plain text
ok 213 - formatted space
ok 214 - formatted empty
ok 215 - formatted numeric 0
ok 216 - formatted numeric 1
ok 217 - formatted a single '
pc09:/pro/3gl/CPAN/Spreadsheet-Read 143 >
But for even more fun, with PERLIO set to :utf8:
pc09:/pro/3gl/CPAN/Spreadsheet-Read 143 > setenv PERLIO :utf8
pc09:/pro/3gl/CPAN/Spreadsheet-Read 144 > perl t/xls.t
1..217
ok 1 - Nonexistent file
ok 2 - Empty file
utf8 "\xD0" does not map to Unicode at t/xls.t line 26, <$xls> chunk 1.
ok 3 - Read/Parse xls file
ok 4 - Base values
ok 5 - Return type
:
:
ok 44 - Formatted cell D4
ok 45 - Row'ed rows
ok 46 - Row'ed columns
ok 47 - Row'ed value D1
ok 48 - Row'ed value C4
Malformed UTF-8 character (unexpected non-continuation byte 0xcf, immediately after start byte 0xd0) in pattern match (m//) at /pro/3gl/CPAN/Spreadsheet-Read/blib/lib/Spreadsheet/Read.pm line 170.
Malformed UTF-8 character (unexpected non-continuation byte 0xcf, immediately after start byte 0xd0) in pattern match (m//) at /pro/3gl/CPAN/Spreadsheet-Read/blib/lib/Spreadsheet/Read.pm line 170.
Malformed UTF-8 character (unexpected non-continuation byte 0xcf, immediately after start byte 0xd0) in pattern match (m//) at /pro/3gl/CPAN/Spreadsheet-Read/blib/lib/Spreadsheet/Read.pm line 170.
Malformed UTF-8 character (unexpected non-continuation byte 0xcf, immediately after start byte 0xd0) in pattern match (m//) at /pro/3gl/CPAN/Spreadsheet-Read/blib/lib/Spreadsheet/Read.pm line 221.
Malformed UTF-8 character (unexpected non-continuation byte 0xcf, immediately after start byte 0xd0) in pattern match (m//) at /pro/3gl/CPAN/Spreadsheet-Read/blib/lib/Spreadsheet/Read.pm line 263.
not ok 49 - Parse xls data
# Failed test (t/xls.t at line 34)
ok 50 - Base values
not ok 51 - Return type
# Failed test (t/xls.t at line 37)
# got: ''
# expected: 'ARRAY'
not ok 52 - Spreadsheet type
# Failed test (t/xls.t at line 38)
# got: undef
# expected: 'xls'
not ok 53 - Sheet count
# Failed test (t/xls.t at line 39)
# got: undef
# expected: '2'
not ok 54 - Sheet list
# Failed test (t/xls.t at line 40)
# got: ''
# expected: 'HASH'
not ok 55 - Sheet list count
:
:
not ok 78 - Unformatted cell D3
# Failed test (t/xls.t at line 48)
# got: undef
# expected: 'D3'
not ok 79 - Formatted cell D3
# Failed test (t/xls.t at line 49)
# got: undef
# expected: 'D3'
ok 80 - Undefined fields
ok 81 - Unformatted cell B3
ok 82 - Formatted cell B3
ok 83 - Unformatted cell C1
ok 84 - Formatted cell C1
ok 85 - Unformatted cell C2
ok 86 - Formatted cell C2
ok 87 - Unformatted cell D2
ok 88 - Formatted cell D2
ok 89 - Unformatted cell D4
ok 90 - Formatted cell D4
Use of uninitialized value in range (or flop) at /pro/3gl/CPAN/Spreadsheet-Read/blib/lib/Spreadsheet/Read.pm line 97.
not ok 91 - Row'ed rows
# Failed test (t/xls.t at line 59)
# got: '0'
# expected: '4'
Can't use an undefined value as an ARRAY reference at t/xls.t line 60.
# Looks like you planned 217 tests but only ran 91.
# Looks like your test died just after 91.
Exit 255
pc09:/pro/3gl/CPAN/Spreadsheet-Read 145 >
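The bytes named in the messages above, 0xD0 followed by 0xCF, are the first two bytes of the OLE2 compound-document signature that every .xls file starts with, so the :utf8 default layer is being applied to binary data. A minimal sketch of how a reader might protect itself, assuming it opens the file on its own (the temporary file and its contents are made up for illustration, and nothing here is taken from t/xls.t or Spreadsheet::Read):

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# The eight-byte OLE2 compound-document signature found at the start of .xls files.
my $magic = "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1";

# Write a stand-in file; path and contents are hypothetical.
my ($out, $path) = tempfile(SUFFIX => '.xls', UNLINK => 1);
binmode $out;
print {$out} $magic;
close $out or die "close: $!";

# ':raw' in the open mode asks for plain bytes regardless of the PERLIO default.
open my $xls, '<:raw', $path or die "open $path: $!";
my $got = do { local $/; <$xls> };
close $xls;

print $got eq $magic ? "bytes read back unchanged\n" : "bytes were altered\n";
```

Calling binmode on an already opened handle should have the same effect as the ':raw' open mode.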
Line 68 in Spreadsheet/ParseExcel/FmtDefault.pm:
#------------------------------------------------------------------------------
# TextFmt (for Spreadsheet::ParseExcel::FmtDefault)
#------------------------------------------------------------------------------
sub TextFmt($$;$) {
    my($oThis, $sTxt, $sCode) =@_;
    return $sTxt if((! defined($sCode)) || ($sCode eq '_native_'));
    return pack('C*', unpack('n*', $sTxt)); # <----- line 68
}
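The warning can be reproduced outside the module. The sketch below assumes, as the 'n*' template suggests, that $sTxt holds 16-bit big-endian code units; the sample string is made up. It packs the same integers once with 'C*' and once with 'U*' and counts the warnings:

```perl
use strict;
use warnings;

# UTF-16BE bytes for "A" (U+0041) followed by the euro sign (U+20AC).
my $utf16be = "\x00\x41\x20\xAC";

# unpack 'n*' yields one 16-bit big-endian integer per code unit.
my @units = unpack 'n*', $utf16be;

# Collect warnings instead of printing them.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

# 'C' holds a single octet, so 0x20AC does not fit: perl wraps it and warns.
my $as_octets = pack 'C*', @units;

# 'U' stores each integer as a Unicode code point, so nothing is lost.
my $as_chars = pack 'U*', @units;

printf "C*: %vX\n", $as_octets;
printf "U*: %vX\n", $as_chars;
print  "warnings: ", scalar(@warnings), "\n";
```

Only values above 255 trigger the warning, which is why plain Latin-1 cells pass silently.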
After the fix:
pc09:/pro/3gl/CPAN/Spreadsheet-Read 150 > make test
PERL_DL_NONLAZY=1 /pro/bin/perl "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
t/csv....ok
t/sc.....ok
t/sxc....ok
t/xls....ok
All tests successful.
Files=4, Tests=645, 3 wallclock secs ( 2.62 cusr + 0.18 csys = 2.80 CPU)
pc09:/pro/3gl/CPAN/Spreadsheet-Read 151 >
The fix was applied in the Spreadsheet-ParseExcel source tree before running make and make test there:
pc09:/pro/3gl/CPAN/Spreadsheet-ParseExcel-0.2603 115 > make
cp ParseExcel/FmtJapan.pm blib/lib/Spreadsheet/ParseExcel/FmtJapan.pm
cp ParseExcel/Utility.pm blib/lib/Spreadsheet/ParseExcel/Utility.pm
cp ParseExcel/FmtJapan2.pm blib/lib/Spreadsheet/ParseExcel/FmtJapan2.pm
cp ParseExcel/SaveParser.pm blib/lib/Spreadsheet/ParseExcel/SaveParser.pm
cp ParseExcel/FmtDefault.pm blib/lib/Spreadsheet/ParseExcel/FmtDefault.pm
cp ParseExcel.pm blib/lib/Spreadsheet/ParseExcel.pm
cp ParseExcel/FmtUnicode.pm blib/lib/Spreadsheet/ParseExcel/FmtUnicode.pm
cp ParseExcel/Dump.pm blib/lib/Spreadsheet/ParseExcel/Dump.pm
Manifying blib/man3/Spreadsheet::ParseExcel::Utility.3
Manifying blib/man3/Spreadsheet::ParseExcel::SaveParser.3
Manifying blib/man3/Spreadsheet::ParseExcel.3
pc09:/pro/3gl/CPAN/Spreadsheet-ParseExcel-0.2603 116 > make test
PERL_DL_NONLAZY=1 /pro/bin/perl "-Iblib/lib" "-Iblib/arch" test.pl
1..1
ok 1
pc09:/pro/3gl/CPAN/Spreadsheet-ParseExcel-0.2603 117 >
The patch:
diff -pu ParseExcel/FmtDefault.pm{.org,}
--- ParseExcel/FmtDefault.pm.org 2005-09-15 14:16:36.163623616 +0200
+++ ParseExcel/FmtDefault.pm 2005-09-15 14:11:56.289171000 +0200
@@ -65,7 +65,7 @@ sub new($;%) {
 sub TextFmt($$;$) {
     my($oThis, $sTxt, $sCode) =@_;
     return $sTxt if((! defined($sCode)) || ($sCode eq '_native_'));
-    return pack('C*', unpack('n*', $sTxt));
+    return pack('U*', unpack('n*', $sTxt));
 }
 #------------------------------------------------------------------------------
 # FmtStringDef (for Spreadsheet::ParseExcel::FmtDefault)
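One possible limitation of the patched expression: it maps every 16-bit unit to its own character, so a surrogate pair would come out as two characters. Whether $sTxt can ever contain surrogate pairs is an assumption, not something established above. The sketch compares the patched expression with Encode's decode on a made-up sample, as an alternative technique rather than part of the patch:

```perl
use strict;
use warnings;
use Encode qw(decode);

# UTF-16BE bytes for U+1D11E (musical G clef), which needs a surrogate pair.
my $utf16be = "\xD8\x34\xDD\x1E";

# The patched expression: one character per 16-bit unit, so two surrogates.
# Warnings are silenced because some perls complain about lone surrogates.
my $per_unit = do { no warnings; pack 'U*', unpack 'n*', $utf16be };

# Encode combines the pair into a single character.
my $decoded = decode('UTF-16BE', $utf16be);

printf "pack U*: %d characters\n", length $per_unit;
printf "decode:  %d character(s), U+%X\n", length $decoded, ord $decoded;
```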