On Fri, 29 Aug 2014 12:39:42 -0400, "Fjodor Fedov via RT"
<bug-Spreadsheet-Read@rt.cpan.org> wrote:
> Fri Aug 29 12:39:41 2014: Request 98436 was acted upon.
> Transaction: Ticket created by fedov
> Queue: Spreadsheet-Read
> Subject: xlscat and unicode
> Broken in: 0.54
> Severity: Normal
> Owner: Nobody
> Requestors: dumb_kane@hotmail.com
> Status: new
> Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=98436 >
>
>
> I'm using Spreadsheet::Read 0.54 and Spreadsheet::ParseExcel 0.65
> This concerns the included utilities: xls2csv, xlscat and xlsgrep. I
> can't get them to work correctly with my locale settings (utf8).
> While my extremely naive script (spe.pl), which uses
> Spreadsheet::ParseExcel directly, produces readable output, the
> mentioned utilities' output is scrambled when the input file
> (book.xls) contains non-ASCII characters (see output).
Interestingly, the verbose option shows everything fine until almost the end:
$ xlscat -v9 book.xls
ReadData (book.xls, debug 0 clip 1);
[
{ error => undef,
parser => 'Spreadsheet::ParseExcel',
sheet => {
Blatt1 => 1
},
sheets => 3,
type => 'xls',
version => '0.65'
},
{ A1 => '��',
A2 => '��',
B1 => '��',
B2 => '��',
attr => [],
cell => [
[],
[ undef,
'���',
'����'
],
[ undef,
'����',
'���'
]
],
label => 'Blatt1',
maxcol => 2,
maxrow => 2
}
]
Opened book.xls with 3 sheets
Opening sheet 1 ...
{ A1 => 'üüü',
A2 => 'öÖöÖ',
B1 => 'àéòú',
B2 => 'ßßß',
attr => [],
cell => [
[],
[ undef,
'üüü',
'öÖöÖ'
],
[ undef,
'àéòú',
'ßßß'
]
],
label => 'Blatt1',
maxcol => 2,
maxrow => 2
}
book.xls - 01: [ Blatt1 ] 2 Cols, 2 Rows
{ sheet => {
1 => 1
}
}
1:1 'üüü' / 'üüü'
2:1 'àéòú' / 'àéòú'
��|��
1:2 'öÖöÖ' / 'öÖöÖ'
2:2 'ßßß' / 'ßßß'
��|��
2 x 2
If I use Data::Peek's DPeek () on the values and fields:
1:1 'üüü' / 'üüü'
1:1 PV("\0\374\0\374\0\374"\0) / PV("\303\274\303\274\303\274"\0) [UTF8 "\x{fc}\x{fc}\x{fc}"]
2:1 'àéòú' / 'àéòú'
2:1 PV("\0\340\0\351\0\362\0\372"\0) / PV("\303\240\303\251\303\262\303\272"\0) [UTF8 "\x{e0}\x{e9}\x{f2}\x{fa}"]
��|��
1:2 'öÖöÖ' / 'öÖöÖ'
1:2 PV("\0\366\0\326\0\366\0\326"\0) / PV("\303\266\303\226\303\266\303\226"\0) [UTF8 "\x{f6}\x{d6}\x{f6}\x{d6}"]
2:2 'ßßß' / 'ßßß'
2:2 PV("\0\337\0\337\0\337"\0) / PV("\303\237\303\237\303\237"\0) [UTF8 "\x{df}\x{df}\x{df}"]
��|��
This shows that the unformatted values are raw octets (they look like UTF-16BE, without the UTF8 flag), while the formatted values are character strings with the UTF8 flag set.
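A minimal sketch of what the DPeek () output suggests, assuming the unformatted value really is UTF-16BE octets (that is a reading of "\0\374", not something checked against the Spreadsheet::ParseExcel source). Decoding those octets gives the same characters as the formatted value. The variable names are made up for illustration only.

```perl
use strict;
use warnings;
use Encode qw( decode encode );

# Octets as DPeek () showed them for the unformatted value of A1
my $raw = "\0\374\0\374\0\374";

# Assumption: these are UTF-16BE octets; decode them to characters
my $str = decode ("UTF-16BE", $raw);

# Encoding the characters as UTF-8 gives the octets DPeek () showed
# for the formatted value: "\303\274\303\274\303\274"
my $utf8 = encode ("UTF-8", $str);

printf "%d chars, %d UTF-8 octets\n", length ($str), length ($utf8);
```

If the assumption holds, $str compares equal to "\x{fc}\x{fc}\x{fc}", which is what DPeek () reports for the formatted field.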
Around line 383 in xlscat there is:
if ($enc_o) { $_ = encode ($enc_o, $_) for @row; }
That should probably be commented out, as xlscat also sets binmode, so the explicit encode () looks like a second encoding pass. With that line commented out:
$ xlscat -v2 -a utf-8 book.xls
Opened book.xls with 3 sheets
Opening sheet 1 ...
book.xls - 01: [ Blatt1 ] 2 Cols, 2 Rows
üüü|àéòú
öÖöÖ|ßßß
2 x 2
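To illustrate why an explicit encode () combined with binmode is suspect, here is a small sketch. It is not xlscat's actual code: the emit () helper and the in-memory file handle are hypothetical, used only to capture what an encoding layer would write. Passing already-encoded octets through such a layer encodes them a second time.

```perl
use strict;
use warnings;
use Encode qw( encode );

# 'üüü' as a character string, like the formatted cell value
my $cell = "\x{fc}\x{fc}\x{fc}";

# Hypothetical helper: print through a utf-8 encoding layer into a
# scalar, standing in for STDOUT after binmode with an encoding
sub emit {
    my ($text) = @_;
    my $out = "";
    open my $fh, ">:encoding(utf-8)", \$out or die "open: $!";
    print {$fh} $text;
    close $fh;
    return $out;
    }

my $once  = emit ($cell);                     # layer encodes once
my $twice = emit (encode ("utf-8", $cell));   # encode () + layer

printf "once: %d octets, twice: %d octets\n",
    length ($once), length ($twice);
```

The second result is twice as long as the first, because every octet produced by encode () is treated as a Latin-1 character and encoded again by the layer. That would be consistent with the scrambled rows shown earlier.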
--
H.Merijn Brand
http://tux.nl Perl Monger
http://amsterdam.pm.org/
using perl5.00307 .. 5.19 porting perl5 on HP-UX, AIX, and openSUSE
http://mirrors.develooper.com/hpux/ http://www.test-smoke.org/
http://qa.perl.org http://www.goldmark.org/jeff/stupid-disclaimers/