
This queue is for tickets about the Spreadsheet-Read CPAN distribution.

Report information
The Basics
Id: 98436
Status: resolved
Priority: 0/
Queue: Spreadsheet-Read

People
Owner: Nobody in particular
Requestors: dumb_kane [...] hotmail.com
Cc:
AdminCc:

Bug Information
Severity: Normal
Broken in: 0.54
Fixed in: 0.56



Subject: xlscat and unicode
I'm using Spreadsheet::Read 0.54 and Spreadsheet::ParseExcel 0.65. This concerns the included utilities xls2csv, xlscat and xlsgrep. I can't get them to work correctly with my locale settings (utf8). While my extremely naive script (spe.pl), which uses Spreadsheet::ParseExcel directly, produces readable output, the output of those utilities is scrambled when the input file (book.xls) contains non-ASCII characters (see output).
Subject: book.xls
application/vnd.ms-excel 5.5k


Subject: output
application/octet-stream 453b


Subject: spe.pl
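#!/usr/bin/env perl
use strict;
use warnings;
# Apply the locale's encoding (utf8 here) as the default I/O layer,
# including STDIN/STDOUT/STDERR
use open qw/:std :locale/;
use Spreadsheet::ParseExcel;

my $file   = shift or die "usage: $0 book.xls\n";
my $parser = Spreadsheet::ParseExcel->new();
my $book   = $parser->parse($file) or die $parser->error(), "\n";
my $w      = $book->worksheet(0);

# Print every cell of the first worksheet, row by row
for my $row (0 .. ($w->row_range)[1]) {
    for my $col (0 .. ($w->col_range)[1]) {
        my $cell = $w->get_cell($row, $col);
        print $cell->value . ' ' if $cell;
    }
    print "\n";
}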
Subject: Re: [rt.cpan.org #98436] xlscat and unicode
Date: Fri, 29 Aug 2014 19:25:22 +0200
To: bug-Spreadsheet-Read [...] rt.cpan.org
From: "H.Merijn Brand" <h.m.brand [...] xs4all.nl>
On Fri, 29 Aug 2014 12:39:42 -0400, "Fjodor Fedov via RT" <bug-Spreadsheet-Read@rt.cpan.org> wrote:

> Fri Aug 29 12:39:41 2014: Request 98436 was acted upon.
> Transaction: Ticket created by fedov
>        Queue: Spreadsheet-Read
>      Subject: xlscat and unicode
>    Broken in: 0.54
>     Severity: Normal
>        Owner: Nobody
>   Requestors: dumb_kane@hotmail.com
>       Status: new
>  Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=98436 >
>
>
> I'm using Spreadsheet::Read 0.54 and Spreadsheet::ParseExcel 0.65
> This concerns the included utilities: xls2csv, xlscat and xlsgrep. I
> can't get them to work correctly with my locale settings (utf8).
> While my extremly naive script (spe.pl) which uses
> Spreadsheet::ParseExcel directly produces readable output, the
> mentioned utilities' output is scrambled, when the input file
> (book.xls) contains non-ascii characters (see output)
Interestingly, the verbose option shows fine till almost the end:

$ xlscat -v9 book.xls
ReadData (book.xls, debug 0 clip 1);
[ { error   => undef,
    parser  => 'Spreadsheet::ParseExcel',
    sheet   => { Blatt1 => 1 },
    sheets  => 3,
    type    => 'xls',
    version => '0.65'
    },
  { A1     => '��',
    A2     => '��',
    B1     => '��',
    B2     => '��',
    attr   => [],
    cell   => [
      [],
      [ undef, '���', '����' ],
      [ undef, '����', '���' ]
      ],
    label  => 'Blatt1',
    maxcol => 2,
    maxrow => 2
    }
  ]
Opened book.xls with 3 sheets
Opening sheet 1 ...
{ A1     => 'üüü',
  A2     => 'öÖöÖ',
  B1     => 'àéòú',
  B2     => 'ßßß',
  attr   => [],
  cell   => [
    [],
    [ undef, 'üüü', 'öÖöÖ' ],
    [ undef, 'àéòú', 'ßßß' ]
    ],
  label  => 'Blatt1',
  maxcol => 2,
  maxrow => 2
  }
book.xls - 01: [ Blatt1 ] 2 Cols, 2 Rows
{ sheet => { 1 => 1 } }
1:1 'üüü' / 'üüü'
2:1 'àéòú' / 'àéòú'
��|��
1:2 'öÖöÖ' / 'öÖöÖ'
2:2 'ßßß' / 'ßßß'
��|��
2 x 2

If I use Data::Peek's DPeek () on the values and fields:

1:1 'üüü' / 'üüü'
1:1 PV("\0\374\0\374\0\374"\0) / PV("\303\274\303\274\303\274"\0) [UTF8 "\x{fc}\x{fc}\x{fc}"]
2:1 'àéòú' / 'àéòú'
2:1 PV("\0\340\0\351\0\362\0\372"\0) / PV("\303\240\303\251\303\262\303\272"\0) [UTF8 "\x{e0}\x{e9}\x{f2}\x{fa}"]
��|��
1:2 'öÖöÖ' / 'öÖöÖ'
1:2 PV("\0\366\0\326\0\366\0\326"\0) / PV("\303\266\303\226\303\266\303\226"\0) [UTF8 "\x{f6}\x{d6}\x{f6}\x{d6}"]
2:2 'ßßß' / 'ßßß'
2:2 PV("\0\337\0\337\0\337"\0) / PV("\303\237\303\237\303\237"\0) [UTF8 "\x{df}\x{df}\x{df}"]
��|��

showing that the unformatted values are unencoded (raw UTF-16BE bytes, e.g. \0\374 for ü) and the formatted values are encoded (Perl character strings with the UTF8 flag on).

About line 383 in xlscat there is

    if ($enc_o) {
        $_ = encode ($enc_o, $_) for @row;
        }

That should probably be commented out, as xlscat also sets binmode:

$ xlscat -v2 -a utf-8 book.xls
Opened book.xls with 3 sheets
Opening sheet 1 ...
book.xls - 01: [ Blatt1 ] 2 Cols, 2 Rows
üüü|àéòú
öÖöÖ|ßßß
2 x 2

-- 
H.Merijn Brand  http://tux.nl
Perl Monger  http://amsterdam.pm.org/
using perl5.00307 .. 5.19   porting perl5 on HP-UX, AIX, and openSUSE
http://mirrors.develooper.com/hpux/  http://www.test-smoke.org/
http://qa.perl.org  http://www.goldmark.org/jeff/stupid-disclaimers/
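A minimal Perl sketch of the suspected double encoding, using only the core Encode module and in-memory filehandles. It is an illustration under assumptions, not xlscat's actual code: all variable names are hypothetical, and the raw bytes are copied from the DPeek output for cell 1:1. Decoding those UTF-16BE bytes gives the same characters as the formatted value. Letting the output layer encode once gives correct UTF-8, while calling encode() and then printing through an :encoding(UTF-8) layer (the binmode case) encodes every byte a second time.

```perl
#!/usr/bin/env perl
# Hypothetical illustration, NOT xlscat's code: why encoding a string and
# then printing it through an :encoding(UTF-8) layer garbles non-ASCII text.
use strict;
use warnings;
use Encode qw(decode encode);

# Raw bytes as shown by DPeek for cell 1:1: "üüü" as UTF-16BE
# (each character is a 0x00 byte followed by its code point).
my $raw = "\0\374\0\374\0\374";

# Decoding the raw bytes yields the same characters as the formatted value.
my $chars = decode('UTF-16BE', $raw);

# Correct: decoded characters, encoded once by the output layer.
my $once = '';
{
    open my $fh, '>:encoding(UTF-8)', \$once or die "open: $!";
    print {$fh} $chars;
    close $fh;
}

# Broken: explicit encode() *and* an encoding layer on the handle,
# so the UTF-8 bytes are treated as characters and encoded again.
my $twice = '';
{
    open my $fh, '>:encoding(UTF-8)', \$twice or die "open: $!";
    print {$fh} encode('UTF-8', $chars);
    close $fh;
}

# Dump code points / bytes as dotted hex
printf "decoded: %vX\n", $chars;    # FC.FC.FC
printf "once:    %vX\n", $once;     # C3.BC.C3.BC.C3.BC
printf "twice:   %vX\n", $twice;    # C3.83.C2.BC.C3.83.C2.BC.C3.83.C2.BC
```

Run it as a plain script. The "twice" bytes (C3 83 C2 BC per ü) spell "Ã¼", the familiar double-encoding pattern; the exact garbling seen in a terminal may differ depending on its settings.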