
This queue is for tickets about the Text-CSV_XS CPAN distribution.

Report information
The Basics
Id: 34644
Status: resolved
Priority: 0/
Queue: Text-CSV_XS

People
Owner: HMBRAND [...] cpan.org
Requestors: charles [...] modernterminals.com
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: 0.34
Fixed in: 0.34



Subject: Missing double quote
Dear Sir,

We encountered a problem when converting a GB2312 XLS file to a UTF-8 CSV file with xls2csv.pl (see attached files). The command is as follows:

xls2csv.pl -x original_GB2312_file.XLS -b GB2312 -c conv_utf8_file.csv -a UTF-8

xls2csv.pl uses the Text::CSV_XS (0.34) module to write the CSV file. A record was found with a missing double quote (in the second column), as follows:

/* quote */
140645,红|FB|R12036|,ADD,20080402,933104786,"深圳市海X物流有限公司",2225,,"蓝","林XX",26473005,"林XX",26473006
/* unquote */

Your help is appreciated.

Thank you
Regards
Charles
Subject: orignal_GB2312_file.XLS
application/vnd.ms-excel 31.5k


Subject: xls2csv.pl
#!/usr/bin/perl

use strict;
use 5.006;
use Getopt::Std;
use IO::File;
use Locale::Recode;
use Spreadsheet::ParseExcel;
use Spreadsheet::ParseExcel::FmtUnicode;
use Text::CSV_XS;

our $VERSION = '1.06';

=head1 NAME

xls2csv - A script that recodes a spreadsheet's charset and saves as CSV.

=head1 DESCRIPTION

This script will recode a spreadsheet into a different character set and output the recoded data as a csv file.

The script came about after many headaches from dealing with Excel spreadsheets from clients that were being received in various character sets.

=head1 OPTIONS

 -x : filename of the source spreadsheet
 -b : the character set the source spreadsheet is in (before)
 -c : the filename to save the generated csv file as
 -a : the character set the csv file should be converted to (after)
 -q : quiet mode
 -s : print a list of supported character sets
 -h : print help message
 -v : get version information
 -W : list worksheets in the spreadsheet specified by -x
 -w : specify the worksheet name to convert (defaults to the first worksheet)

=head1 EXAMPLE USAGE

The following example will convert a spreadsheet that is in the WINDOWS-1252 character set (WinLatin1) and save it as a csv file in the UTF-8 character set.

 xls2csv -x "1252spreadsheet.xls" -b WINDOWS-1252 -c "utf8csvfile.csv" -a UTF-8

This example will convert the worksheet named "Users" in the given spreadsheet.

 xls2csv -x "multi_worksheet_spreadsheet.xls" -w "Users" -c "users.csv"

=head1 NOTES

The spreadsheet's charset (-b) will default to UTF-8 if not set.

If the csv's charset (-a) is not set, the CSV file will be created using the same charset as the spreadsheet.

=head1 REQUIRED MODULES

This script requires the following modules:

 Locale::Recode
 Unicode::Map
 Spreadsheet::ParseExcel
 Spreadsheet::ParseExcel::FmtUnicode (should be included with Spreadsheet::ParseExcel)
 Text::CSV_XS

=head1 CAVEATS

It probably will not work with spreadsheets that use formulas.

A line in the spreadsheet is assumed to be blank if there is nothing in the first column.

Some users have reported problems trying to convert a spreadsheet while it was opened in a different application. You should probably make sure that no other programs are working with the spreadsheet while you are converting it.

=cut

$Getopt::Std::STANDARD_HELP_VERSION = 1;

my %O;
getopts('x:b:c:a:qshvWw:', \%O);

HELP_MESSAGE()    if !%O or $O{'h'};
VERSION_MESSAGE() if $O{'v'};

if ($O{'s'}) {
    print "\nThe following character sets are supported:\n\n";
    my $Supported = Locale::Recode->getSupported;
    foreach my $CharSet (sort @$Supported) {
        print "$CharSet\n";
    }
    print "\n";
    exit;
}

my $SourceFilename = $O{'x'} || die "The filename of the spreadsheet (-x) is required.";
my $SourceCharset  = $O{'b'};
$SourceCharset = 'UTF-8' unless $SourceCharset;

unless ($O{'q'}) {
    print "Now reading \"$SourceFilename\" as $SourceCharset.\n";
}

my $XLS = IO::File->new;
$XLS->open("< $SourceFilename") || die "Cannot open spreadsheet: $!";
my $Formatter = Spreadsheet::ParseExcel::FmtUnicode->new(Unicode_Map => $SourceCharset);
my $Book = Spreadsheet::ParseExcel::Workbook->Parse($XLS, $Formatter) || die "Can't read spreadsheet!";

if ($O{'W'}) {
    print "\nThe following "
        . ($Book->{SheetCount} > 1 ? "$Book->{SheetCount} worksheets are" : "worksheet is")
        . " defined in the spreadsheet:\n\n";
    foreach my $Sheet (@{$Book->{Worksheet}}) {
        print "$Sheet->{Name}\n";
    }
    print "\n";
    exit;
}

my $DestFilename = $O{'c'} || die "The filename to save the csv file as (-c) is required.";
my $DestCharset  = $O{'a'};
$DestCharset = $SourceCharset unless $DestCharset;

my $Sheet;
if ($O{'w'}) {
    $Sheet = $Book->Worksheet($O{'w'});
    die "Invalid worksheet" if !defined $Sheet;
    unless ($O{'q'}) {
        print qq|Converting the "$Sheet->{Name}" worksheet.\n|;
    }
}
else {
    ($Sheet) = @{$Book->{Worksheet}};
    if (!$O{'q'} && $Book->{SheetCount} > 1) {
        print qq|Multiple worksheets found. Will convert the "$Sheet->{Name}" worksheet.\n|;
    }
}

# Use "or", not "||": with "||" the die binds to the (always true)
# filename string, so a failing open would go unnoticed.
open CSV, "> $DestFilename" or die "Cannot create csv file: $!";
binmode CSV;

my $Csv = Text::CSV_XS->new({
    'quote_char'  => '"',
    'escape_char' => '"',
    'sep_char'    => ',',
    'binary'      => 1,
});

my $Recoder;
if ($O{'a'}) {
    $Recoder = Locale::Recode->new(from => $SourceCharset, to => $DestCharset);
}

for (my $Row = $Sheet->{MinRow}; defined $Sheet->{MaxRow} && $Row <= $Sheet->{MaxRow}; $Row++) {
    my @Row;
    for (my $Col = $Sheet->{MinCol}; defined $Sheet->{MaxCol} && $Col <= $Sheet->{MaxCol}; $Col++) {
        my $Cell  = $Sheet->{Cells}[$Row][$Col];
        my $Value = "";
        if ($Cell) {
            $Value = $Cell->Value;
            if ($Value eq 'GENERAL') {
                # Sometimes numbers are read incorrectly as "GENERAL".
                # In this case, the correct value should be in ->{Val}.
                $Value = $Cell->{Val};
            }
            if ($O{'a'}) {
                $Recoder->recode($Value);
            }
        }

        # We assume the line is blank if there is nothing in the first column.
        last if $Col == $Sheet->{MinCol} and !$Value;

        push(@Row, $Value);
    }

    next unless @Row;

    # combine() returns a false value on failure; test truth, not definedness.
    my $Status = $Csv->combine(@Row);

    if (!$O{'q'} and !$Status) {
        my $Error = $Csv->error_input();
        warn "ERROR FOUND!: $Error";
    }

    if ($Status) {
        my $Line = $Csv->string();
        print CSV "$Line\n";
    }
}

close CSV;
$XLS->close;

unless ($O{'q'}) {
    print "The spreadsheet has been converted to $DestCharset and saved as \"$DestFilename\".\n";
}

sub VERSION_MESSAGE {
    print <<"EOF";

This is xls2csv version $VERSION

Copyright (C) 2005 Ken Prows. All rights reserved.

This script is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

For help, use "xls2csv -h"

EOF
    exit;
}

sub HELP_MESSAGE {
    print <<"EOF";

xls2csv - Recode a spreadsheet's charset and save as CSV.

usage: xls2csv -x spreadsheet.xls [-w worksheet] [-b charset] [-c csvfile.csv] [-a charset] [-qshvW]

-x : filename of the source spreadsheet
-b : the character set the source spreadsheet is in (before)
-c : the filename to save the generated csv file as
-a : the character set the csv file should be converted to (after)
-q : quiet mode
-s : print a list of supported character sets
-h : this help message
-v : get version information
-W : list worksheets in the spreadsheet specified by -x
-w : specify the worksheet name to convert (defaults to the first worksheet)

example: xls2csv -x "spreadsheet.xls" -b WINDOWS-1252 -c "csvfile.csv" -a UTF-8

More detailed help is in "perldoc xls2csv"

EOF
    exit;
}

=head1 AUTHOR

Ken Prows (perl@xev.net)

=head1 COPYRIGHT

Copyright (C) 2005 Ken Prows. All rights reserved.

This script is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

=cut
Subject: conv_utf8_file.csv
application/vnd.ms-excel 12k


Subject: Re: [rt.cpan.org #34644] Missing double quote
Date: Thu, 3 Apr 2008 13:55:06 +0200
To: bug-Text-CSV_XS [...] rt.cpan.org
From: "H.Merijn Brand" <h.m.brand [...] xs4all.nl>
On Thu, 03 Apr 2008 04:22:01 -0400, "Charles C. via RT" <bug-Text-CSV_XS@rt.cpan.org> wrote:
> We encountered a problem when converting a GB2312 XLS file to UTF-8
> CSV file with xls2csv.pl (see attached files)
>
> The command is as follows:-
>
> xls2csv.pl -x original_GB2312_file.XLS -b GB2312 -c conv_utf8_file.csv -a UTF-8
>
> The xls2csv.pl uses Text::CSV_XS (0.34) modules to write the CSV file.
Consider upgrading to 0.37; it has a memory leak fixed.
> A record was found missing double quote (the second column) as follows:-
Technically, this field doesn't need any quotation, as there is no possible conflict when using binary => 1. Quotes are only forced when the field contains *bytes* that are:

 * in the range 0x00 .. 0x1f
 * in the range 0x7f .. 0xa0
 * equal to quote_char
 * equal to sep_char
 * equal to escape_char

Without diving into the encoded data, I suspect that all the other quoted fields contain a *byte* in the range 0x7f .. 0xa0. Text::CSV_XS has no knowledge of encoding.
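The byte test described above can be sketched in plain Perl. This is a minimal illustration, not the module's XS code: needs_quote is a hypothetical helper, the sketch assumes the fields are UTF-8 encoded before they are combined (as with -a UTF-8 in xls2csv.pl), and it uses the quote, separator and escape characters from xls2csv.pl. Later Text::CSV_XS releases may decide differently.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                          # the literals below are character strings
use Encode qw(encode);

binmode STDOUT, ':encoding(UTF-8)';

# quote_char ("), sep_char (,) and escape_char (") as set in xls2csv.pl.
my %special = map { ord($_) => 1 } ('"', ',', '"');

# Hypothetical helper: true if any *byte* of the UTF-8 encoded field
# falls in 0x00..0x1f or 0x7f..0xa0, or is one of the special characters.
sub needs_quote {
    my ($field) = @_;
    for my $byte (unpack 'C*', encode('UTF-8', $field)) {
        return 1 if $byte <= 0x1f
                 || ($byte >= 0x7f && $byte <= 0xa0)
                 || $special{$byte};
    }
    return 0;
}

# Some fields of record 74 from the attached spreadsheet.
for my $field ('140645', '红|FB|R12036|', '深圳市海沧物流有限公司', '2225', '蓝', '林志东') {
    printf "%s => %s\n", $field, needs_quote($field) ? 'quoted' : 'bare';
}
```

This reports the second column as bare and the Chinese company and name fields as quoted, which matches the output in this ticket: 红 encodes as the bytes E7 BA A2, none of which falls in 0x7f .. 0xa0, while 圳 encodes as E5 9C B3.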
> /* quote */
>
> 140645,红|FB|R12036|,ADD,20080402,933104786,"深圳市海X物流有限公
> 司",2225,,"蓝","林XX",26473005,"林XX",26473006
>
> /* unquote */
$ xlscat -v5 -R74 -c orignal_GB2312_file.XLS
ReadData (orignal_GB2312_file.XLS, debug 0 clip 1);
Opened orignal_GB2312_file.XLS with 1 sheets
Opening sheet 1 ...
orignal_GB2312_file.XLS - 01: [ Sheet0 ] 13 Cols, 87 Rows
1:74 '140645' / '140645'
2:74 '~¢|FB|R12036|' / '红|FB|R12036|'
3:74 'ADD' / 'ADD'
4:74 '20080402' / '20080402'
5:74 '933104786' / '933104786'
6:74 'mñW3^mwl§rimAg PQlSø' / '深圳市海沧物流有限公司'
7:74 '2225' / '2225'
8:74 '' / ''
9:74 'Ý' / '蓝'
10:74 'g_×N' / '林志东'
11:74 '26473005' / '26473005'
12:74 'g_×N' / '林志东'
13:74 '26473006' / '26473006'
140645,红|FB|R12036|,ADD,20080402,933104786,"深圳市海沧物流有限公司",2225,,"蓝","林志东",26473005,"林志东",26473006
$ xlscat -R74 -c orignal_GB2312_file.XLS > test.csv
$ xlscat -s"\n" test.csv
140645
红|FB|R12036|
ADD
20080402
933104786
深圳市海沧物流有限公司
2225

蓝
林志东
26473005
林志东
26473006
13 x 1
$

I've added the encoding options to xlscat (included in the Spreadsheet::Read module) like this:
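The same round trip can be sketched with Text::CSV_XS itself. A minimal sketch, assuming the module is installed and that, as in xls2csv.pl, the fields are UTF-8 byte strings (no "use utf8"). It parses a shortened version of record 74 once with the quoting that was produced and once with every field quoted, then compares the results.

```perl
#!/usr/bin/perl
# Sketch: quoted and unquoted CSV fields parse to the same values.
# Assumes Text::CSV_XS is installed. No "use utf8": the literals stay
# UTF-8 byte strings, like the recoded values written by xls2csv.pl.
use strict;
use warnings;
use Text::CSV_XS;

my $csv = Text::CSV_XS->new({ binary => 1 });   # allow bytes >= 0x80

# Shortened record 74: as written (mixed quoting) and fully quoted.
my @lines = (
    '140645,红|FB|R12036|,ADD,"深圳市海沧物流有限公司",2225,,"蓝"',
    '"140645","红|FB|R12036|","ADD","深圳市海沧物流有限公司","2225","","蓝"',
);

my @rows;
for my $line (@lines) {
    $csv->parse($line) or die "parse failed: " . $csv->error_input;
    push @rows, [ $csv->fields ];
}

my $same = join("\0", @{ $rows[0] }) eq join("\0", @{ $rows[1] });
printf "%d fields, %s\n", scalar @{ $rows[0] }, $same ? "identical values" : "values differ";
```

Both lines yield the same seven fields, which is why the missing quotes around the second column are harmless for a CSV reader.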
> xlscat --help
usage: xlscat [-s <sep>] [-L] [-u] [ Selection ] file.xls
       [-c | -m] [-u] [ Selection ] file.xls
       -i [ -S sheets ] file.xls
Generic options:
    -v[#]       Set verbose level (xlscat)
    -d[#]       Set debug level (Spreadsheet::Read)
    -u          Use unformatted values
    --noclip    Do not strip empty sheets and
                trailing empty rows and columns
    -e <enc>    Set encoding for input and output
    -b <enc>    Set encoding for input
    -a <enc>    Set encoding for output
Input CSV:
    --in-sep=c  Set input sep_char for CSV
Output Text (default):
    -s <sep>    Use separator <sep>. Default '|', \n allowed
    -L          Line up the columns
Output Index only:
    -i          Show sheet names and size only
Output CSV:
    -c          Output CSV, separator = ','
    -m          Output CSV, separator = ';'
Selection:
    -S <sheets> Only print sheets <sheets>. 'all' is a valid set
                Default only prints the first sheet
    -R <rows>   Only print rows <rows>. Default is 'all'
    -C <cols>   Only print columns <cols>. Default is 'all'
    -F <flds>   Only fields <flds> e.g. -FA3,B16
> Your help is appreciated
> Thank you
> Regards
> Charles
--
H.Merijn Brand  Amsterdam Perl Mongers (http://amsterdam.pm.org/)
using & porting perl 5.6.2, 5.8.x, 5.10.x on HP-UX 10.20, 11.00, 11.11,
& 11.23, SuSE 10.1 & 10.2, AIX 5.2, and Cygwin.
http://qa.perl.org  http://mirrors.develooper.com/hpux/
http://www.test-smoke.org  http://www.goldmark.org/jeff/stupid-disclaimers/
Subject: RE: [rt.cpan.org #34644] Missing double quote
Date: Fri, 4 Apr 2008 12:36:38 +0800
To: <bug-Text-CSV_XS [...] rt.cpan.org>
From: "Chu, Charles" <charles [...] ModernTerminals.com>
Dear Mr. H. M. Brand,

Thank you for your advice! Actually, I am pretty new to Perl development and overlooked some information on CPAN. I thought the always_quote option would help solve our problem. Besides, I will take your advice and upgrade from 0.34 to 0.37.

Thank you
Regards
Charles
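If the goal is simply uniform-looking output, the always_quote attribute mentioned above makes Text::CSV_XS quote every defined field. A minimal sketch, assuming an installed Text::CSV_XS that supports always_quote (the attribute is documented for the module; whether 0.34 already had it is not confirmed in this ticket), using the constructor options from xls2csv.pl plus always_quote => 1:

```perl
#!/usr/bin/perl
# Sketch: force quotes around every field with always_quote.
# Assumes an installed Text::CSV_XS that supports always_quote.
# No "use utf8": values are UTF-8 byte strings, as in xls2csv.pl.
use strict;
use warnings;
use Text::CSV_XS;

my $csv = Text::CSV_XS->new({
    quote_char   => '"',
    escape_char  => '"',
    sep_char     => ',',
    binary       => 1,
    always_quote => 1,    # quote every defined field, needed or not
});

my @row = ('140645', '红|FB|R12036|', 'ADD', '深圳市海沧物流有限公司', '2225');

$csv->combine(@row) or die "combine failed: " . $csv->error_input;
my $line = $csv->string;
print "$line\n";
```

In xls2csv.pl this would amount to adding 'always_quote' => 1 to the hash passed to Text::CSV_XS->new; the values a CSV parser reads back stay the same, only the quoting changes.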