
This queue is for tickets about the Spreadsheet-Read CPAN distribution.

Report information
The Basics
Id: 102794
Status: rejected
Priority: 0/
Queue: Spreadsheet-Read

People
Owner: Nobody in particular
Requestors: projs+perl [...] niss.com
Cc:
AdminCc:

Bug Information
Severity: Normal
Broken in: 0.58
Fixed in: (no value)



Subject: Cannot process Google Drive .xlsx files
A few months ago, automated downloads of Google spreadsheets no longer included the ODS format. The XLSX format is still available, but the structure of Google's XLSX files causes problems for Spreadsheet::Read, or at least for the underlying parsers.

Although it has not been released yet, a fix for the new structure was committed to Spreadsheet::ParseXLSX a couple of days ago (see https://github.com/doy/spreadsheet-parsexlsx/issues/29). Without the fix, reading the attached sample document fails with:

    Can't call method "first_child" on an undefined value at /opt/local/lib/perl5/site_perl/5.20/Spreadsheet/ParseXLSX.pm line 455.

    453     styles => [
    454         map {
    455             $border{$border->first_child($_)->att('style') || 'none'}
    456         } qw(left right top bottom)
    457     ],

I did not delve too deeply into the issue, but it seems that the order of the expected XML elements changed. When I installed the unreleased ParseXLSX.pm directly, the problem shifted to Spreadsheet::XLSX.

Spreadsheet::XLSX also has a problem with Google's interpretation of the XLSX structure. Its warning message, which is symptomatic of a deeper problem, is:

    Use of uninitialized value in concatenation (.) or string at /opt/local/lib/perl5/site_perl/5.20/Spreadsheet/XLSX.pm line 150.

    148     foreach my $sheet (@Worksheet) {
    149
    150         my $member_sheet = $self -> {zip} -> memberNamed ("xl/$sheet->{path}") or next;
    151

The problem is that $sheet->{path} is undefined instead of matching the corresponding XML file in the xl/worksheets/ directory of the unzipped spreadsheet.

Unfortunately, Spreadsheet::XLSX has not been updated in five years and has quite a backlog of issues (https://rt.cpan.org/Public/Dist/Display.html?Name=Spreadsheet-XLSX). Rather than wait for a fix in that module, I tried, and failed, to modify the source so that Spreadsheet::Read skips it and uses ParseXLSX instead.

Is there some way to deprecate the use of Spreadsheet::XLSX in favor of a current, functional alternative?
Environment:

    Spreadsheet::XLSX version 0.13
    Spreadsheet::ParseXLSX version 0.16
    Spreadsheet::ParseExcel version 0.65
    Spreadsheet::Read version 0.58

    perl v5.20.1 for darwin (aka OS X)
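Whatever the exact structural change in Google's files is, the ParseXLSX error itself is Perl's generic "Can't call method ... on an undefined value": at line 455, `first_child($_)` returns undef when the border element has no child with that name (XML::Twig, which ParseXLSX builds on, returns undef when no child matches), and the chained `->att('style')` then dies. A minimal sketch of a tolerant lookup follows. It assumes a hypothetical `Elt` class that mimics only `first_child` and `att`, so it runs on core Perl without XML::Twig, and a made-up `%border` style table; it illustrates the guard pattern only and is not the change that was actually made in ParseXLSX.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an XML element (name, attributes, children).
# Only the two methods used at line 455 are mimicked.
package Elt;
sub new {
    my ($class, %args) = @_;
    return bless { att => {}, kids => [], %args }, $class;
}
sub first_child {
    my ($self, $name) = @_;
    my ($kid) = grep { $_->{name} eq $name } @{ $self->{kids} };
    return $kid;    # undef when no child has that name
}
sub att { return $_[0]{att}{ $_[1] } }

package main;

# Hypothetical style table standing in for %border in ParseXLSX.pm
my %border = (none => 0, thin => 1);

# Guarded variant of the map at line 455: a missing side counts as 'none'
# instead of calling ->att on undef.
sub border_styles {
    my ($border) = @_;
    return [ map {
        my $side = $border->first_child($_);
        $border{ ($side && $side->att('style')) || 'none' }
    } qw(left right top bottom) ];
}

# A border element that only has a styled <left> child
my $elt = Elt->new(
    name => 'border',
    kids => [ Elt->new(name => 'left', att => { style => 'thin' }) ],
);
print join(",", @{ border_styles($elt) }), "\n";    # prints 1,0,0,0
```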
Attachment: bug_demo.xlsx (application/vnd.openxmlformats-officedocument.spreadsheetml.sheet, 3.5k)

Subject: Re: [rt.cpan.org #102794] Cannot process Google Drive .xlsx files
Date: Mon, 16 Mar 2015 09:46:59 +0100
To: bug-Spreadsheet-Read [...] rt.cpan.org
From: "H.Merijn Brand" <h.m.brand [...] xs4all.nl>
On Sun, 15 Mar 2015 21:25:03 -0400, "Scott Bolte via RT" <bug-Spreadsheet-Read@rt.cpan.org> wrote:
Spreadsheet::XLSX is something that ought to be deprecated now that we have Spreadsheet::ParseXLSX. It is buggy and not maintained. Whatever problem is found in Spreadsheet::XLSX, be sure that it will not be fixed. Don't waste your time on it.
The author has abandoned the module due to a lack of interest in Perl itself. He wrote it when he needed to parse XLSX and no alternative was available. It parses XML with regular expressions. That should say enough.

> Rather than wait for a fix in that module, I tried, and failed, to
> modify the source so Spreadsheet::Read skip it and use ParseXLSX
> instead.
Since version 0.53 (29 Jan 2014), Spreadsheet::Read already prefers ParseXLSX over XLSX when it is installed. To disable Spreadsheet::XLSX completely:

    --8<---
    --- Read.pm 2015-03-15 09:32:30.738033028 +0100
    +++ Read.pm 2015-03-16 09:45:50.414152843 +0100
    @@ -45,7 +45,6 @@ my @parsers = (
         [ sxc => "Spreadsheet::ReadSXC", "0.20" ],
         [ xls => "Spreadsheet::ParseExcel", "0.34" ],
         [ xlsx => "Spreadsheet::ParseXLSX", "0.13" ],
    -    [ xlsx => "Spreadsheet::XLSX", "0.13" ],
         [ prl => "Spreadsheet::Perl", "" ],
     
         # Helper modules
    -->8---
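The preference order in this parser table, combined with the fallback described later in the thread (if a preferred parser cannot be loaded, the next supported one is tried), can be sketched as below. This is a minimal sketch under the assumption that a parser counts as usable when it can be `require`d and passes `->VERSION($min)`; `pick_parser` and the trimmed table are hypothetical and are not Spreadsheet::Read's actual code.

```perl
use strict;
use warnings;

# Hypothetical table in the spirit of @parsers in the patch:
# [ file type, module name, minimum version ]
my @parsers = (
    [ xlsx => "Spreadsheet::ParseXLSX", "0.13" ],
    [ xlsx => "Spreadsheet::XLSX",      "0.13" ],
);

# Return the first module listed for $type that loads and passes its
# minimum-version check; return undef if none qualifies.
sub pick_parser {
    my ($type, @table) = @_;
    for my $entry (grep { $_->[0] eq $type } @table) {
        my (undef, $mod, $min) = @$entry;
        (my $file = "$mod.pm") =~ s{::}{/}g;
        # require dies if the file is missing or fails to compile;
        # VERSION($min) dies if $VERSION is undefined or lower than $min
        my $ok = eval { require $file; $mod->VERSION($min) if length $min; 1 };
        return $mod if $ok;
    }
    return;
}

my $mod = pick_parser(xlsx => @parsers);
print defined $mod ? "xlsx: $mod\n" : "No XLSX parser available\n";
```

Removing the Spreadsheet::XLSX row, as the patch does, simply leaves ParseXLSX as the only candidate for .xlsx files.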
-- 
H.Merijn Brand  http://tux.nl   Perl Monger  http://amsterdam.pm.org/
using perl5.00307 .. 5.21   porting perl5 on HP-UX, AIX, and openSUSE
http://mirrors.develooper.com/hpux/        http://www.test-smoke.org/
http://qa.perl.org   http://www.goldmark.org/jeff/stupid-disclaimers/

As Spreadsheet::Read is just a wrapper module, this ticket, however useful and well-worded, is not about a bug in Spreadsheet::Read. This module cannot solve this particular problem in the underlying parser. Sometimes Spreadsheet::Read is able to find workarounds, but that is not the case here, so I can do nothing about it. Sorry to reject this ticket as is.

BTW, I'd advise you to just uninstall Spreadsheet::XLSX completely.
I should have provided details on how I tried, and failed, to avoid using Spreadsheet::XLSX. I both commented it out in Spreadsheet/Read.pm and renamed Spreadsheet/XLSX.pm to hide it. In both cases, subsequent attempts to use Spreadsheet::Read failed:

    host% perl -MCarp=verbose ./parse_sheets.pl
    Parser for XLSX is not installed at /opt/local/lib/perl5/site_perl/5.20/Spreadsheet/Read.pm line 415.
        Spreadsheet::Read::ReadData("/Users/scott/Downloads/bug_demo.xlsx", "strip", 3, "dtfmt", "yyyy-mm-dd", "parser", "xlsx", "debug", undef, ...) called at ./parse_sheets.pl line 158

On Mon Mar 16 04:59:16 2015, HMBRAND wrote:
> As Spreadsheet::Read is just a wrapper module, this ticket - however
> useful and well-worded - is not about a bug in Spreadsheet::Read. This
> module cannot solve this particular problem for the parser below.
>
> Sometimes Spreadsheet::Read is able to find workarounds, but that is
> not the case here, so I can do nothing about it.
>
> Sorry to reject this ticket as is.
>
> BTW, I'd advice you to just uninstall Spreadsheet::XLSX completely.
Subject: Re: [rt.cpan.org #102794] Cannot process Google Drive .xlsx files
Date: Mon, 16 Mar 2015 13:41:08 +0100
To: bug-Spreadsheet-Read [...] rt.cpan.org
From: "H.Merijn Brand" <h.m.brand [...] xs4all.nl>
On Mon, 16 Mar 2015 08:21:22 -0400, "Scott Bolte via RT" <bug-Spreadsheet-Read@rt.cpan.org> wrote:

> I should have provided details on when I tried, and failed, to avoid
> using Spreadsheet::XLSX. I both commented it out in Spreadsheet/Read.pm
> and renamed Spreadsheet/XLSX.pm to hide it. In both cases, attempts to
> use Spreadsheet::Read subsequently failed.
If it is not able to load or use a preferred module, it tries the next supported one. The first one tried is the one I prefer. It should be obvious from this that Spreadsheet::Read is unable to load Spreadsheet::ParseXLSX.
> host% perl -MCarp=verbose ./parse_sheets.pl
> Parser for XLSX is not installed at /opt/local/lib/perl5/site_perl/5.20/Spreadsheet/Read.pm line 415.
>     Spreadsheet::Read::ReadData("/Users/scott/Downloads/bug_demo.xlsx", "strip", 3, "dtfmt", "yyyy-mm-dd", "parser", "xlsx", "debug", undef, ...) called at ./parse_sheets.pl line 158
In that case it was unable to load Spreadsheet::ParseXLSX. I think that message is pretty clear. I could change it to "Spreadsheet::Read is unable to load any of the supported XLSX parsers", but I don't think that adds real value.

Does your module load and show its version?

    $ perl -MSpreadsheet::ParseXLSX -wE'say $Spreadsheet::ParseXLSX::VERSION'
    0.16

I do have Spreadsheet::ParseXLSX installed: both the stock 0.16 version and the pending replacement that works with the Google XLSX files. The problem changes, but there is a problem with both.

First, here is the stripped-down test case:

    host% cat parse_demo.pl
    #!/usr/bin/env perl

    use strict;
    use warnings;
    use Spreadsheet::Read;

    for my $module ( sort keys %INC ) {
        if ( $module =~ m/spreadsheet/i ) {
            printf qq{%-40s -> %s\n}, $module, $INC{$module};
        }
    }

    ReadData(q{bug_demo.xlsx});
    exit;

    host% perl -MSpreadsheet::ParseXLSX -wE'say $Spreadsheet::ParseXLSX::VERSION'
    0.16
    host% perl -MCarp=verbose parse_demo.pl
    Spreadsheet/ParseExcel.pm -> /opt/local/lib/perl5/site_perl/5.20/Spreadsheet/ParseExcel.pm
    Spreadsheet/ParseExcel/Cell.pm -> /opt/local/lib/perl5/site_perl/5.20/Spreadsheet/ParseExcel/Cell.pm
    Spreadsheet/ParseExcel/FmtDefault.pm -> /opt/local/lib/perl5/site_perl/5.20/Spreadsheet/ParseExcel/FmtDefault.pm
    Spreadsheet/ParseExcel/Font.pm -> /opt/local/lib/perl5/site_perl/5.20/Spreadsheet/ParseExcel/Font.pm
    Spreadsheet/ParseExcel/Format.pm -> /opt/local/lib/perl5/site_perl/5.20/Spreadsheet/ParseExcel/Format.pm
    Spreadsheet/ParseExcel/Utility.pm -> /opt/local/lib/perl5/site_perl/5.20/Spreadsheet/ParseExcel/Utility.pm
    Spreadsheet/ParseExcel/Workbook.pm -> /opt/local/lib/perl5/site_perl/5.20/Spreadsheet/ParseExcel/Workbook.pm
    Spreadsheet/ParseExcel/Worksheet.pm -> /opt/local/lib/perl5/site_perl/5.20/Spreadsheet/ParseExcel/Worksheet.pm
    Spreadsheet/ParseXLSX.pm -> /opt/local/lib/perl5/site_perl/5.20/Spreadsheet/ParseXLSX.pm
    Spreadsheet/Read.pm -> /opt/local/lib/perl5/site_perl/5.20/Spreadsheet/Read.pm
    Spreadsheet/ReadSXC.pm -> /opt/local/lib/perl5/site_perl/5.20/Spreadsheet/ReadSXC.pm
    Can't call method "first_child" on an undefined value at /opt/local/lib/perl5/site_perl/5.20/Spreadsheet/ParseXLSX.pm line 455.

That demonstrates the XML element ordering issue that crops up with Google's interpretation of the XLSX format.
Now I replace the stock version with the not-yet-released version from https://github.com/doy/spreadsheet-parsexlsx/tree/master/lib/Spreadsheet:

    host% perldoc -l Spreadsheet::ParseXLSX
    /opt/local/lib/perl5/site_perl/5.20/Spreadsheet/ParseXLSX.pm
    host% sudo mv -n /opt/local/lib/perl5/site_perl/5.20/Spreadsheet/ParseXLSX.pm{,-0.16}
    host% curl --remote-name https://raw.githubusercontent.com/doy/spreadsheet-parsexlsx/master/lib/Spreadsheet/ParseXLSX.pm
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    100 30821  100 30821    0     0  97935      0 --:--:-- --:--:-- --:--:--  102k
    host% md5 ParseXLSX.pm
    MD5 (ParseXLSX.pm) = 4db04805888cb8f1a56295e201637bd7
    host% sudo cp -p ParseXLSX.pm /opt/local/lib/perl5/site_perl/5.20/Spreadsheet/ParseXLSX.pm
    host% perl -MCarp=verbose parse_demo.pl
    Spreadsheet/ParseExcel.pm -> /opt/local/lib/perl5/site_perl/5.20/Spreadsheet/ParseExcel.pm
    Spreadsheet/ParseExcel/Cell.pm -> /opt/local/lib/perl5/site_perl/5.20/Spreadsheet/ParseExcel/Cell.pm
    Spreadsheet/ParseExcel/FmtDefault.pm -> /opt/local/lib/perl5/site_perl/5.20/Spreadsheet/ParseExcel/FmtDefault.pm
    Spreadsheet/ParseExcel/Font.pm -> /opt/local/lib/perl5/site_perl/5.20/Spreadsheet/ParseExcel/Font.pm
    Spreadsheet/ParseExcel/Format.pm -> /opt/local/lib/perl5/site_perl/5.20/Spreadsheet/ParseExcel/Format.pm
    Spreadsheet/ParseExcel/Utility.pm -> /opt/local/lib/perl5/site_perl/5.20/Spreadsheet/ParseExcel/Utility.pm
    Spreadsheet/ParseExcel/Workbook.pm -> /opt/local/lib/perl5/site_perl/5.20/Spreadsheet/ParseExcel/Workbook.pm
    Spreadsheet/ParseExcel/Worksheet.pm -> /opt/local/lib/perl5/site_perl/5.20/Spreadsheet/ParseExcel/Worksheet.pm
    Spreadsheet/ParseXLSX.pm -> /opt/local/lib/perl5/site_perl/5.20/Spreadsheet/ParseXLSX.pm
    Spreadsheet/Read.pm -> /opt/local/lib/perl5/site_perl/5.20/Spreadsheet/Read.pm
    Spreadsheet/ReadSXC.pm -> /opt/local/lib/perl5/site_perl/5.20/Spreadsheet/ReadSXC.pm
    Parser for XLSX is not installed at /opt/local/lib/perl5/site_perl/5.20/Spreadsheet/Read.pm line 415.
        Spreadsheet::Read::ReadData("bug_demo.xlsx") called at parse_demo.pl line 13

So even though Spreadsheet::ParseXLSX is successfully loaded according to %INC, Spreadsheet::Read thinks otherwise. I'm not sure why.
Subject: Re: [rt.cpan.org #102794] Cannot process Google Drive .xlsx files
Date: Tue, 17 Mar 2015 00:24:28 +0100
To: bug-Spreadsheet-Read [...] rt.cpan.org
From: "H.Merijn Brand" <h.m.brand [...] xs4all.nl>
On Mon, 16 Mar 2015 18:37:10 -0400, "Scott Bolte via RT" <bug-Spreadsheet-Read@rt.cpan.org> wrote:

> I do have Spreadsheet::ParseXLSX installed, both the stock 0.16 version
> and the pending replacement that works with the Google XLSX files.
> While the problem changes, there is a problem with both.
Can I get the failing xlsx?

The failing xlsx file was attached to my first report.
The reason Spreadsheet::Read cannot load Spreadsheet::ParseXLSX is that the git checkout does not define a VERSION:

    $ perl -MSpreadsheet::ParseXLSX -wE'say Spreadsheet::ParseXLSX->VERSION'
    Use of uninitialized value in say at -e line 1.

This causes the minimum-version requirement to fail, and Spreadsheet::Read then falls back to the next alternative:

    $ perl -MSpreadsheet::ParseXLSX -wE'say Spreadsheet::ParseXLSX->VERSION("0.13")'
    Spreadsheet::ParseXLSX does not define $Spreadsheet::ParseXLSX::VERSION--version check failed at -e line 1.

If I manually add «our $VERSION = "0.16";» to ParseXLSX.pm, everything works as expected:

    $ perl -Mblib examples/xlscat -v9 -i sandbox/rt102794.xlsx
    ReadData (sandbox/rt102794.xlsx, debug 0 clip 1);
    [ { error => undef,
        parser => 'Spreadsheet::ParseXLSX',
        sheet => { 'Mar \'15' => 1 },
        sheets => 1,
        type => 'xlsx',
        version => '0.16'
        },
      { A1 => 'Alpha',
        A2 => 10,
        B1 => 'Beta',
        B2 => 20,
        C1 => '',
        attr => [],
        cell => [ [],
                  [ undef, 'Alpha', 10 ],
                  [ undef, 'Beta', 20 ],
                  [ undef, '' ]
                  ],
        label => 'Mar \'15',
        maxcol => 2,
        maxrow => 2,
        merged => []
        }
      ]
    Opened sandbox/rt102794.xlsx with 1 sheets
    Opening sheet 1 ...
    { A1 => 'Alpha',
      A2 => 10,
      B1 => 'Beta',
      B2 => 20,
      C1 => '',
      attr => [],
      cell => [ [],
                [ undef, 'Alpha', 10 ],
                [ undef, 'Beta', 20 ],
                [ undef, '' ]
                ],
      label => 'Mar \'15',
      maxcol => 2,
      maxrow => 2,
      merged => []
      }
    sandbox/rt102794.xlsx - 01: [ Mar '15 ] 2 Cols, 2 Rows
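The failure mode explained here can be reproduced without installing anything. `UNIVERSAL::VERSION` dies when it is given a minimum and the class has no `$VERSION`, so a module can be present in `%INC` and still be rejected by a minimum-version check. A minimal sketch: `Checkout::NoVersion` and `Checkout::Patched` are hypothetical stand-ins for the git checkout of ParseXLSX.pm before and after adding `our $VERSION`, and `meets_minimum` is a hypothetical helper, not part of Spreadsheet::Read.

```perl
use strict;
use warnings;

# Stand-in for the git checkout of ParseXLSX.pm: code, but no $VERSION
package Checkout::NoVersion;
sub new { return bless {}, shift }

# Stand-in for the same file after adding: our $VERSION = "0.16";
package Checkout::Patched;
our $VERSION = "0.16";
sub new { return bless {}, shift }

package main;

# Pretend both were loaded from disk, as %INC showed in the report
$INC{"Checkout/NoVersion.pm"} = $INC{"Checkout/Patched.pm"} = __FILE__;

# True if $class->VERSION($min) succeeds. UNIVERSAL::VERSION dies when the
# class has no $VERSION, or when its $VERSION is lower than $min.
sub meets_minimum {
    my ($class, $min) = @_;
    return eval { $class->VERSION($min); 1 } ? 1 : 0;
}

for my $class (qw(Checkout::NoVersion Checkout::Patched)) {
    (my $file = "$class.pm") =~ s{::}{/}g;
    printf "%-20s in %%INC: %s, meets 0.13: %s\n", $class,
        exists $INC{$file} ? "yes" : "no",
        meets_minimum($class, "0.13") ? "yes" : "no";
}
```

Both stand-ins report as loaded, but only the one that declares `$VERSION` passes the check, which matches the behaviour seen with the unreleased ParseXLSX.pm.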
Sweet, sweet, sweet -- Thank you! Sorry I did not figure that out myself.