Skip Menu |

This queue is for tickets about the Spreadsheet-XLSX CPAN distribution.

Report information
The Basics
Id: 66516
Status: open
Priority: 0/
Queue: Spreadsheet-XLSX

People
Owner: Nobody in particular
Requestors: giulioo [...] pobox.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: Using much more RAM than Spreadsheet-ParseExcel
Date: Thu, 10 Mar 2011 11:51:02 +0100
To: bug-Spreadsheet-XLSX [...] rt.cpan.org
From: Giulio <giulioo [...] gmail.com>
Spreadsheet-XLSX-0.13 - perl-5.8.8 - Linux RHEL5.x Test file has 65000 rows with 4 columns each. 1) Test file in Excel 2007 .xlsx format: Parsing with Spreadsheet-XLSX-0.13: Process RAM usages stabilizes at 250 MB 2) Test file in BIFF .xls format Spreadsheet-ParseExcel-0.32 using Parse(): Process RAM usages stabilizes at 160 MB Spreadsheet-ParseExcel-0.32 using new()/CellHandler: Process RAM usages stabilizes at 60 MB So right now it's 60 MB for .xls and 250 MB for .xlsx. I understand .xlsx is more verbose and so much more data has to be kept in memory in order to be analyzed; however, would it be technically feasable to add something like CellHandler Spreadsheet-XLSX? Thanks
RT-Send-CC: do [...] eludia.ru
On Thu Mar 10 05:54:28 2011, giulioo@pobox.com wrote: (snip) Show quoted text
> > So right now it's 60 MB for .xls and 250 MB for .xlsx. > > I understand .xlsx is more verbose and so much more data has to be kept in > memory in order to be analyzed; however, would it be technically
feasable to Show quoted text
> add something like CellHandler Spreadsheet-XLSX?
Patch attached to implement CellHandler and NotSetCell (and optionally implement converter in the optional options hash). It looks like this library has been unmaintained for a while. Note to author: Are you still around and maintaining?
Subject: XLSX.pm.diff
--- XLSX.pm Sun May 16 02:07:33 2010 +++ XLSX.pm Thu Jun 7 15:41:03 2012 @@ -17,7 +17,15 @@ sub new { - my ($class, $filename, $converter) = @_; + my ($class, $filename, @options) = @_; + + my $converter; + + $converter = shift @options if @options % 2; + my %opts = @options; + $converter ||= $opts{Converter}; + my $cell_handler = $opts{CellHandler}; + my $not_set_cell = $opts{NotSetCell}; my $self = {}; @@ -110,9 +118,9 @@ my ($tag, $other) = ($1, $'); - my @pairs = split /\" /, $other; - $tag eq 'sheet' or next; + + my @pairs = split /\" /, $other; my $sheet = { MaxRow => 0, @@ -145,6 +153,7 @@ $self -> {Worksheet} = \@Worksheet; + my $sheet_count = 0; foreach my $sheet (@Worksheet) { my $member_sheet = $self -> {zip} -> memberNamed ("xl/$sheet->{path}") or next; @@ -211,7 +220,8 @@ $cell->{Type}="Text"; $cell->{Val}=$cell->{_Value}; } - $sheet -> {Cells} [$row] [$col] = $cell; + $cell_handler->($oBook, $sheet_count, $row, $col, $cell) if $cell_handler; + $sheet -> {Cells} [$row] [$col] = $cell unless $not_set_cell; } } @@ -219,7 +229,9 @@ $sheet -> {MinRow} = 0 if $sheet -> {MinRow} > $sheet -> {MaxRow}; $sheet -> {MinCol} = 0 if $sheet -> {MinCol} > $sheet -> {MaxCol}; - } + } continue { + $sheet_count++; + } foreach my $stys (keys %style_info){ } bless ($self, $class); @@ -278,6 +290,25 @@ but not all. It populates the classes from Spreadsheet::ParseExcel for interoperability; including Workbook, Worksheet, and Cell. +The CellHandler and NotSetCell options of Spreadsheet::ParseExcel are now implemented: + + Spreadsheet::XLSX->new ('test.xlsx', %options); + +Where %options may have the following keys; + +=over + +=item Converter + +=item CellHandler + +=item NotSetCell + +=back + +Converter may be optionally specified in the hash or as the third argument to new(). +CellHandler and NotSetCell work as they do in Spreadsheet::ParseExcel. + =head1 SEE ALSO =over 2
On Thu Jun 07 18:55:45 2012, DOUGW wrote: Show quoted text
> > Patch attached to implement CellHandler and NotSetCell (and optionally > implement converter in the optional options hash).
Forgot ParseAbort(). Add after "if $cell_handler;": return $oBook if defined $oBook->{_ParseAbort};
On Thu Jun 07 19:36:18 2012, DOUGW wrote: Show quoted text
> On Thu Jun 07 18:55:45 2012, DOUGW wrote:
> > > > Patch attached to implement CellHandler and NotSetCell (and optionally > > implement converter in the optional options hash).
> > Forgot ParseAbort(). > > Add after "if $cell_handler;": > return $oBook if defined $oBook->{_ParseAbort};
No, add it after "unless $not_set_cell;"
From: giulioo [...] pobox.com
On Thu Jun 07 19:40:41 2012, DOUGW wrote: Show quoted text
> On Thu Jun 07 19:36:18 2012, DOUGW wrote:
> > On Thu Jun 07 18:55:45 2012, DOUGW wrote:
> > > > > > Patch attached to implement CellHandler and NotSetCell (and optionally > > > implement converter in the optional options hash).
With a new 65.000 rows .xlsx test file (don't have the old one) memory stabilizes at 182MB instead of 264MB, -31%. Thanks!