This queue is for tickets about the Spreadsheet-ParseExcel CPAN distribution.
Maintainer(s)' notes
If you are reporting a bug in Spreadsheet::ParseExcel, here are some pointers:
1) State the issue as clearly and as concisely as possible. A simple program or Excel test file (see below) will often explain the issue better than a lot of text.
2) Provide information on your system, version of perl and module versions. The following program will generate everything that is required. Put this information in your bug report.
#!/usr/bin/perl -w
print "\n Perl version : $]";
print "\n OS name : $^O";
print "\n Module versions: (not all are required)\n";
my @modules = qw(
    Spreadsheet::ParseExcel
    Scalar::Util
    Unicode::Map
    Spreadsheet::WriteExcel
    Parse::RecDescent
    File::Temp
    OLE::Storage_Lite
    IO::Stringy
);
for my $module (@modules) {
    my $version;
    eval "require $module";
    if (not $@) {
        $version = $module->VERSION;
        $version = '(unknown)' if not defined $version;
    }
    else {
        $version = '(not installed)';
    }
    printf "%21s%-24s\t%s\n", "", $module, $version;
}
__END__
3) Upgrade to the latest version of Spreadsheet::ParseExcel (or at least test on a system with an upgraded version). The issue you are reporting may already have been fixed.
4) Create a small example program that demonstrates your problem. The program should be as small as possible. A few lines of code are worth tens of lines of text when trying to describe a bug.
5) Supply an Excel file that demonstrates the problem. This is very important. If the file is big, or contains confidential information, try to reduce it down to the smallest Excel file that represents the issue. If you don't wish to post a file here then send it to me directly: jmcnamara@cpan.org
6) Say if the test file was created by Excel, OpenOffice, Gnumeric or something else. Say which version of that application you used.
7) If you are submitting a patch you should check with the maintainer whether the issue has already been patched or if a fix is in the works. Patches should be accompanied by test cases.
Asking a question
If you would like to ask a more general question there is the Spreadsheet::ParseExcel Google Group.
Owner: Nobody in particular
Requestors: divanov [...] creditreform.bg
Cc:
AdminCc:
Severity: Wishlist
Broken in: (no value)
Fixed in: 0.59
Fri Apr 29 09:58:46 2005
Guest - Ticket created
Hi,
I have a problem that resembles #563 to some extent.
I am writing a program that processes a batch of excel files using Spreadsheet::ParseExcel (0.2603, from Debian/unstable). It basically does the following (see attached testcase.pl):
foreach (@filename) {
    my $wBook = $parser->parse($_);
    # ... work with $wBook ...
}
This increases memory consumption on each iteration. This is due to cyclic references causing the Workbook/Worksheet/Cell objects to never be deallocated (until global destruction). Workbook references Worksheet, which references Cell, which references Workbook.
The only way to break the refcycle is to do it explicitly. Running
    undef $wBook->{Worksheet};
at the end of each iteration stops the memory consumption from increasing.
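The cycle described above can be reproduced without the module. What follows is only a minimal sketch: Node is a hypothetical stand-in class for the Workbook, Worksheet and Cell objects, and the Book and Cells keys are illustrative assumptions, not the module's real internals. It counts DESTROY calls to show that nothing is freed while the cycle is intact, and that everything is freed once the Worksheet link is undef'ed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in class; NOT part of Spreadsheet::ParseExcel.
package Node;
our $destroyed = 0;
sub new     { my ($class, %args) = @_; return bless {%args}, $class }
sub DESTROY { $destroyed++ }    # count every object that is actually freed

package main;

# Build Workbook -> Worksheet -> Cell -> Workbook, as described in the ticket.
# The key names 'Book' and 'Cells' are assumptions made for this sketch.
sub build_cycle {
    my $book  = Node->new(name => 'workbook');
    my $sheet = Node->new(name => 'worksheet');
    my $cell  = Node->new(name => 'cell', Book => $book);
    $sheet->{Cells}    = [$cell];
    $book->{Worksheet} = [$sheet];
    return $book;
}

# Case 1: the cycle is left intact, so nothing is freed at scope exit.
{
    my $book = build_cycle();
}
my $leaked = $Node::destroyed;
print "destroyed without breaking the cycle: $leaked\n";

# Case 2: break the cycle explicitly before the variable goes out of scope.
{
    my $book = build_cycle();
    undef $book->{Worksheet};
}
my $freed = $Node::destroyed - $leaked;
print "destroyed after breaking the cycle: $freed\n";
```

The objects from the first block stay allocated until global destruction, which is the per-file growth a batch loop would see if such a cycle exists.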
My proposal is to add some method like Close(), Free() or something of your taste that breaks the loop for the user.
Adding a paragraph about it in the manpage would help the poor souls that have to do batch processing of excel files :-)
Thanks for writing Spreadsheet::ParseExcel. It is invaluable for my work.
--
dam
Mon Jan 30 11:55:03 2006
Guest - Correspondence added
I don't see "undef $wBook->{Worksheet};" helping with breaking the
cyclic reference or the memory consumption due to batch processing (at
least with my test case).
I processed about 100 files of roughly 300K each. Memory consumption was
10M while processing the first file and about 25M by the time it reached
the 100th file. The results were the same irrespective of whether I
undef'ed the sheet or not.
Mon Jan 30 11:55:04 2006
The RT System itself - Status changed from 'new' to 'open'
Mon Jan 30 13:08:57 2006
divanov [...] creditreform.bg - Correspondence added
Guest via RT wrote:
> I don't see "undef $wBook->{Worksheet};" helping with breaking the
> cyclic reference or the memory consumption due to batch processing (at
> least with my test case).
>
> I processed about 100 files, 300K each The memory consumption while
> processing the first file was 10M and when it reached to the 100th file
> it was about 25M. The results were same irrespective of whether I
> undef'ed the sheet or not.
Can you try to present a reproducible test case?
dam
--
Damyan Ivanov Creditreform Bulgaria
divanov@creditreform.bg
http://www.creditreform.bg/
phone: +359(2)928-2611, 929-3993 fax: +359(2)920-0994
mob. +359(88)856-6067 dam@jabber.minus273.org/Gaim
Mon Feb 17 00:22:55 2014
tlhackque [...] yahoo.com - Correspondence added
As this is still open..
These shouldn't be that hard to find & fix.
Usually, just adding a
    sub DESTROY {
        undef ...
    }
to each module will do it when the object goes out of scope (or is undef'ed). It can be a bit trickier to weaken one of the links: Scalar::Util::weaken is the tool for that, but copies of a weak reference are strong...
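To illustrate the point about weak links, here is a minimal sketch using plain hashes rather than the module's real objects; the Book and Cell keys are assumptions made for the example. It weakens the back-link from a cell to its workbook, then shows that a copy of the weak reference is strong again and that the weak link is cleared once the last strong reference goes away.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Scalar::Util qw(weaken isweak);

# Plain hashes standing in for a workbook and one of its cells (hypothetical).
my $book = { name => 'workbook' };
my $cell = { name => 'cell', Book => $book };
$book->{Cell} = $cell;    # completes the cycle book -> cell -> book

# Weaken the back-link so the cell no longer keeps the workbook alive.
weaken($cell->{Book});
my $link_is_weak = isweak($cell->{Book}) ? 1 : 0;

# Copying a weak reference yields an ordinary strong reference.
my $copy = $cell->{Book};
my $copy_is_weak = isweak($copy) ? 1 : 0;

# Drop every strong reference to the workbook; the weak link becomes undef.
undef $copy;
undef $book;
my $link_cleared = defined $cell->{Book} ? 0 : 1;

print "back-link is weak: $link_is_weak\n";
print "copy is weak:      $copy_is_weak\n";
print "link cleared:      $link_cleared\n";
```

Any code that stores such a copy in a longer-lived structure recreates the cycle, which is why weakening one link is not always enough on its own.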
To identify the cycles, try Devel::Cycle.
I've tried half a dozen spreadsheets, some fairly complex, and it hasn't identified any cycles.
But I do worry - I'm considering using Spreadsheet::ParseExcel in a mod_perl world.
Here's a simple shell that should identify cycles if they are real. If not, a reproducible test case would help.
You can run this over many files with a shell loop like
for F in *.xls; do ./cycletest "$F"; done
#!/usr/bin/perl
use warnings;
use strict;
use Spreadsheet::ParseExcel;
use Data::Dumper;
$Data::Dumper::Sortkeys=1;
use Devel::Cycle;
my $ss=Spreadsheet::ParseExcel->new;
unless( $ss ) {
    print STDERR "Init failed $ARGV[0]\n";
    exit;
}
my $wb=$ss->parse($ARGV[0]);
unless( $wb ) {
    print STDERR "Parse failed $ARGV[0]\n";
    exit;
}
my $ws=$wb->worksheet(0);
# These are redundant as find_cycle will follow the pointers.
print STDERR "Checking $ARGV[0] SS...";
find_cycle($ss);
print STDERR "Checking WB...";
find_cycle($wb);
print STDERR "Checking WS...";
find_cycle($ws);
print STDERR "Done\n\n";
exit;
Thu Feb 27 13:34:48 2014
DOUGW [...] cpan.org - Correspondence added
On Fri Apr 29 09:58:46 2005, guest wrote:
>
> This increases memory consumption on each iteration. This is due to
> cyclic references causing Workbook/Worksheet/Cell object to never be
> deallocated (until global destruction). Workbook references Worksheet,
Pretty sure this has been fixed for some time now.
Thu Feb 27 13:34:49 2014
DOUGW [...] cpan.org - Status changed from 'open' to 'resolved'