
Preferred bug tracker

Please visit the preferred bug tracker to report your issue.

This queue is for tickets about the Spreadsheet-ParseExcel CPAN distribution.

Maintainer(s)' notes

If you are reporting a bug in Spreadsheet::ParseExcel, here are some pointers:

1) State the issue as clearly and as concisely as possible. A simple program or Excel test file (see below) will often explain the issue better than a lot of text.

2) Provide information on your system, your version of Perl and the module versions. The following program will generate everything that is required. Put this information in your bug report.

    #!/usr/bin/perl -w

    print "\n    Perl version   : $]";
    print "\n    OS name        : $^O";
    print "\n    Module versions: (not all are required)\n";

    my @modules = qw(
                      Spreadsheet::ParseExcel
                      Scalar::Util
                      Unicode::Map
                      Spreadsheet::WriteExcel
                      Parse::RecDescent
                      File::Temp
                      OLE::Storage_Lite
                      IO::Stringy
                    );

    for my $module (@modules) {
        my $version;
        eval "require $module";

        if (not $@) {
            $version = $module->VERSION;
            $version = '(unknown)' if not defined $version;
        }
        else {
            $version = '(not installed)';
        }

        printf "%21s%-24s\t%s\n", "", $module, $version;
    }

    __END__

3) Upgrade to the latest version of Spreadsheet::ParseExcel (or at least test on a system with an upgraded version). The issue you are reporting may already have been fixed.

4) Create a small example program that demonstrates your problem. The program should be as small as possible. A few lines of code are worth tens of lines of text when trying to describe a bug.

5) Supply an Excel file that demonstrates the problem. This is very important. If the file is big, or contains confidential information, try to reduce it to the smallest Excel file that reproduces the issue. If you don't wish to post a file here, send it to me directly: jmcnamara@cpan.org

6) Say if the test file was created by Excel, OpenOffice, Gnumeric or something else. Say which version of that application you used.

7) If you are submitting a patch you should check with the maintainer whether the issue has already been patched or if a fix is in the works. Patches should be accompanied by test cases.

Asking a question

If you would like to ask a more general question there is the Spreadsheet::ParseExcel Google Group.

Report information
The Basics
Id: 12464
Status: resolved
Priority: 0/
Queue: Spreadsheet-ParseExcel

People
Owner: Nobody in particular
Requestors: divanov [...] creditreform.bg
Cc:
AdminCc:

Bug Information
Severity: Wishlist
Broken in: (no value)
Fixed in: 0.59



Subject: New method to break cyclic references
Hi,

I have problems somewhat resembling #563.

I am writing a program that processes a batch of Excel files using Spreadsheet::ParseExcel (0.2603, from Debian/unstable). It basically does the following (see attached testcase.pl):

    foreach (@filename) {
        my $wBook = $parser->parse($_);
        # ... work with $wBook ...
    }

This increases memory consumption on each iteration. It is caused by cyclic references that prevent the Workbook/Worksheet/Cell objects from ever being deallocated (until global destruction): Workbook references Worksheet, which references Cell, which references Workbook.

The only way to break the reference cycle is to do it explicitly. Adding

    undef $wBook->{Worksheet};

at the end of each loop iteration stops the memory consumption from increasing.

My proposal is to add a method like Close(), Free() or something of your taste that breaks the loop for the user. Adding a paragraph about it in the manpage would help the poor souls who have to do batch processing of Excel files :-)

Thanks for writing Spreadsheet::ParseExcel. It is invaluable for my work.
--
dam
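The cycle described in the report can be reproduced without Spreadsheet::ParseExcel at all. The following is a minimal, self-contained sketch that assumes nothing about the module's internals: the `Obj` class, `build_book()` and `close_book()` are hypothetical stand-ins, with `close_book()` playing the role of the proposed Close()/Free() method. A `DESTROY` counter shows that each Workbook/Worksheet/Cell triple outlives the loop until the cycle is broken by hand.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Number of objects created but not yet destroyed.
my $live = 0;

# Hypothetical stand-in for the Workbook/Worksheet/Cell objects;
# NOT the real Spreadsheet::ParseExcel classes.
package Obj;
sub new {
    my ($class, %fields) = @_;
    my $self = {%fields};
    $live++;
    return bless $self, $class;
}
sub DESTROY { $live-- }

package main;

# Workbook -> Worksheet -> Cell -> Workbook, as described in the report.
sub build_book {
    my $book  = Obj->new(name => 'Workbook');
    my $sheet = Obj->new(name => 'Worksheet');
    my $cell  = Obj->new(name => 'Cell', value => 42);
    $book->{Worksheet} = [$sheet];
    $sheet->{Cells}    = [[$cell]];
    $cell->{Book}      = $book;    # back-reference closes the cycle
    return $book;
}

# Hypothetical Close()/Free(): drop the Workbook's link to its Worksheets,
# like "undef $wBook->{Worksheet};" in the report.
sub close_book {
    my ($book) = @_;
    undef $book->{Worksheet};
}

# 1) Cycle left intact: all three objects outlive every iteration.
my $before = $live;
for (1 .. 5) {
    my $book = build_book();
}
my $leaked = $live - $before;    # 15

# 2) Cycle broken explicitly before the workbook goes out of scope.
$before = $live;
for (1 .. 5) {
    my $book = build_book();
    close_book($book);
}
my $leaked_after_close = $live - $before;    # 0

print "leaked without close: $leaked, with close: $leaked_after_close\n";
```

Breaking any one link of the cycle is enough; clearing the Workbook's `Worksheet` slot is simply the link the report happened to use.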
I don't see "undef $wBook->{Worksheet};" helping to break the cyclic reference or to reduce the memory consumption due to batch processing (at least with my test case).

I processed about 100 files, 300K each. The memory consumption while processing the first file was 10M, and by the 100th file it was about 25M. The results were the same irrespective of whether I undef'ed the sheet or not.

(In reply to guest's original report of Fri Apr 29 09:58:46 2005.)
Subject: Re: [rt.cpan.org #12464] New method to break cyclic references
Date: Mon, 30 Jan 2006 20:07:56 +0200
To: bug-Spreadsheet-ParseExcel [...] rt.cpan.org
From: Damyan Ivanov <divanov [...] creditreform.bg>
Can you try to present a reproducible test case?

dam
--
Damyan Ivanov
Creditreform Bulgaria
divanov@creditreform.bg
http://www.creditreform.bg/
phone: +359(2)928-2611, 929-3993
fax: +359(2)920-0994
mob. +359(88)856-6067
dam@jabber.minus273.org/Gaim
From: tlhackque [...] yahoo.com
As this is still open... These shouldn't be that hard to find and fix. Usually, just adding a sub DESTROY { undef ... } to each module will do it when the object goes out of scope (or is undef'ed). It can be a bit trickier to weaken one of the links: Scalar::Util::weaken is the method, but copies of a weak link are strong...

To identify the cycles, try Devel::Cycle. I've tried half a dozen spreadsheets, some fairly complex, and it hasn't identified any. But I do worry, since I'm considering using Spreadsheet::ParseExcel in a mod_perl world.

Here's a simple script that should identify cycles if they are real. If not, a reproducible test case would help. You can run it over many files with a shell loop like:

    for F in *.xls; do ./cycletest "$F"; done

    #!/usr/bin/perl
    use warnings;
    use strict;
    use Spreadsheet::ParseExcel;
    use Devel::Cycle;    # provides find_cycle()

    my $file = $ARGV[0];

    my $ss = Spreadsheet::ParseExcel->new;
    unless ($ss) { print STDERR "Init failed $file\n"; exit 1; }

    my $wb = $ss->parse($file);
    unless ($wb) { print STDERR "Parse failed $file\n"; exit 1; }

    my $ws = $wb->worksheet(0);

    # These are redundant, as find_cycle() follows the references itself.
    print STDERR "Checking $file SS...";
    find_cycle($ss);
    print STDERR "Checking WB...";
    find_cycle($wb);
    print STDERR "Checking WS...";
    find_cycle($ws);
    print STDERR "Done\n\n";
    exit;
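One caveat on the DESTROY suggestion: Perl only calls DESTROY when an object's reference count drops to zero, which never happens for an object that is itself part of a cycle, so a DESTROY method on the Workbook would not run before global destruction. Weakening the back-reference avoids that. The sketch below uses hypothetical `Obj` stand-ins (not the real ParseExcel classes, and not necessarily how the fix in 0.59 was done), applies `Scalar::Util::weaken` to the Cell's link back to the Workbook, and uses `isweak` to confirm that a copy of a weak reference is strong.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Scalar::Util qw(weaken isweak);

# Number of objects created but not yet destroyed.
my $live = 0;

# Hypothetical stand-in for the Workbook/Worksheet/Cell objects;
# NOT the real Spreadsheet::ParseExcel classes.
package Obj;
sub new {
    my ($class, %fields) = @_;
    my $self = {%fields};
    $live++;
    return bless $self, $class;
}
sub DESTROY { $live-- }

package main;

# Workbook -> Worksheet -> Cell -> Workbook, with the last link weakened.
sub build_book {
    my $book  = Obj->new(name => 'Workbook');
    my $sheet = Obj->new(name => 'Worksheet');
    my $cell  = Obj->new(name => 'Cell', value => 42);
    $book->{Worksheet} = [$sheet];
    $sheet->{Cells}    = [[$cell]];
    $cell->{Book}      = $book;
    weaken($cell->{Book});    # no longer keeps the Workbook alive
    return $book;
}

# Each workbook is now freed as soon as $book goes out of scope.
my $before = $live;
for (1 .. 5) {
    my $book = build_book();
    # ... work with $book ...
}
my $leaked = $live - $before;    # 0

# The stored back-reference is weak, but a copy of it is a strong reference.
my $book         = build_book();
my $cell         = $book->{Worksheet}[0]{Cells}[0][0];
my $is_weak      = isweak($cell->{Book}) ? 1 : 0;    # 1
my $copy         = $cell->{Book};
my $copy_is_weak = isweak($copy) ? 1 : 0;            # 0

print "leaked: $leaked, weak: $is_weak, copy weak: $copy_is_weak\n";
```

Weakening the child-to-parent link is the usual choice: the parent owns its children, so the only strong path to each Cell runs down from the Workbook, and dropping the Workbook frees the whole tree.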
On Fri Apr 29 09:58:46 2005, guest wrote:
> This increases memory consumption on each iteration. This is due to
> cyclic references causing Workbook/Worksheet/Cell object to never be
> deallocated (until global destruction). Workbook references Worksheet,
Pretty sure this has been fixed for some time now.