Thanks for the testing. I was able to fix the memory leak by wrapping expat with
Object::Destroyer; it's a hack, but it works pretty well. I've pushed version 0.54 out to CPAN,
which should fix the issue.
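
For reference, the shape of the fix is roughly the following (a minimal sketch, not the exact
code that ships in 0.54; the handler and parsing setup here are only placeholders):

  use XML::Parser;
  use Object::Destroyer;

  # wrap the expat object so release() is called automatically when the
  # wrapper is destroyed, breaking expat's internal circular references
  my $parser = XML::Parser->new(Handlers => { Char => sub { } });
  my $expat  = $parser->parse_start;                      # XML::Parser::ExpatNB
  my $safe   = Object::Destroyer->new($expat, 'release'); # release() runs on DESTROY

  # method calls pass through the wrapper to the underlying expat object,
  # so the rest of the code can keep using $safe as if it were $expat
  # $safe->parse_more($chunk);

When $safe goes out of scope the circularity inside expat is broken even if the parse never
ran to completion, so the parser and the Pages object can be garbage collected.
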
I'm going to close the bug; please add more notes if something else crops up. Thanks again
for the bug reports and testing!
Tyler
On Sat Nov 15 16:51:18 2008, codepoet@umiacs.umd.edu wrote:
> Tyler,
>
> When I run the script you sent me, the memory usage does not grow. I think
> this is because your test script iterates over all articles in the dump
> file. However, if you don't iterate over all the articles, there appears to
> be a memory leak. Take a look at the following script, for example:
>
> #! /usr/bin/perl
>
> use warnings;
> use strict;
>
> use Parse::MediaWikiDump;
>
> my $file = shift @ARGV;
>
> for (;;) {
>     my $p = Parse::MediaWikiDump::Pages->new($file);
>     #while ($p->next) { }
> }
>
> 0;
>
>
> As is, the script leaks memory somewhere. However, if you uncomment
> the while() line, the memory usage doesn't grow.
>
> Mike
>
>
> On Fri, 14 Nov 2008, Tyler Riddle via RT wrote:
>
> > not able to. I used the following test program:
> >
> > #!/usr/bin/perl
> >
> > use strict;
> > use warnings;
> >
> > use Parse::MediaWikiDump;
> >
> > $| = 1;
> > print '';
> >
> > my $dump_count = 0;
> > my $article_count = 0;
> > my $file = shift(@ARGV);
> >
> > die "must specify dump file" unless defined $file;
> >
> > while(1) {
> >     my $d = Parse::MediaWikiDump::Pages->new($file);
> >
> >     $dump_count++;
> >
> >     while(defined($d->next)) {
> >         $article_count++;
> >         print "\tdumps:$dump_count articles:$article_count\r";
> >     }
> >
> >     print "\tdumps:$dump_count articles:$article_count\r";
> > }
> >
> >
> > I ran that program on the test dump file that lives at t/pages_test.xml and on the Simple
> > English Wikipedia dump file at
> > http://download.wikimedia.org/simplewiki/20081029/simplewiki-20081029-pages-articles.xml.bz2
> > and I never saw perl go over 4 megs of RAM used, even after it processed hundreds of
> > thousands of articles. Can you please use that test program and check the memory
> > consumption while it's running, and let me know what the results are?
> >
> > Tyler
> >
> >
> > On Wed Nov 12 07:38:58 2008, TRIDDLE wrote:
> >> Moving forward then; I'm glad to hear that got one problem solved. It
> >> still seems to be leaking RAM though; I'll check into that too.
> >>
> >> Thanks for the reports!
> >>
> >> Tyler
> >>
> >> On Tue Nov 11 23:57:24 2008, codepoet@umiacs.umd.edu wrote:
> >>> Hi Tyler,
> >>>
> >>> Thanks for getting back to me. I tried it and was able to open up to
> >>> about 27000 XML files before it gave me an "out of memory" message. But
> >>> this is more than reasonable for my usage.
> >>>
> >>> Mike
> >>>
> >>> On Wed, 29 Oct 2008, Tyler Riddle via RT wrote:
> >>>
> >>>> references.
> >>>> The event handlers were updated to use a small subset of the variables
> >>>> available to the instantiated Parse::MediaWikiDump::Pages object, which
> >>>> removed the circular references. I ran some tests and was able to open
> >>>> and close dump files more times than the maximum number of open
> >>>> filehandles.
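> >>>>
> >>>> Roughly, the change has this shape (just a sketch to illustrate the idea,
> >>>> not the code in the release candidate; the names here are made up):
> >>>>
> >>>>   # before: the handlers closed over $self, so the parser held the
> >>>>   # object and the object held the parser - a circular reference
> >>>>   #   Start => sub { $self->handle_start(@_) },
> >>>>
> >>>>   # after: the handlers only see a small state hash with no reference
> >>>>   # back to the Pages object, so nothing keeps the object alive
> >>>>   my %state = ( text => '', title => undef );
> >>>>   my $parser = XML::Parser->new( Handlers => {
> >>>>       Char => sub { my (undef, $chars) = @_; $state{text} .= $chars },
> >>>>   } );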
> >>>>
> >>>> Can you please test these changes before I publish them? The release
> >>>> candidate is attached.
> >>>>
> >>>> Thanks,
> >>>>
> >>>> Tyler
> >>>>