
This queue is for tickets about the Mail-Box CPAN distribution.

Report information
The Basics
Id: 23370
Status: resolved
Priority: 0/
Queue: Mail-Box

People
Owner: MARKOV [...] cpan.org
Requestors: ANDK [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: 2.068
Fixed in: (no value)



Subject: Memory hungry MBox
Today I wrote this innocent looking code:

    my $mbox = Mail::Box::Mbox->new(folder => $mboxfile) || die;
    my $messages = $mbox->messages;
    for my $i (0..$messages-1) {
        print "i[$i/$messages]\n";
        no warnings "uninitialized";
        my $message;
        open my $fh, ">", \$message or die;
        $mbox->message($i)->print($fh);
        # do something with $message;
    }

Memory starvation. The process grew to > 1 GB quickly even when I ran it exactly as above (without actually doing something with $message). Granted, the mbox file that I had to process was 500 MB big but still this looks like a bug in Mail::Box to me. What do you think?

Thanks!
Subject: Re: [rt.cpan.org #23370] Memory hungry MBox
Date: Thu, 16 Nov 2006 16:44:46 +0100
To: Andreas Koenig via RT <bug-Mail-Box [...] rt.cpan.org>
From: Mark Overmeer <mark [...] overmeer.net>
* Andreas Koenig via RT (bug-Mail-Box@rt.cpan.org) [061116 11:43]:
> Transaction: Ticket created by ANDK
> Queue: Mail-Box
> Subject: Memory hungry MBox
>
> my $mbox = Mail::Box::Mbox->new(folder => $mboxfile) || die;
> my $messages = $mbox->messages;
> for my $i (0..$messages-1) {
>     print "i[$i/$messages]\n";
>     no warnings "uninitialized";
>     my $message;
>     open my $fh, ">", \$message or die;
>     $mbox->message($i)->print($fh);
>     # do something with $message;
> }

I would have written that as:

    my $mgr    = Mail::Box::Manager->new;
    my $folder = $mgr->open($mboxfile) or die;
    foreach my $msg ($folder->messages)
    {   print "progress: ".$msg->seqnr/$folder->nrMessages."\n";
        my $text = $msg->body->decoded->string;
        # do something with $text;
    }

> Memory starvation. The process grew to > 1 GB quickly even when I ran it
> exactly as above (without actually doing something with $message).
> Granted, the mbox file that I had to process was 500 MB big but still
> this looks like a bug in Mail::Box to me.
> What do you think?

The "open" will not consume too much memory: for mbox folders it will run through the file and collect headers with a minimal number of fields. But then, when you start looking inside the message, the memory use will be considerable. Don't forget that each message builds an expensive header structure, and with multiparts even quite a few of them.

There are tests in the test-set which check for memory-leaks (using weak-links), and I am not aware of any.

There are two approaches to avoid the memory use explosion:
  1) reopen the folder for each few-hundred processed messages
  2) use $msg->destruct ... the memory of the message will be cleared.
     But then: do not open the folder with write permission.

This is typically a place where the memory hungry perl variables bite. Unless you have more facts to suspect a leak, I think there isn't.

-- 
Regards,
               MarkOv

------------------------------------------------------------------------
       Mark Overmeer MSc                                MARKOV Solutions
       Mark@Overmeer.net                          solutions@overmeer.net
http://Mark.Overmeer.net                   http://solutions.overmeer.net
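
For reference, the two approaches listed above could be sketched roughly as follows. This is only an illustration, not code from the Mail::Box distribution: the sub names process_and_release and process_in_batches, the $handler callback and the $open_folder callback are hypothetical, and the commented usage assumes that access => 'r' is how a read-only open is requested. Only messages, message, nrMessages, close and destruct are called on the folder and message objects.

```perl
use strict;
use warnings;

# Approach 2: hand each message to a caller-supplied $handler, then call
# destruct on it so its header and body can be freed. The folder should
# be opened read-only, as advised in the thread.
sub process_and_release {
    my ($folder, $handler) = @_;
    my $count = 0;
    for my $msg ($folder->messages) {
        $handler->($msg);     # the handler must not keep references to $msg
        $msg->destruct;       # release the memory held by this message
        $count++;
    }
    return $count;
}

# Approach 1: reopen the folder every $batch messages. $open_folder is a
# hypothetical callback that returns a freshly opened folder object.
# Note that each reopen scans the mbox file again.
sub process_in_batches {
    my ($open_folder, $handler, $batch) = @_;
    die "batch size must be positive\n" unless $batch && $batch > 0;
    my ($start, $total) = (0, undef);
    while (!defined $total || $start < $total) {
        my $folder = $open_folder->();
        $total = $folder->nrMessages;
        my $end = $start + $batch;
        $end = $total if $end > $total;
        $handler->($folder->message($_)) for $start .. $end - 1;
        $folder->close;       # drop everything loaded for this batch
        $start = $end;
    }
    return $total;
}

# Usage sketch (needs Mail::Box installed; $mboxfile is your own path):
#   use Mail::Box::Manager;
#   my $mgr    = Mail::Box::Manager->new;
#   my $folder = $mgr->open(folder => $mboxfile, access => 'r') or die;
#   process_and_release($folder, sub { my ($msg) = @_; ... });
```

Both subs take the folder (or a way to open it) as an argument, so the memory behaviour of either approach can be compared on the same mbox file without changing the per-message code.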
Subject: Re: [rt.cpan.org #23370] Memory hungry MBox
Date: Fri, 17 Nov 2006 08:49:34 +0100
To: bug-Mail-Box [...] rt.cpan.org
From: andreas.koenig.gmwojprw [...] franz.ak.mind.de (Andreas J. Koenig)
>>>>> On Thu, 16 Nov 2006 11:59:39 -0500, "Mark Overmeer via RT" <bug-Mail-Box@rt.cpan.org> said:
movr> I would have written that as:

movr>     my $mgr    = Mail::Box::Manager->new;
movr>     my $folder = $mgr->open($mboxfile) or die;
movr>     foreach my $msg ($folder->messages)
movr>     {   print "progress: ".$msg->seqnr/$folder->nrMessages."\n";
movr>         my $text = $msg->body->decoded->string;
movr>         # do something with $text;
movr>     }

Much prettier, thanks. I didn't find out how to serialize a message except via print.

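
On serializing a message: a minimal sketch of two ways to get a whole message into a scalar, assuming the message object offers a string method next to the print method used in the original report. The sub names serialize_via_print and serialize_via_string are hypothetical helpers, not part of Mail::Box.

```perl
use strict;
use warnings;

# Variant from the original report: print into an in-memory filehandle.
# Initialising $text to '' avoids the "uninitialized" warnings that the
# original code had to silence.
sub serialize_via_print {
    my ($msg) = @_;
    my $text = '';
    open my $fh, '>', \$text or die "cannot open in-memory handle: $!";
    $msg->print($fh);
    close $fh;
    return $text;
}

# Shorter variant, assuming the message object provides string().
sub serialize_via_string {
    my ($msg) = @_;
    return $msg->string;
}
```

Either way the serialized text is a full copy of the message in memory, so for large folders it still pays to release each message once it has been handled.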
movr> The "open" will not consume too much memory: for mbox folders it will
movr> run through the file and collect headers with a minimal number of
movr> fields. But then, when you start looking inside the message, the
movr> memory use will be considerable. Don't forget that each message
movr> builds an expensive header structure, and with multiparts even quite
movr> a few of them.

movr> There are tests in the test-set which check for memory-leaks (using
movr> weak-links), and I am not aware of any.

movr> There are two approaches to avoid the memory use explosion:
movr>   1) reopen the folder for each few-hundred processed messages
movr>   2) use $msg->destruct ... the memory of the message will
movr>      be cleared. But then: do not open the folder with write
movr>      permission.

Thanks, the second option seems to help a lot and is perfectly well suited for me. The process now grows to 640 MB for a 440 MB mbox and it is faster too. This will enable me to do the jobs of converting these mailboxes at hand. Thanks much.

I'm still surprised that it's so much. Seems like you're holding the whole mbox in memory even though you have a filehandle for it. But that's OK for a perl program. Maybe worth a mention in the manpage?

movr> This is typically a place where the memory hungry perl variables
movr> bite. Unless you have more facts to suspect a leak, I think there
movr> isn't.

It's your call to decide on that. I wasn't even suspecting a leak in the first place, just simply complaining about use of too much memory.

Thanks much for your help on this.

-- 
andreas

5.10 is more memory efficient ;-)

Yes, it is, but certainly not sufficiently; Mail::Box (as most mail-box handlers) will require a lot more memory to administer the mail headers than the size of the source file.