This queue is for tickets about the Mail-Box CPAN distribution.

Report information
The Basics
Id: 55990
Status: resolved
Priority: 0/
Queue: Mail-Box

People
Owner: Nobody in particular
Requestors: bram [...] cs.queensu.ca
Cc:
AdminCc:

Bug Information
Severity: Critical
Broken in: 2.093
Fixed in: (no value)



Subject: Memory leak when invoking messages like numberOfMessages() and endTimeEstimate() on a thread object
Hi,

When processing the mbox file at http://www.spinics.net/lists/kernel/mbox/0912.mbox.gz with Mail::Box 2.093 (Perl v5.10.0 built for x86_64-linux-gnu-thread-multi on 2.6.31-19-generic #56-Ubuntu SMP Thu Jan 28 02:39:34 UTC 2010 x86_64 GNU/Linux) to extract all threads, I keep on getting a huge memory leak that quickly grows to consume any RAM it can get and crashes my machine.

In particular, the leak only happens after construction of email threads when invoking methods such as numberOfMessages() and endTimeEstimate() that require traversal through all messages of a thread. Messages like startTimeEstimate() do not cause a problem.

A quick hack to avoid the leak, is to remove all "References:" lines from the mbox file, but this makes the thread construction less precise.

I attached an example program exhibiting the bug (I added a comment before the offending function calls):

gunzip 0912.mbox.gz ; ./bug.pl 0912.mbox > test.txt

I'd be happy to provide more information, if needed.

Kind regards,
Bram Adams
Subject: bug.pl
#!/usr/bin/perl

my @args=@ARGV;

use Time::Local;
use Mail::Box::Manager;

#process an mbox file to reconstruct threads

my @folders=();
my $mgr = Mail::Box::Manager->new(timespan => 'EVER');

for my $arg (@args){
    print STDERR "Pushing ${arg}\n";
    push(@folders,$mgr->open(folder => $arg));
}

print STDERR "Extracting threads...\n";
my $threads = $mgr->threads(folders => \@folders);
print STDERR "Extracting done...\n";

my @sorted_threads=$threads->all;
my $i=1;
my $total_nr=$#sorted_threads+1;

foreach my $thread (@sorted_threads) {
    my $start=$thread->startTimeEstimate;
    my $start_nice=construct_date($start);

    #following three lines cause memory leak, unless References: lines are removed from mbox files!!!
    my $end=$thread->endTimeEstimate;
    my $end_nice=construct_date($end);
    my $number=$thread->numberOfMessages;

    print $thread->threadToString;
    print "\n";

    my $email=$thread->message;
    unless($email->isDummy){
        my @froms=$email->from;
        my $sender=$froms[0]->address();
        my $subject=$email->subject;
        $subject =~ s/\s+/ /g;
        $subject =~ s/^\[.+\][ :]//;

        print "${sender},\"${subject}\",${start},${end},${number}\n";
    }
}

sub construct_date{
    my ($epoch)=@_;

    my @months = ("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec");
    # You can use 'gmtime' for GMT/UTC dates instead of 'localtime'
    my ($sec, $min, $hour, $day,$month,$year) = (localtime($epoch))[0,1,2,3,4,5,6];
    return "".$months[$month]." ".$day." ".$hour.":".$min.":".$sec." ".($year+1900);
}
On Fri Mar 26 23:45:20 2010, bramadams wrote:
> with Mail::Box 2.093
> threads, I keep on getting a huge memory leak that quickly grows to
> consume any RAM it can
> get and crashes my machine.
Sorry, I completely missed your bug-report.

Tracking down memory leaks is very difficult. The MailBox regression tests try to check that all "weaken" calls work as expected. But you can never be sure that there are no circular references at all.

However, in your case, I expect you see a different aspect of the code, although I do not really know why the crash depends on the received lines ;-) When you open a Mail::Box, it only scans the files for a rather small index. Only when you start using the messages will it load them into memory. Collecting "Received" headers will pull in the header of the message. Once pulled in, it will stay there until the folder is closed. The storage of the header is quite memory consuming.

Two solutions: when you are certain a given message is not needed anymore, you can destruct() it. When you are certain you do not need a folder anymore, you can close() it. In both cases, memory should get freed up. However, selecting messages to destruct or folders to close may be an algorithmic challenge.

Sorry for the slow response.
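For readers landing on this ticket, the first suggestion can be sketched roughly as follows. This is only an illustration, not code from the Mail::Box distribution: summarise_and_release() is a hypothetical helper name, and it assumes that the thread object's threadMessages() returns the messages of the thread and that calling destruct() on each of them is safe once nothing else from that thread is needed.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Mail::Box): collect the per-thread
# numbers that bug.pl prints, then release the messages that the
# traversal pulled into memory.
sub summarise_and_release {
    my ($thread) = @_;

    # These calls walk the whole thread, which loads message headers.
    my %summary = (
        start  => $thread->startTimeEstimate,
        end    => $thread->endTimeEstimate,
        number => $thread->numberOfMessages,
    );

    # Assumption: no header or body of these messages is needed after
    # this point, so each one can be destructed.
    for my $msg ($thread->threadMessages) {
        next if $msg->isDummy;    # placeholder messages carry no data
        $msg->destruct;
    }

    return \%summary;
}
```

In bug.pl this would have to be called at the very end of the foreach body, after the sender and subject have been read from $thread->message, because a destructed message no longer gives access to its header. Once all threads are processed, the folders themselves can be closed, for example with $_->close for @folders.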
From: bram [...] cs.queensu.ca
Hi,

On Thu Jun 03 03:44:10 2010, MARKOV wrote:
> Two solutions: when you are certain a certain message is not needed
> anymore, you can destruct() it. When you are certain you do not need a
> folder anymore, you can close() it. In both cases, memory should get
> freed up. However, selecting messages to destruct or folders to close
> may be an algorithmic challange.
OK, this seems to work. Thanks!

Bram Adams
not a memory leak