
This queue is for tickets about the Mail-Box CPAN distribution.

Report information
The Basics
Id: 5817
Status: resolved
Priority: 0/
Queue: Mail-Box

People
Owner: Nobody in particular
Requestors: chris-mb [...] syntacticsugar.org
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: 2.054
Fixed in: (no value)



Subject: Possible bad parsing of mbox file
I'm using Mail::Box in an application that processes Linux Kernel mbox archives. The mbox file that causes this problem is ftp://ftp.uwsg.iu.edu/pub/mail.archive/kernel/kernel.0401.1

I load the file into Mail::Box with the following:

    ...
    my $folder = $box->open(folder => $file, type => 'Mail::Box::Mbox');
    push @all_messages, $folder->messages();   # Add all the messages.
    ..

Then I use Mail::Thread and run:

    my $threader = new Mail::Thread(@all_messages);
    $threader->thread;

At this point, while processing the file, I receive this warning:

    WARNING: Illegal character in field name From linux-kernel-owner+lnxkrnl-gauley.ucs.indiana.edu@vger.kernel.org Thu Jan 15 23

I first suspected that the original "=40" was responsible, so I ran sed to replace those instances with "@", but the warning still appears. The line in question is the "From " line that separates the bottom of one message from the top of the next, not a header. It only happens with this one file; other mboxes in the same directory don't have a problem, and similar lines in other files are processed without complaint.

More important than the warning, memory usage spikes. The process used 1.35 GB of memory before the script finished successfully. (Luckily this machine has 9 GB of swap.) The output was not significantly affected by the warning.

The machine is a dual-processor Opteron, and is thus a 64-bit architecture. I believe this may be relevant because the same warning does NOT seem to occur on my 32-bit P3 laptop. I've verified that the files are identical.

Any ideas?
[guest - Fri Mar 26 18:32:39 2004]:

Oh, more details, in case it's relevant. perl -v says:

    This is perl, v5.8.1 built for x86_64-linux-thread-multi

The OS is SuSE Professional 9.0.
The error cannot be reproduced. On my P4, with all current versions of the modules, the run only consumes 36MB.

The warning is produced when a "From ..." line appears in the middle of a header. I have checked manually that this is not the case here.

I must warn you that Mail::Thread does not implement what it says: it is only a part of the JWZ threading algorithm. Mail::Box also has thread detection, although it is based only on message-ids:

    my $f = $mgr->open("folder", extract => 'LAZY');
    my $t = $mgr->threads($f);
    my @threads = $t->all;
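To double-check the claim that no "From " line sits inside a header, a small scan of the raw file can help. The following is only a sketch that uses core Perl and no Mail::Box calls; the function name `suspicious_from_lines` is made up for this illustration. It assumes the usual mbox convention that a separator line is either the first line of the file or follows a blank line, so it reports both "From " lines inside headers and unescaped "From " lines inside message bodies.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of Mail::Box): return the 1-based line
# numbers of lines that start with "From " but are neither the first line
# of the text nor preceded by a blank line.
sub suspicious_from_lines {
    my ($text) = @_;
    my @hits;
    my $prev;      # previous line, undefined before the first line
    my $n = 0;     # current line number
    for my $line (split /\r?\n/, $text, -1) {
        $n++;
        push @hits, $n
            if defined $prev && $prev ne '' && $line =~ /^From /;
        $prev = $line;
    }
    return @hits;
}

# Only scan a file when one is given on the command line.
if (@ARGV) {
    my ($file) = @ARGV;
    open my $in, '<', $file or die "Cannot read $file: $!";
    my $text = do { local $/; <$in> };
    close $in;
    $text = '' unless defined $text;
    print "line $_\n" for suspicious_from_lines($text);
}
```

Run it with the mbox file as its only argument; any line numbers it prints are places worth inspecting by hand.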
> The warning is produced when a "From ..." line appears in the middle of a
> header. I have checked manually that this is not the case here.
Curious. I wonder why this is happening, then.
> I must warn you that Mail::Thread does not implement what it says: it is
> only a part of the JWZ threading algorithm. Mail::Box also has
> thread detection, although it is based only on message-ids.
Well, if not, it does a good enough approximation of it. JWZ's linked to it.
You may well have hit some other bug, running on such a "new" machine. It may be related to the size of your folder, or to any of a wide variety of problems in the OS, GCC or Perl. Try to reduce the size of the folder to find out whether the problem persists. If you have reduced the folder to fewer than 5 messages and the problem is still reproducible, I will continue my investigation.
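One way to shrink the folder is to bisect it: split the mbox into two halves, rerun the failing script on each half, and repeat on whichever half still shows the warning. The sketch below is one possible approach using only core Perl; the helper names `split_mbox` and `halves` and the `.a`/`.b` output suffixes are inventions for this example. It assumes that a message starts at a "From " line that is the first line of the file or follows a blank line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split raw mbox text into a list of messages.
# Assumption: a separator is a line starting with "From " that is the
# first line of the file or is preceded by a blank line.
sub split_mbox {
    my ($text) = @_;
    my @messages;
    my $prev_blank = 1;    # start of file counts as "after a blank line"
    for my $line (split /^/m, $text) {
        push @messages, '' if $prev_blank && $line =~ /^From /;
        # Anything before the first separator line is ignored.
        $messages[-1] .= $line if @messages;
        $prev_blank = ($line =~ /^\r?\n?$/) ? 1 : 0;
    }
    return @messages;
}

# Hypothetical helper: return two array references holding the first
# and second half of the given messages.
sub halves {
    my @messages = @_;
    my $mid = int((@messages + 1) / 2);
    return ([ @messages[0 .. $mid - 1] ],
            [ @messages[$mid .. $#messages] ]);
}

# Only split a file when one is given on the command line.
if (@ARGV) {
    my ($file) = @ARGV;
    open my $in, '<', $file or die "Cannot read $file: $!";
    my $text = do { local $/; <$in> };
    close $in;
    $text = '' unless defined $text;

    my ($first, $second) = halves(split_mbox($text));
    my %out = ("$file.a" => $first, "$file.b" => $second);
    for my $name (sort keys %out) {
        open my $fh, '>', $name or die "Cannot write $name: $!";
        print {$fh} @{ $out{$name} };
        close $fh or die "Cannot close $name: $!";
        printf "%s: %d messages\n", $name, scalar @{ $out{$name} };
    }
}
```

Because each message is written back unchanged, a half that still triggers the warning can be fed to the same script again until only a handful of messages remain.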