Subject: Possible bad parsing of mbox file
I'm using Mail::Box in an application that processes Linux kernel mbox archives. The mbox file causing this problem is ftp://ftp.uwsg.iu.edu/pub/mail.archive/kernel/kernel.0401.1
I load the file into Mail::Box with the following:
...
my $folder = $box->open(folder => $file, type => 'Mail::Box::Mbox');
push @all_messages, $folder->messages(); # Add all the messages.
...
I then pass the messages to Mail::Thread and run:
my $threader = Mail::Thread->new(@all_messages);
$threader->thread;
This is the point at which, while processing the file, I receive this warning:
WARNING: Illegal character in field name From linux-kernel-owner+lnxkrnl-gauley.ucs.indiana.edu@vger.kernel.org Thu Jan 15 23
This is the output after I had already suspected that the original "=40" was responsible and ran sed to replace those instances with "@". That made no difference. The line in the warning is not a header at all: it is the mbox "From " separator line that marks the end of one message and the start of the next. It only happens with this one file; other mboxes in the same directory don't have the problem, and similar lines in those files are processed without complaint.
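One way to narrow this down might be to look for "From " lines that are not preceded by a blank line, since those are the ones a parser is most likely to treat as part of the previous message's header or body. The following is only a minimal diagnostic sketch using core Perl. It does not reproduce Mail::Box's own parsing logic; the "blank line before the separator" rule is an assumed common mbox convention, and the function name and script name are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the line numbers of "From " lines that are NOT preceded by a
# blank line. The start of the file counts as if a blank line came first.
# Assumption: separators normally follow a blank line (common mbox
# convention); this is not taken from Mail::Box itself.
sub suspicious_from_lines {
    my ($fh) = @_;
    my @hits;
    my $prev_blank = 1;
    my $lineno     = 0;
    while (my $line = <$fh>) {
        $lineno++;
        push @hits, $lineno if $line =~ /^From / && !$prev_blank;
        # A line is "blank" if it holds nothing but a line ending.
        $prev_blank = ($line =~ /^\r?\n?\z/) ? 1 : 0;
    }
    return @hits;
}

# Hypothetical usage: perl check_from.pl kernel.0401.1
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    print "line $_: 'From ' line not preceded by a blank line\n"
        for suspicious_from_lines($fh);
    close $fh;
}
```

If this reports a line number near the separator quoted in the warning, that would point to the file's layout rather than the address in the line itself.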
More important than the warning, memory usage spikes. The process used 1.35 GB of memory before the script completed successfully. (Luckily this machine has 9 GB of swap.) The output was not significantly affected by the warning. The machine is a dual-processor Opteron, so it is a 64-bit architecture. I believe this may be relevant because the same warning does NOT seem to occur on my 32-bit P3 laptop. I've verified that the files are identical.
Any ideas?