This queue is for tickets about the Mail-Box CPAN distribution.

Report information
The Basics
Id: 9099
Status: resolved
Priority: 0/
Queue: Mail-Box

People
Owner: Nobody in particular
Requestors: chris [...] improbable.org
Cc:
AdminCc:

Bug Information
Severity: Normal
Broken in:
  • 2.055
  • 2.059
Fixed in: (no value)



Subject: Parser has trouble with Mbox messages which contain lines starting with 'From '
I'm using perl v5.8.4 on Debian sarge. This problem happens with both the 2.055 version in Debian's libmail-box-perl, which notably does not include the C parser, and a fresh 2.059 install from CPAN.

I've been using Mail::Box to convert mbox archives to Maildirs as part of our IMAP migration process. Some users have messages where the body contains a line which starts with 'From ' and ends with ', year-like-number'; these messages will be misparsed and seen as two messages - one with the real headers and the body up to the line before the From and a second message containing the remainder of the body and no headers.

Here's an example message which will be parsed as two messages:

--------------------------------------------------------------
From announcements@example.org Thu Apr 4 05:50:22 2002
Return-Path: <announcements@example.org>
From: Announcements <announcements@example.org>
To: Someone <someone@example.edu>
Subject: some message subject
Date: Thu, 4 Apr 2002 08:48:28 -0500

From something, 2002:
--------------------------------------------------------------

If that year exceeds a reasonable range (>=3000) the message will be correctly treated as a single message:

--------------------------------------------------------------
From announcements@example.org Thu Apr 4 05:50:22 2002
Return-Path: <announcements@example.org>
From: Announcements <announcements@example.org>
To: Someone <someone@example.edu>
Subject: some message subject
Date: Thu, 4 Apr 2002 08:48:28 -0500

From something, 3002:
--------------------------------------------------------------

One defense against this might be sanity-checking against Content-Length or Lines headers - the last message which I encountered had both.
This is the everlasting problem of the message separators in mbox files. Somewhere (in some RFC, no wish to look it up), the suggestion is to escape all body lines which match '>*From ' by prepending a '>'. By far most mail readers follow that convention, as does MailBox. However, some applications (for example mutt) do not.

Therefore, MailBox has a parser rule that a separator must not only start with "From " but must also contain a plausible year. Mail::Box::Parser::Perl uses

    $sep ne 'From ' || $line =~ m/ (19[789]|20[01])\d\b/

which accepts the years 1970 through 2019. That's why the line with 3002 is left in the body while the line with 2002 is taken as the start of a new message.

My simple answer: your folder is corrupt because 'From ' lines in the body are not escaped.
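
To make the rule above concrete, here is a minimal sketch in Python (not Perl) that transliterates the quoted year check and adds the '>' escaping convention. The names looks_like_separator and escape_body are hypothetical helpers for illustration only; they are not part of Mail::Box, and the sketch assumes the separator prefix is always 'From '.

```python
import re

# Transliteration of the year check quoted above:
#   $line =~ m/ (19[789]|20[01])\d\b/
# It accepts a space followed by a year from 1970 through 2019.
YEAR_RE = re.compile(r" (19[789]|20[01])\d\b")

# Body lines of the form '>*From ': zero or more '>' followed by 'From '.
FROM_RE = re.compile(r"^>*From ")


def looks_like_separator(line):
    """True if the line starts with 'From ' and contains a plausible year.

    Hypothetical helper; assumes the separator prefix is 'From '.
    """
    return line.startswith("From ") and YEAR_RE.search(line) is not None


def escape_body(lines):
    """Prepend '>' to every body line matching '>*From ' (hypothetical helper)."""
    return [">" + ln if FROM_RE.match(ln) else ln for ln in lines]


if __name__ == "__main__":
    for line in ("From something, 2002:", "From something, 3002:"):
        print(repr(line), looks_like_separator(line))
    print(escape_body(["From something, 2002:", "plain text"]))
```

Under these assumptions the body line from the report with 2002 passes the check and the one with 3002 does not, which matches the behaviour described in the ticket. Once a body line has been escaped to '>From something, 2002:', it no longer starts with 'From ' and so cannot be mistaken for a separator.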