
This queue is for tickets about the MailTools CPAN distribution.

Report information
The Basics
Id: 1256
Status: resolved
Priority: 0/
Queue: MailTools

People
Owner: MARKOV [...] cpan.org
Requestors: mgrommet [...] yahoo.com
Cc:
AdminCc:

Bug Information
Severity: Normal
Broken in: 1.47
Fixed in: (no value)



Subject: MailTools having trouble parsing certain messages
I've been using MailTools for quite some time with great success. I have built a mail parser for a client that strips attachments and scans for particular words and phrases, and it has always worked fine, until today, when a couple of messages came into the system that could not be parsed for some reason.

I know for a fact I need to tweak my debugging, but you might want to look at your parsing algorithm to see what odd behavior might be going on here. Perhaps it's something you want to handle, perhaps not.

If I do a print_header on the mailObj, I get this output:

Received: by 63.226.106.1 with Microsoft Outlook Express 5.50.4522.1200id <P167413608064HER24>; Tue, 2 Jul 2002 18:36:33 -0700
Message-Id: <E167413608064AR3ZPZXH0DA1YOJ@63.226.106.1>
Date: Tue, 2 Jul 2002 18:36:33 -0700

Here is the mail (yeah, it's spam, but it's an example):

---- snip ----
Received: by 63.226.106.1 with Microsoft Outlook Express 5.50.4522.1200id <P167413608064HER24>; Tue, 2 Jul 2002 18:36:33 -0700
Message-ID: <E167413608064AR3ZPZXH0DA1YOJ@63.226.106.1>
Date: Tue, 2 Jul 2002 18:36:33 -0700
X Mailer: Microsoft Outlook Express 5.50.4522.1200
From: "Monica Jay" <monicajay@earthlink.net>
To: a-zfinacial@sigmarep.com
Subject: just check it out
MIME-Version: 1.0
Content-Type: multipart/alternative;boundary="----= NextPart 000 0008 01C1FDE6.BC3E7600"
X-RAVMilter-Version: 8.3.0(snapshot 20010925) (sigmamail.elucidations.net)

Teen Portal presents:
..............................................
For Those About The TEENS.
Find exactly what you want to. For Free...
If you recieved this email in error, please visit our Don't Email Me page
TEENS_TEENS _TEENS_
ARE READY TO GET D_I_R_T_Y N_A_S_T_Y FOR FREE! DON'T MISS OUT ON THE BEST TEENS VIDEO & PICS FREE SITES & GALLERIES COLLECTION!
PICS GALLERIES > MOVIES > LIVE FEEDS > ~MAKE FRIENDS~
FREE A_D_U_L_T CONTENT! TONS OF PICS. NO CREDIT CARD REQUIRED.
TEEN PORTAL WIIL HELP YOU TO FIND EXACTLY WHAT YOU WANT FROM 1000'S _A_D_U_L_T_ SITES ON THE NET!
WARNING! This site contains _s_e_x_u_a_l_l_y_ oriented _a_d_u_l_t_ material intended for individuals 18 years of age or older and of legal age to view sexually explicit material as determined by the local and national laws of the region in which you reside. If you are not yet 18, if _a_d_u_l_t_ material offends you, or if you are from any country where _a_d_u_l_t_ material is specifically prohibited by law, do not enter this site and remove this message.

If you recieved this email in error, please visit our Don't Email Me page.
Your email address will be immediately removed from the mailing list.
-------------------------

** An attachment named < msg-30676-2.html > was removed.
From: Mike Grommet
More information on this: apparently it's having trouble parsing the Received: part of the mail header. If I take an example (such as the one in my original report) and combine the Received header back into one line, everything seems to parse fine. Using unfold on the mail object seems to have no effect on this.

[guest - Mon Jul 8 18:05:43 2002]:
> [SNIP]
[guest - Fri Jul 26 01:58:28 2002]:
> > If I do a print_header on the mailObj, I get this output:
> > Received: by 63.226.106.1 with Microsoft Outlook Express
> > 5.50.4522.1200id <P167413608064HER24>; Tue, 2 Jul 2002 18:36:33
> > -0700
> > Message-Id: <E167413608064AR3ZPZXH0DA1YOJ@63.226.106.1>
> > Date: Tue, 2 Jul 2002 18:36:33 -0700
> >
> > Here is the mail (yeah, its spam, but its an example)
> > ---- snip ----
> > Received: by 63.226.106.1 with Microsoft Outlook Express
> > 5.50.4522.1200id <P167413608064HER24>; Tue, 2 Jul 2002 18:36:33
> > -0700
> > Message-ID: <E167413608064AR3ZPZXH0DA1YOJ@63.226.106.1>
> > Date: Tue, 2 Jul 2002 18:36:33 -0700
> > X Mailer: Microsoft Outlook Express 5.50.4522.1200
> > [SNIP]
As you can see, the parser stopped at the line 'X Mailer'. The parser is right, because this is not a valid header line: a field name may not contain a blank, so there must be a dash '-' instead of the blank, as in 'X-Mailer'.

Software can handle this error in two ways: either ignore the erroneous lines and continue until a blank line marks the end of the header, or take the first failing header line as the start of the message body. MailTools (and Mail::Box, for that matter) implement the second approach, which at least does not throw away information when the message is printed as a whole.

--
MarkOv
mailtools@overmeer.net
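The two strategies described in the reply can be sketched as follows. This is a minimal illustration, not MailTools' actual (Perl) code: the helper `split_message`, its `stop_at_invalid` flag, and the simplified unfolding (continuation lines joined with a single space) are all hypothetical. The field-name check follows RFC 5322, where a name is any run of printable ASCII characters other than the colon, so a name containing a space, such as "X Mailer", does not qualify.

```python
import re

# RFC 5322 field name: printable US-ASCII 0x21-0x7E except ':' (0x3A).
# A space (0x20) is excluded, so "X Mailer: ..." is not a valid field.
FIELD_NAME = re.compile(r"^([\x21-\x39\x3b-\x7e]+):")


def split_message(lines, stop_at_invalid=True):
    """Split raw message lines into (headers, body).

    Hypothetical helper for illustration; not part of MailTools.
    stop_at_invalid=True:  the first line that is neither a field nor a
        continuation starts the body, so nothing is thrown away.
    stop_at_invalid=False: invalid lines are dropped and header parsing
        continues until the first empty line.
    """
    headers = []  # [name, value] pairs, in original order
    for i, line in enumerate(lines):
        if line == "":
            # Empty line: the regular end of the header block.
            return headers, lines[i + 1:]
        if line[0] in " \t" and headers:
            # Continuation line: unfold into the previous field
            # (simplified: joined with a single space).
            headers[-1][1] += " " + line.strip()
            continue
        match = FIELD_NAME.match(line)
        if match:
            headers.append([match.group(1), line[match.end():].strip()])
        elif stop_at_invalid:
            # Keep the offending line and everything after it as the body.
            return headers, lines[i:]
        # Otherwise the invalid line is skipped and parsing continues.
    return headers, []


if __name__ == "__main__":
    raw = [
        "Received: by 63.226.106.1 with Microsoft Outlook Express",
        "\t5.50.4522.1200",
        "Date: Tue, 2 Jul 2002 18:36:33 -0700",
        "X Mailer: Microsoft Outlook Express 5.50.4522.1200",
        "Subject: just check it out",
        "",
        "body text",
    ]
    headers, body = split_message(raw, stop_at_invalid=True)
    print([name for name, _ in headers])  # ['Received', 'Date']
    print(body[0])  # X Mailer: Microsoft Outlook Express 5.50.4522.1200
    headers, body = split_message(raw, stop_at_invalid=False)
    print([name for name, _ in headers])  # ['Received', 'Date', 'Subject']
    print(body)  # ['body text']
```

With the default setting, the sketch behaves the way the reply describes MailTools: the 'X Mailer' line and the Subject line after it end up in the body, so printing the message reproduces every line, at the cost of not recognizing the later headers. The lenient setting recovers the Subject header but silently discards the malformed line.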