
This queue is for tickets about the Mail-Box CPAN distribution.

Report information
The Basics
Id: 40353
Status: resolved
Priority: 0/
Queue: Mail-Box

People
Owner: Nobody in particular
Requestors: fschlich [...] cis.fu-berlin.de
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: 2.084
Fixed in: (no value)



Subject: $subject->decodedBody(): fail more gracefully on non-existant or weirdly encoded subjects
When processing incoming mail, header fields may contain all kinds of non-standard content. I'd like to see Mail::Box "fall on its feet" and "do what's sensible" instead of throwing an exception and forcing me to work around it in my own code.

I've attached two messages (mboxes): one without a subject at all, one with =?UNKNOWN?Q? specified as encoding. I'd usually want to do something like this:

    my $subject = $msg->study('subject')->decodedBody();
    while ($subject =~ /\[(\d{2,6})\]/g) {
        # do sth
    }

In the case of the no-subject message, this fails with

    [17:15] newsadm@Trinidad:~/tmp/fschlich/mail__box/subjectcrash$ ../subjectcrash.pl subjectcrash.offender.nosubject.mbox
    DEBUG: working on 1197630533.476264459acec@imp3.online.net
    Can't call method "decodedBody" on an undefined value at ../subjectcrash.pl line 52, <GEN3> line 74 (#1)
        (F) You used the syntax of a method call, but the slot filled by the
        object reference or package name contains an undefined value.
        Something like this will reproduce the error:

            $BADREF = undef;
            process $BADREF 1,2,3;
            $BADREF->process(1,2,3);

    Uncaught exception from user code:
            Can't call method "decodedBody" on an undefined value at ../subjectcrash.pl line 52, <GEN3> line 74.
In the second case, the error message is

    [17:15] newsadm@Trinidad:~/tmp/fschlich/mail__box/subjectcrash$ ../subjectcrash.pl subjectcrash.offender.UNKNOWNenc.mbox
    DEBUG: working on q6t2g65i8rc3cmt78p21556q5@glay.org
    Uncaught exception from user code:
            Unknown encoding 'UNKNOWN' at /usr/share/perl5/Mail/Message/Field/Full.pm line 282
            Encode::decode('UNKNOWN', 'n\x{e4}chster ', 0) called at /usr/share/perl5/Mail/Message/Field/Full.pm line 282
            Mail::Message::Field::Full::_decoder('UNKNOWN', 'Q', 'n=E4chster_') called at /usr/share/perl5/Mail/Message/Field/Full.pm line 292
            Mail::Message::Field::Full::decode('Mail::Message::Field::Unstructured=HASH(0x89547d8)', 'Fwd: Dein =?UNKNOWN?Q?n=E4chster?= Fick wartet schon auf Dich...') called at /usr/share/perl5/Mail/Message/Field/Full.pm line 148
            Mail::Message::Field::Full::decodedBody('Mail::Message::Field::Unstructured=HASH(0x89547d8)') called at ../subjectcrash.pl line 52

In the first case, it helps to simply not call decodedBody(), as an undef $subject behaves as expected in the regex. In the second case that makes no difference, as decodedBody() needs to be called for stringification. My workaround is to put the call to decodedBody in an eval block, but I wonder

a) whether $msg->study('subject') could return an empty subject (stringifying to '') instead of undef if there's no Subject: header;
b) whether Mail::Message::Field::Full::decodedBody could fail gracefully and return the undecoded body instead of throwing an exception (this may be a problem in Encode, but you seem to check for unknown encodings already?)

Attached are the two messages and parts of my code.

NB: this is perl 5.8.4, the latest Mail::Box but a fairly old Encode (1.99) -- sorry, I don't have the time to test the current Encode-2.26 now, will do next week, but perhaps you have some thoughts on this already?

Florian
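A minimal sketch of the eval workaround described above, folded into one helper so that both failure modes are handled in one place. It assumes, as the report shows, that study() returns undef for a missing header and that decodedBody() may die on an unknown charset. The helper name subject_text() is hypothetical, and falling back to the field's unfoldedBody() (the raw, undecoded body) is an assumption of this sketch, not something the ticket prescribes.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Mail::Box.
# $msg is assumed to be a Mail::Message (anything with a study() method).
sub subject_text {
    my ($msg) = @_;
    my $field = $msg->study('subject');
    return '' unless defined $field;          # a) no Subject: header at all

    # b) decodedBody() may die, e.g. "Unknown encoding 'UNKNOWN'"
    my $text = eval { $field->decodedBody };
    return $text if defined $text;

    # Assumption: fall back to the raw, undecoded field body.
    return $field->unfoldedBody;
}
```

With this, the loop from the report can use `my $subject = subject_text($msg);` and the regex always sees a defined string.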
Subject: subjectcrash.pl
#!/usr/bin/perl -w

use strict;
use diagnostics;

# DEBUG-LIBs
#use lib '/server/newsadm/tmp/fschlich/mail__box/perl/share/perl/5.8.4/';

use Mail::Box::Manager;

my $debug = 1;
$|=1 if $debug;

my $seen_file = 'seen.mbox';
(-f $seen_file) || `touch $seen_file`;
my $new_file = $ARGV[0];
(-e $new_file) || die "cannot find $new_file. Really an mbox? $!\n";

# MAIL::BOX to the rescue
my $mgr = Mail::Box::Manager->new;
my $seen_folder = $mgr->open(folder => $seen_file, access => 'rw')
    or die "cannot open folder $seen_file: $!\n";
my $new_folder = $mgr->open(folder => $new_file, access => 'rw',
    extract => 'ALWAYS', cache_body => 'DELAY', cache_head => 'DELAY')
    or die "cannot open folder $new_file: $!\n";
my $threads = $mgr->threads(folders => [$seen_folder,$new_folder]);

# look at all threads in turn
foreach my $startnode ($threads->all) {
    my @thread = $startnode->threadMessages;
    my %uids = ();

    # walk along thread looking for tags in subject header
    while (my $msg = pop @thread) {
        next if $msg->isDeleted or $msg->isDummy;
        print 'DEBUG: working on ', $msg->messageId, "\n" if $debug;

        # get pure UTF-8 body, no ?iso-88..? etc left
        my $subject = $msg->study('subject');

        # this would be a workaround
        # eval {        # dies on unknown encoding...
        #     $subject = $subject->decodedBody();   # ?iso-88..? in subject...
        # };
        # if ($@) {
        #     print 'ERROR: ', $@;
        #     $subject = 0;
        # }
        #
        #$subject = $subject->decodedBody(); # WITHOUT THIS LINE IT WORKS... undef need not be stringified!
        # if ($subject) { # in case of decoding error or mail without subject
        while ($subject =~ /\[(\d{2,6})\]/g) {
            next if ($1 < 23 || $1 > 399999);
            $uids{$1} = 1;
            print "DEBUG: found user [$1]\n" if $debug;
        }
        # }
        last if scalar keys %uids;
    }
    # process results
}
Subject: subjectcrash.offender.UNKNOWNenc.mbox
application/octet-stream 3.4k

Message body not shown because it is not plain text.

Subject: subjectcrash.offender.nosubject.mbox
application/octet-stream 3.1k

Message body not shown because it is not plain text.

Subject: Re: [rt.cpan.org #40353] AutoReply: $subject->decodedBody(): fail more gracefully on non-existant or weirdly encoded subjects
Date: Mon, 27 Oct 2008 14:37:01 +0100
To: Bugs in Mail-Box via RT <bug-Mail-Box [...] rt.cpan.org>
From: Florian Schlichting <fschlich [...] CIS.FU-Berlin.DE>
> NB this is perl 5.8.4, latest Mail::Box but a fairly old Encode (1.99)
> -- sorry I don't have the time to test the current Encode-2.26 now, will
> do next week, but perhaps you have some thoughts on this already?
I've just tested the current Encode-2.26, with identical results. Encode::decode seems to croak on purpose, so I'd suggest that Mail::Box deal with the exception and simply return the undecoded string.

Florian
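The suggestion above (let Encode croak, catch it, and return the undecoded string) can be sketched at the Encode level alone. decode_or_raw() is a hypothetical helper, not Mail::Box code; Encode::find_encoding(), which returns undef for an unknown charset name, would be the exception-free alternative.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: decode $bytes from $charset, or return them
# unchanged if Encode croaks (e.g. "Unknown encoding 'UNKNOWN'").
sub decode_or_raw {
    my ($charset, $bytes) = @_;
    my $text = eval { Encode::decode($charset, $bytes) };
    return defined $text ? $text : $bytes;
}
```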
Subject: Re: [rt.cpan.org #40353] $subject->decodedBody(): fail more gracefully on non-existant or weirdly encoded subjects
Date: Fri, 7 Nov 2008 12:13:55 +0100
To: Florian via RT <bug-Mail-Box [...] rt.cpan.org>
From: Mark Overmeer <solutions [...] overmeer.net>
* Florian via RT (bug-Mail-Box@rt.cpan.org) [081024 16:38]:
> Fri Oct 24 11:38:14 2008: Request 40353 was acted upon.
> Transaction: Ticket created by fsfs
> Queue: Mail-Box
> Subject: $subject->decodedBody(): fail more gracefully on non-existant or
> weirdly encoded subjects

Two different things, two separate answers...

> my $subject = $msg->study('subject')->decodedBody();
> while ($subject =~ /\[(\d{2,6})\]/g) {
>     # do sth
> }
>
> in the case of the no-subject message, this fails with
> DEBUG: working on 1197630533.476264459acec@imp3.online.net

The above problem is very common in OO code. It often blocks the nice application of stacking calls.

There must be a distinction between a missing field and an existing (possibly empty) field. So I could produce a special "::Field::Missing" object, but what should its ->decodedBody produce so that the difference is still visible?

I think that your code should be:

    my $subject;
    if(my $s = $msg->study('subject'))
    {   $subject = $s->decodedBody;
    }

This also avoids the need for dozens of "::Missing" object classes, and is still simple. Alternatively:

    if(my $s = $msg->get('subject'))
    {   $subject = $s->study->decodedBody;
    }

--
MarkOv
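The guard suggested above can be wrapped so that callers still see three distinct states: undef for a missing Subject: header, '' for a present but empty one, and the decoded text otherwise. A sketch only; decoded_subject() is a hypothetical name and $msg is assumed to be a Mail::Message.

```perl
use strict;
use warnings;

# Hypothetical helper:
#   undef  => no Subject: header at all
#   ''     => header present but empty
#   string => decoded subject text
sub decoded_subject {
    my ($msg) = @_;
    my $field = $msg->study('subject');
    return undef unless defined $field;   # keep "missing" distinguishable
    return $field->decodedBody;
}
```

A caller can then write `my $s = decoded_subject($msg); print defined $s ? "Subject: '$s'\n" : "no Subject header\n";`, which keeps the missing/empty distinction without a "::Field::Missing" class.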
Subject: Re: [rt.cpan.org #40353] $subject->decodedBody(): fail more gracefully on non-existant or weirdly encoded subjects
Date: Fri, 7 Nov 2008 12:38:52 +0100
To: Florian via RT <bug-Mail-Box [...] rt.cpan.org>
From: Mark Overmeer <solutions [...] overmeer.net>
* Florian via RT (bug-Mail-Box@rt.cpan.org) [081024 16:38]:
> Fri Oct 24 11:38:14 2008: Request 40353 was acted upon.
> Queue: Mail-Box
> Subject: $subject->decodedBody(): fail more gracefully on non-existant or
> weirdly encoded subjects

> in the second case, the error message is
> Uncaught exception from user code:
> Unknown encoding 'UNKNOWN' at

> 'Fwd: Dein =?UNKNOWN?Q?n=E4chster?= Fick wartet schon auf Dich...')
> called at /usr/share/perl5/Mail/Message/Field/Full.pm line 148

> b) if Mail::Message::Field::Full::decodedBody could fail gracefully and
> return the undecoded body instead of throwing an exception (this may be
> a problem in Encode, but you seem to check for unknown encodings already?)
I do agree with you: this must be handled more cleanly. Presumably, it is best to return the non-decoded string in this case. In Mail/Message/Field/Full.pm I have replaced the _decoder/decode subroutines with this:

    sub _decoder($$$$)
    {   my ($charset, $encoding, $encoded, $whole) = @_;
        $charset =~ s/\*[^*]+$//;   # strip language, not used
        my $to_utf8 = Encode::find_encoding($charset || 'us-ascii')
            or return $whole;

        my $decoded;
        if(lc($encoding) eq 'q')
        {   # Quoted-printable encoded
            $encoded =~ s/_/ /g;
            $decoded = MIME::QuotedPrint::decode_qp($encoded);
        }
        elsif(lc($encoding) eq 'b')
        {   # Base64 encoded
            require MIME::Base64;
            $decoded = MIME::Base64::decode_base64($encoded);
        }
        else
        {   # unknown encodings ignored
            return $whole;
        }

        $to_utf8->($decoded, Encode::FB_DEFAULT);  # error-chars -> '?'
    }

    sub decode($@)
    {   my ($self, $encoded, %args) = @_;
        if(defined $args{is_text} ? $args{is_text} : 1)
        {   # in text, blanks between encoding must be removed, but otherwise kept :(
            # dirty trick to get this done: add an explicit blank.
            $encoded =~ s/\?\=\s(?!\s*\=\?|$)/_?= /gs;
        }
        $encoded =~ s/\=\?([^?\s]*)\?([^?\s]*)\?([^?\s]*)\?\=\s*/
                      _decoder($1,$2,$3,$encoded)/gse;

        $encoded;
    }

--
Regards,
    MarkOv

------------------------------------------------------------------------
Mark Overmeer MSc                              MARKOV Solutions
Mark@Overmeer.net                              solutions@overmeer.net
http://Mark.Overmeer.net                       http://solutions.overmeer.net
Subject: Re: [rt.cpan.org #40353] $subject->decodedBody(): fail more gracefully on non-existant or weirdly encoded subjects
Date: Fri, 14 Nov 2008 12:33:41 +0100
To: Mark Overmeer via RT <bug-Mail-Box [...] rt.cpan.org>
From: Florian Schlichting <fschlich [...] CIS.FU-Berlin.DE>
> > my $subject = $msg->study('subject')->decodedBody();

> > Can't call method "decodedBody" on an undefined value at
> > ../subjectcrash.pl line 47
>
> Above problem is very common in OO code. It often blocks the
> nice application of stacking calls.
>
> There must be a distiction between a missing field and an existing
> field (may be empty). So I could produce a special "::Field::Missing"
> object. What should that ->decodedBody produce to still see the
> difference?
>
> I think that your code should be:
>
>    my $subject;
>    if(my $s = $msg->study('subject'))
>    {   $subject = $s->decodedBody;
>    }
I actually had something like that in there when I still used $msg->subject, but then I needed decoding and so found $msg->study('subject')->decodedBody().

If you still need to see the difference between an empty and an absent header after calling ->decodedBody ($msg->subject returns an empty string, not undef, it seems), this issue is probably best resolved in the documentation (which currently reads a lot like "you get an object no matter what"), i.e. by mentioning that $msg->study (and probably ->get as well?) may also return undef if the field does not exist:

--- Message.pod.org 2008-11-14 12:19:00.059968512 +0100
+++ Message.pod 2008-11-14 12:26:56.088152188 +0100
@@ -671,7 +671,8 @@
 Study the content of a field, like L<get()|Mail::Message/"The header"> does,
 with as main difference
-that a L<Mail::Message::Field::Full|Mail::Message::Field::Full> object is returned. These objects
+that a L<Mail::Message::Field::Full|Mail::Message::Field::Full> object is returned, unless the
+specified header field doesn't exist and undef is returned. These objects
 stringify to an utf8 decoded representation of the data contained in the
 field, where L<get()|Mail::Message/"The header"> does not decode. See
 L<Mail::Message::Field::study()|Mail::Message::Field/"Access to the content">.

or perhaps better:

--- Message.pod.org 2008-11-14 12:19:00.059968512 +0100
+++ Message.pod 2008-11-14 12:32:25.046632105 +0100
@@ -676,6 +676,8 @@
 the field, where L<get()|Mail::Message/"The header"> does not decode. See
 L<Mail::Message::Field::study()|Mail::Message::Field/"Access to the content">.
 
+If the field to be studied does not exist, the undef value is returned.
+
 example: the study() short-cut for header fields
 
  print $msg->study('to'), "\n"

...and the equivalent for get().

Florian
Subject: Re: [rt.cpan.org #40353] $subject->decodedBody(): fail more gracefully on non-existant or weirdly encoded subjects]
Date: Fri, 14 Nov 2008 13:24:50 +0100
To: bug-Mail-Box [...] rt.cpan.org
From: Florian Schlichting <fschlich [...] CIS.FU-Berlin.DE>
Hrmpf, CPAN rejected my mail due to the spammy nature of the attachment... hope this one gets through.

----- Forwarded message from Florian Schlichting <fschlich@CIS.FU-Berlin.DE> -----
> > Uncaught exception from user code:
> > Unknown encoding 'UNKNOWN' at
>
> > 'Fwd: Dein =?UNKNOWN?Q?n=E4chster?= Fick wartet schon auf Dich...')
> > called at /usr/share/perl5/Mail/Message/Field/Full.pm line 148

> In Mail/Message/Field/Full.pm I have replaced the _decoder/decode
> subroutines by this:
This seems to do the appropriate check; yet have you tried printing the subject of the offending mail? Doing something like

    my $subject = $msg->study('subject');
    if ($subject) {
        $subject = $subject->decodedBody();
        print "decoded subj is $subject###\n";
    }

what I get is

    decoded subj is Fwd: Dein Fwd: Dein =?UNKNOWN?Q?n=E4chster_?= Fick wartet schon auf Dich!Date: Wed, 15 Aug 2007 06:56:54 GMTFick wartet schon auf Dich!Date: Wed, 15 Aug 2007 06:56:54 GMT###

i.e. I get the subject twice, with a copy of the original subject substituted in the place where the offending encoding was -- a bug?

Also, the attached mbox produces

    DEBUG: working on ce6e4b3a1f966f99301bcaa490047713@localhost.localdomain
    Not a CODE reference at /server/newsadm/tmp/fschlich/mail__box/perl/share/perl/5.8.4//Mail/Message/Field/Full.pm line 283, <GEN3> line 92 (#1)
        (F) Perl was trying to evaluate a reference to a code value (that is, a
        subroutine), but found a reference to something else instead. You can
        use the ref() function to find out what kind of ref it really was. See
        also perlref.

    Uncaught exception from user code:
            Not a CODE reference at /server/newsadm/tmp/fschlich/mail__box/perl/share/perl/5.8.4//Mail/Message/Field/Full.pm line 283, <GEN3> line 92.
     at /server/newsadm/tmp/fschlich/mail__box/perl/share/perl/5.8.4//Mail/Message/Field/Full.pm line 283
            Mail::Message::Field::Full::_decoder('ISO-2022-JP', 'B', 'GyRCQC44eU4oIzEjMyFzJE80MEE0Sl0+WiQ3JF4kOSEjGyhC', '=?ISO-2022-JP?B?GyRCQC44eU4oIzEjMyFzJE80MEE0Sl0+WiQ3JF4kOSEjG...') called at /server/newsadm/tmp/fschlich/mail__box/perl/share/perl/5.8.4//Mail/Message/Field/Full.pm line 293
            Mail::Message::Field::Full::decode('Mail::Message::Field::Unstructured=HASH(0x8a000b8)', '=?ISO-2022-JP?B?GyRCQC44eU4oIzEjMyFzJE80MEE0Sl0+WiQ3JF4kOSEjG...') called at /server/newsadm/tmp/fschlich/mail__box/perl/share/perl/5.8.4//Mail/Message/Field/Full.pm line 148
            Mail::Message::Field::Full::decodedBody('Mail::Message::Field::Unstructured=HASH(0x8a000b8)') called at ./showsubj.pl line 48

NB: I'll be on holiday for two weeks now; I can do more testing afterwards.

Florian
----- End forwarded message -----
jp-spam.gz
application/x-gunzip 1.5k

Message body not shown because it is not plain text.
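Both symptoms reported above appear to trace back to the replacement code. Encode::find_encoding() returns an Encode::Encoding object, not a code reference, so `$to_utf8->(...)` dies with "Not a CODE reference". And decode() passes the entire header ($encoded) as the fallback $whole, so an undecodable word is replaced by a copy of the whole subject. The standalone sketch below avoids both, using only the core modules Encode, MIME::QuotedPrint and MIME::Base64. decode_word() and decode_header_text() are hypothetical names, not Mail::Box API, and this is not necessarily how the issue was fixed upstream.

```perl
use strict;
use warnings;
use Encode ();
use MIME::QuotedPrint ();
use MIME::Base64 ();

# Decode one RFC 2047 encoded word. $whole is only the word that
# matched, so a failure leaves just that word undecoded.
sub decode_word {
    my ($charset, $encoding, $text, $whole) = @_;
    $charset =~ s/\*[^*]+$//;                 # strip language, not used
    my $codec = Encode::find_encoding($charset || 'us-ascii')
        or return $whole;                     # unknown charset: keep raw word

    my $bytes;
    if (lc($encoding) eq 'q') {               # Quoted-printable
        (my $qp = $text) =~ tr/_/ /;
        $bytes = MIME::QuotedPrint::decode_qp($qp);
    }
    elsif (lc($encoding) eq 'b') {            # Base64
        $bytes = MIME::Base64::decode_base64($text);
    }
    else {
        return $whole;                        # unknown transfer encoding
    }

    # $codec is an object: call its decode() method rather than $codec->(...).
    # FB_DEFAULT substitutes malformed bytes instead of dying.
    return $codec->decode($bytes, Encode::FB_DEFAULT());
}

sub decode_header_text {
    my ($text) = @_;
    # Whitespace between two adjacent encoded words is not displayed.
    $text =~ s/(\?=)\s+(?==\?)/$1/g;
    $text =~ s/(=\?([^?\s]*)\?([^?\s]*)\?([^?\s]*)\?=)/decode_word($2, $3, $4, $1)/ge;
    return $text;
}
```

With this, the subject from the first attachment comes back unchanged instead of doubled, while known charsets are still decoded.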

Subject: Re: [rt.cpan.org #40353] $subject->decodedBody(): fail more gracefully on non-existant or weirdly encoded subjects
Date: Sat, 22 Nov 2008 23:50:12 +0100
To: Florian via RT <bug-Mail-Box [...] rt.cpan.org>
From: Mark Overmeer <solutions [...] overmeer.net>
* Florian via RT (bug-Mail-Box@rt.cpan.org) [081114 11:34]:
> Queue: Mail-Box
> Ticket <URL: http://rt.cpan.org/Ticket/Display.html?id=40353 >
>
> If you'd still need to see the difference between empty and absent
> header after calling ->decodedBody ($msg->subject returns an empty
> string, not undef, it seems),

Document alterations accepted.

--
Regards,
    MarkOv
released in 2.085