Subject: | $subject->decodedBody(): fail more gracefully on nonexistent or weirdly encoded subjects |
When processing incoming mail, header fields may contain all kinds of
non-standard content. I'd like to see Mail::Box "fall on its feet" and
"do what's sensible" instead of throwing an exception and forcing me to
work around it in my own code.
I've attached two messages (mboxes): one without a subject at all, and
one whose subject contains an encoded word with =?UNKNOWN?Q? as its
charset. I'd usually want to do something like this:
my $subject = $msg->study('subject')->decodedBody();
while ($subject =~ /\[(\d{2,6})\]/g) {
    # do something
}
In the case of the no-subject message, this fails with:
[17:15] newsadm@Trinidad:~/tmp/fschlich/mail__box/subjectcrash$
../subjectcrash.pl subjectcrash.offender.nosubject.mbox
DEBUG: working on 1197630533.476264459acec@imp3.online.net
Can't call method "decodedBody" on an undefined value at ../subjectcrash.pl
line 52, <GEN3> line 74 (#1)
(F) You used the syntax of a method call, but the slot filled by the
object reference or package name contains an undefined value. Something
like this will reproduce the error:
$BADREF = undef;
process $BADREF 1,2,3;
$BADREF->process(1,2,3);
Uncaught exception from user code:
Can't call method "decodedBody" on an undefined value at
../subjectcrash.pl line 52, <GEN3> line 74.
In the second case, the error message is:
[17:15] newsadm@Trinidad:~/tmp/fschlich/mail__box/subjectcrash$
../subjectcrash.pl subjectcrash.offender.UNKNOWNenc.mbox
DEBUG: working on q6t2g65i8rc3cmt78p21556q5@glay.org
Uncaught exception from user code:
Unknown encoding 'UNKNOWN' at
/usr/share/perl5/Mail/Message/Field/Full.pm line 282
Encode::decode('UNKNOWN', 'n\x{e4}chster ', 0) called at
/usr/share/perl5/Mail/Message/Field/Full.pm line 282
Mail::Message::Field::Full::_decoder('UNKNOWN', 'Q',
'n=E4chster_') called at /usr/share/perl5/Mail/Message/Field/Full.pm
line 292
Mail::Message::Field::Full::decode('Mail::Message::Field::Unstructured=HASH(0x89547d8)',
'Fwd: Dein =?UNKNOWN?Q?n=E4chster?= Fick wartet schon auf Dich...')
called at /usr/share/perl5/Mail/Message/Field/Full.pm line 148
Mail::Message::Field::Full::decodedBody('Mail::Message::Field::Unstructured=HASH(0x89547d8)')
called at ../subjectcrash.pl line 52
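The innermost frame of that trace can be reproduced without Mail::Box. The following is only a minimal sketch, assuming nothing but the core Encode module; the variable names are mine. It repeats the failing Encode::decode() call inside an eval and then shows that Encode::find_encoding() returns undef for a charset name it does not know, which a caller could test before decoding.

```perl
use strict;
use warnings;
use Encode ();

# Same arguments as the Encode::decode() frame in the trace.
my $bytes = "n\xe4chster ";

# decode() dies for a charset name Encode does not know, so trap it.
my $decoded = eval { Encode::decode('UNKNOWN', $bytes, 0) };
my $error   = $@;

# find_encoding() returns undef instead of dying for an unknown name.
my $known  = defined Encode::find_encoding('UNKNOWN') ? 1 : 0;
my $latin1 = Encode::find_encoding('iso-8859-1');

print "decode failed: $error" if $error;
print "UNKNOWN is ", ($known ? "a known" : "not a known"), " encoding\n";
```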
In the first case, it helps to simply not call decodedBody(), as an undef
$subject behaves as expected in the regex. In the second case that makes
no difference, because decodedBody() gets called for stringification
anyway. My workaround is to put the call to decodedBody() in an eval
block, but I wonder
a) if $msg->study('subject') could return an empty subject (stringifies
to '') instead of undef if there's no Subject: header;
b) if Mail::Message::Field::Full::decodedBody could fail gracefully and
return the undecoded body instead of throwing an exception (this may be
a problem in Encode, but you seem to check for unknown encodings already?)
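For illustration, the behaviour asked for in a) and b) can be approximated in user code today. This is only a sketch of my workaround, not a proposed patch: safe_subject() is a hypothetical helper name, and it assumes that the field object returned by study() offers unfoldedBody() for the raw, undecoded text, as Mail::Message::Field documents.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Mail::Box.
# Returns the decoded subject, the undecoded body if decoding dies,
# or '' if the message has no Subject: header at all.
sub safe_subject {
    my ($msg) = @_;

    # study() returns undef when the header is missing (case a).
    my $field = $msg->study('subject');
    return '' unless defined $field;

    # decodedBody() may die on an unknown charset (case b).
    my $decoded = eval { $field->decodedBody() };
    return $decoded if defined $decoded;

    # Fall back to the raw header text (assumes unfoldedBody() exists).
    my $raw = eval { $field->unfoldedBody() };
    return defined $raw ? $raw : '';
}
```

With this, the loop above would start from `my $subject = safe_subject($msg);` and the regex match would run on a plain string in every case.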
I have attached the two messages and parts of my code.
NB: this is Perl 5.8.4 with the latest Mail::Box but a fairly old Encode
(1.99). Sorry, I don't have the time to test the current Encode-2.26 now;
I will do so next week, but perhaps you have some thoughts on this already?
Florian
Subject: | subjectcrash.pl |
#!/usr/bin/perl -w
use strict;
use diagnostics;
# DEBUG-LIBs
#use lib '/server/newsadm/tmp/fschlich/mail__box/perl/share/perl/5.8.4/';
use Mail::Box::Manager;
my $debug = 1;
$|=1 if $debug;
my $seen_file = 'seen.mbox';
(-f $seen_file) || `touch $seen_file`;
my $new_file = $ARGV[0];
(-e $new_file) || die "cannot find $new_file. Is it really an mbox? $!\n";
# MAIL::BOX to the rescue
my $mgr = Mail::Box::Manager->new;
my $seen_folder = $mgr->open(folder => $seen_file, access => 'rw') or die "cannot open folder $seen_file: $!\n";
my $new_folder = $mgr->open(folder => $new_file, access => 'rw', extract => 'ALWAYS', cache_body => 'DELAY', cache_head => 'DELAY') or die "cannot open folder $new_file: $!\n";
my $threads = $mgr->threads(folders => [$seen_folder,$new_folder]);
# look at all threads in turn
foreach my $startnode ($threads->all) {
    my @thread = $startnode->threadMessages;
    my %uids = ();
    # walk along thread looking for tags in subject header
    while (my $msg = pop @thread) {
        next if $msg->isDeleted or $msg->isDummy;
        print 'DEBUG: working on ', $msg->messageId, "\n" if $debug;
        # get pure UTF-8 body, no ?iso-88..? etc left
        my $subject = $msg->study('subject');
        # this would be a workaround
        # eval { # dies on an unknown encoding...
        #     $subject = $subject->decodedBody(); # ?iso-88..? in the subject...
        # };
        # if ($@) {
        #     print 'ERROR: ', $@;
        #     $subject = 0;
        # }
        #
        #$subject = $subject->decodedBody(); # WITHOUT THIS LINE IT WORKS... undef does not need to be stringified!
        # if ($subject) { # in case of a decoding error or a mail without a Subject
        while ($subject =~ /\[(\d{2,6})\]/g) {
            next if ($1 < 23 || $1 > 399999);
            $uids{$1} = 1;
            print "DEBUG: found user [$1]\n" if $debug;
        }
        # }
        last if scalar keys %uids;
    }
    # process results
}
Subject: | subjectcrash.offender.UNKNOWNenc.mbox |
Message body not shown because it is not plain text.
Subject: | subjectcrash.offender.nosubject.mbox |
Message body not shown because it is not plain text.