Subject: | [PATCH] rewrite of fetch_hash to resolve several issues |
fetch_hash() has a number of bugs that make it difficult to use for
anything but the most trivial of uses:
- it searches for the UID attribute in every result line. Many MTAs add
the string "uid 510" or similar to the Received header to identify the
unix user that sent the message. This matches and causes spurious
entries in the result hash
- it does not understand that BODY attributes will be returned in
response to BODY.PEEK requests
- the attribute value parser does not handle nested parentheses, making
it impossible to request the BODYSTRUCTURE attribute
The attached patch rewrites much of the internals, making it do the
right thing in these cases.
A cursory glance seems to show that it will produce the same results for
calls that worked previously. The fact that its been broken for so long
suggests to me that it has rarely been used, so hopefully there won't be
any particular problem changing it (especially given the API break in
the change to 3.0).
I plan at some point to break out the response parsing bits into a
seperate function so that it can be tested.
Subject: | fetch-hash-rewrite.diff |
diff --git a/lib/Mail/IMAPClient.pm b/lib/Mail/IMAPClient.pm
index bda613d..b543622 100644
--- a/lib/Mail/IMAPClient.pm
+++ b/lib/Mail/IMAPClient.pm
@@ -2013,57 +2013,90 @@ sub fetch_hash {
for (@words) {
s/([\( ])FAST([\) ])/${1}FLAGS INTERNALDATE RFC822\.SIZE$2/i;
-s/([\( ])FULL([\) ])/${1}FLAGS INTERNALDATE RFC822\.SIZE ENVELOPE BODY$2/i;
+ s/([\( ])FULL([\) ])/${1}FLAGS INTERNALDATE RFC822\.SIZE ENVELOPE BODY$2/i;
}
+ my %words = map { uc($_) => 1 } @words;
my $output = $self->fetch( $msgs, "($what)" ) or return undef;
- for ( my $x = 0 ; $x <= $#$output ; $x++ ) {
+ while (my $l = shift @$output) {
+ $l =~ m/^\* (\d+) FETCH \(/g; my $mid = $1 || undef;
+ $mid or next;
+
my $entry = {};
- my $l = $output->[$x];
- if ( $self->Uid ) {
- my $uid = $l =~ /\bUID\s+(\d+)/i ? $1 : undef;
- $uid or next;
+ my ($key, $value);
+ ATTR: while ($l !~ m/\G\s*\)\s*$/gc) {
+ if ($l =~ m/\G\s*([\w\d\.]+(?:\[[^\]]+\])?)\s*/gc) {
+ $key = uc($1);
+ }
+ elsif (!defined $key) {
+ # some kind of malformed response
+ $self->LastError("Invaliid item name in FETCH response: $l");
+ return undef;
+ }
+
+ if ($l =~ m/\G\s*$/gc) {
+ $value = shift @$output;
+ $entry->{$key} = $value;
+ $l = shift @$output;
+ next ATTR;
+ }
+
+ elsif ($l =~ m/\G(?:"([^"]+)"|([^()\s]+))\s*/gc) {
+ $value = defined $1 ? $1 : $2;
+ $entry->{$key} = $value;
+ next ATTR;
+ }
+
+ elsif ($l =~ m/\G\(/gc) {
+ my $depth = 1;
+ $value = "";
+ while ($l =~ m/\G(\(|\)|[^()]+)/gc) {
+ my $stuff = $1;
+ if ($stuff eq "(") {
+ $depth++;
+ $value .= "(";
+ }
+ elsif ($stuff eq ")") {
+ $depth--;
+ if ($depth == 0) {
+ $entry->{$key} = $value;
+ next ATTR;
+ }
+ $value .= ")";
+ }
+ else {
+ $value .= $stuff;
+ }
+ }
+ m/\G\s*/gc;
+ }
+
+ else {
+ $self->LastError("Invalid item value in FETCH response: $l");
+ return undef;
+ }
+ }
- if ( $uids->{$uid} ) { $entry = $uids->{$uid} }
- else { $uids->{$uid} ||= $entry }
+ if ($self->Uid) {
+ $uids->{$entry->{UID}} = $entry;
}
else {
- my $mid = $l =~ /^\* (\d+) FETCH/i ? $1 : undef;
- $mid or next;
-
- if ( $uids->{$mid} ) { $entry = $uids->{$mid} }
- else { $uids->{$mid} ||= $entry }
+ $uids->{$mid} = $entry;
}
- foreach my $w (@words) {
- if ( $l =~ /\Q$w\E\s*$/i ) {
- $entry->{$w} = $output->[ $x + 1 ];
- $entry->{$w} =~ s/(?:$CR?$LF)+$//og;
- chomp $entry->{$w};
- }
- elsif (
- $l =~ /\( # open paren followed by ...
- (?:.*\s)? # ...optional stuff and a space
- \Q$w\E\s # escaped fetch field<sp>
- (?:" # then: a dbl-quote
- (\\.| # then bslashed anychar(s) or ...
- [^"]+) # ... nonquote char(s)
- "| # then closing quote; or ...
- \( # ...an open paren
- ([^\)]*) # ... non-close-paren char(s)
- \)| # then closing paren; or ...
- (\S+)) # unquoted string
- (?:\s.*)? # possibly followed by space-stuff
- \) # close paren
- /xi
- )
- {
- $entry->{$w} = defined $1 ? $1 : defined $2 ? $2 : $3;
+ for my $word (keys %$entry) {
+ next if exists $words{$word};
+
+ if (my ($stuff) = $word =~ m/^BODY(\[.*)$/) {
+ next if exists $words{"BODY.PEEK".$stuff};
}
+
+ delete $entry->{$word};
}
}
+
return wantarray ? %$uids : $uids;
}
diff --git a/lib/Mail/IMAPClient.pod b/lib/Mail/IMAPClient.pod
index 1e44874..dc31dd3 100644
--- a/lib/Mail/IMAPClient.pod
+++ b/lib/Mail/IMAPClient.pod
@@ -1168,27 +1168,12 @@ This would result in L<Data::Dumper> output similar to the following:
}
};
-You can specify I<BODY[HEADER.FIELDS ($fieldlist)> as an argument, but
-you should keep the following in mind if you do:
-
-B<1.> You can only specify one argument of this type per call. If you
-need multiple fields, then you'll have to call B<fetch_hashref>
-multiple times, each time specifying a different FETCH attribute but
-the same.
-
-B<2.> Fetch operations that return RFC822 message headers return the
-whole header line, including the field name and the colon. For
-example, if you do a C<$imap-E<gt>fetch_hash("BODY[HEADER.FIELDS
-(Subject)]")>, you will get back subject lines that start with
-"Subject: ".
-
-By itself this method may be useful for, say, speeding up programs
-that want the size of every message in a folder. It issues one
-command and receives one (possibly long!) response from the server.
-However, it's true power lies in the as-yet-unwritten methods that
-will rely on this method to deliver even more powerful result hashes
-(and which may even remove the restrictions mentioned in B<1> and
-B<2>, above). Look for more new function in later releases.
+By itself this method may be useful for, say, speeding up programs that
+want the size of every message in a folder. It issues one command and
+receives one (possibly long!) response from the server. However, it's
+true power lies in the as-yet-unwritten methods that will rely on this
+method to deliver even more powerful result hashes. Look for more new
+function in later releases.
This method is new with version 2.2.3 and is thus still experimental.
If you decide to try this method and run into problems, please see the