Skip Menu |

This queue is for tickets about the Mail-IMAPClient CPAN distribution.

Report information
The Basics
Id: 18376
Status: resolved
Priority: 0/
Queue: Mail-IMAPClient

People
Owner: PLOBBES [...] cpan.org
Requestors: kjetilho [...] ifi.uio.no
rct [...] r-t.org
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: fetch_hash() fails on large folders (command line too long)
Date: Sun, 26 Mar 2006 16:13:37 -0500
To: Comment-Mail-IMAPClient [...] rt.cpan.org
From: Robert Terzi <rct [...] r-t.org>
Version: 2.2.9 [FYI I tried to send this initially to DJKERNEN@cpan.org as a discussion as mentioned in the docs. I got a bounce back from map@kernengroup.com I suppose technically this isn't a bug but rather more of a feature request.] fetch_hash() can fail on large folders due to limits in the IMAP server command length (UW-IMAP takes 8,000 bytes). In particular this affects (older) folders with sparse UID numbers, so even when using Range() the command length exceeds 8,000 bytes. For fetch_hash() I don't see a reasonable work around for client code -- other than reimplementing fetch_hash(). After thinking about it a little bit, this is what I came up with: Would it be reasonable to have a version of Range()/MessageSet that when given an argument for Max. command line size could return an array of messageSets where the string version is smaller than the max command line length? Code that uses Range such as fetch(), fetch_hash(), ... would have to iterate over of the array of objects (or strings) returned. I don't know how many other IMAP servers are strict about command length, I found one message from Mark Crispin where he propoes that the official limit should be 1,000 characters which sees unreasonbly strict. I may try to implement something along these lines but I'm not sure I understand enough of messageset to do that withony breaking anything yet. IMAPClient is very nicely designed to make it easy to write quick IMAP scripts. I realize you can't possibly handle everything for every IMAP server. However, I think tackling (some) of the command line length issues in IMAPClient, would make it more robust. Thanks, --Rob
Subject: "word too long" using fetch_hash
Date: Wed, 05 Apr 2006 08:36:56 +0200
To: bug-Mail-IMAPClient [...] rt.cpan.org
From: Kjetil Torgrim Homme <kjetilho [...] ifi.uio.no>
I'm trying to use imapsync to move folders from one Cyrus server to another, and it relies on Mail::IMAPClient to handle the protocol. this works very well with small folders, but fails when a folder contains many messages. we're using version 2.2.9. transcript from debug output: Sending: 22 UID SEARCH ALL Sent 19 bytes Read: * SEARCH 841 843 844 [...] 26383 26385 26387 22 OK Completed (7375 msgs in 0.000 secs) Sending: 23 UID FETCH 841,843:844,[...]26383,26385,26387 (FLAGS INTERNALDATE RFC822.SIZE) Sent 29329 bytes Read: * BYE Fatal error: word too long I can't find anything in RFC 3501 which specifies limits to the length of sequence sets, but Cyrus 2.2.12 has this limit: imap/imapparse.c: MAXWORD = 16384, so the code in fetch_hash is too simplistic for use "in the wild", unfortunately: my $msgref = scalar($self->messages); my $output = scalar($self->fetch($msgref,"(" . join(" ",@_) . ")")) this needs to broken up into a loop, I suggest doing 1000 in each iteration so it works even with very large mailboxes (I don't think it matters much for performance whether you do 1000 or 4000). hmm. perhaps it is better to fix fetch() instead. thanks for Mail::IMAPClient! -- Kjetil T.
From: kjetilho [...] ifi.uio.no
this is a duplicate of #18376
From: kjetilho [...] ifi.uio.no
The suggested patch fixes fetch_hash, there may be more areas in the code which needs a similar fix.
--- ../orig/Mail-IMAPClient-2.2.9/IMAPClient.pm 2003-07-02 19:28:54.000000000 +0200 +++ IMAPClient.pm 2006-05-17 02:33:31.116283000 +0200 @@ -2333,12 +2333,22 @@ } elsif (ref($what) or $what =~ /^[,:\d]+\w*$/) { $what = $self->Range($what); } - $self->_imap_command( ( $self->Uid ? "UID " : "" ) . - "FETCH $what" . ( @_ ? " " . join(" ",@_) : '' ) - ) or return undef; - return wantarray ? $self->History($self->Count) : - [ map { $_->[DATA] } @{$self->{'History'}{$self->Count}} ]; - + my $flags = @_ ? " " . join(" ", @_) : ''; + my @results = (); + # UW-IMAPD has a max word size of 1024, Cyrus has 16384. + my $maxwordsize = 1024; + for my $range ($what =~ /(.{1,$maxwordsize})(?:,|$)/g) { + $self->_imap_command(($self->Uid ? "UID " : "") . + "FETCH $range" . $flags) + or next; + if (wantarray) { + push(@results, $self->History($self->Count)) + } else { + push(@results, map { $_->[DATA] } + @{$self->{'History'}{$self->Count}}); + } + } + return wantarray ? @results : \@results; } @@ -2601,7 +2611,7 @@ if ($fields[0] =~ /^[Aa][Ll]{2}$/ ) { - $string = "$msg body" . + $string = "body" . # use ".peek" if Peek parameter is a) defined and true, # or b) undefined, but not if it's defined and untrue: @@ -2611,7 +2621,7 @@ ) . "[header]" ; } else { - $string = "$msg body" . + $string = "body" . # use ".peek" if Peek parameter is a) defined and true, or # b) undefined, but not if it's defined and untrue: @@ -2621,7 +2631,7 @@ ) . "[header.fields (" . join(" ",@fields) . ')]' ; } - my @raw=$self->fetch( $string ) or return undef; + my @raw=$self->fetch($msg, $string) or return undef; my $headers = {}; # hash from message ids to header hash my $h = 0; # reference to hash of current msgid, or 0 between msgs
Hi Rob, On Sun Mar 26 16:14:14 2006, rct@r-t.org wrote: Show quoted text
> [FYI I tried to send this initially to DJKERNEN@cpan.org as a > discussion has mentioned in the docs. I got a bounce back from > map@kernengroup.com
I cannot reach him either, so captured this name-space. A rigorous cleanup of the code is complete, but needs testing... however there are no interface changes (yet0. Show quoted text
> I suppose technically this isn't a bug but rather more of a feature > request.
It's more like a protocol design bug/problem. Show quoted text
> fetch_hash() can fail on large folders due to limits in > the IMAP server command length (UW-IMAP takes 8,000 bytes). In > particular this affects (older) folders with sparse UID numbers, so > even when using Range() the command length exceeds 8,000 bytes. > > For fetch_hash() I don't see a reasonable work around for client > code -- other than reimplementing fetch_hash().
So... let's rewrite it. Did you find a solutions? Doesn't look too complex to me: in fetch() simple check for the length of the _imap_command(), and split/rejoin when it is too long. Show quoted text
> Would it be reasonable to have a version of Range()/MessageSet that > when given an argument for Max. command line size could return an > array of messageSets where the string version is smaller than the > max command line length? > Code that uses Range such as fetch(), fetch_hash(), ... would have > to iterate over of the array of objects (or strings) returned.
possible. Probably, there are only a few places. Show quoted text
> I don't know how many other IMAP servers are strict > about command length, I found one message from Mark Crispin where > he propoes that the official limit should be 1,000 characters which > sees unreasonbly strict.
Once the solution is made, it doesn't matter much. Show quoted text
> I may try to implement something along these lines but I'm not > sure I understand enough of messageset to do that withony breaking > anything yet.
I hope you have time for it. Be warned: the new code looks much different (cleaner, I hope) Show quoted text
> IMAPClient is very nicely designed to make it easy to write quick > IMAP scripts. I realize you can't possibly handle everything for > every IMAP server. However, I think tackling (some) of the command > line length issues in IMAPClient, would make it more robust.
Sure -- MarkOv
On Wed Apr 05 02:37:25 2006, kjetilho@ifi.uio.no wrote: Show quoted text
> I'm trying to use imapsync to move folders from one Cyrus server to > another, and it relies on Mail::IMAPClient to handle the protocol. > this > works very well with small folders, but fails when a folder contains > many messages. we're using version 2.2.9.
I am the new maintainer, and will include patches/fixes. The code has seen a major clean-up (to be released). Did you implement a work-around for your problem? Maybe join efforts with the author of the other reports about the same limitation?
In 3.17_01 (not released) and/or in 3.17 when released we will have the following changes: - *new attribute Maxcommandlength used by fetch() to limit length of commands sent to a server. This should removes need for utilities like imapsync to create their own split() functions and instead allows Mail::IMAPClient to hopefully "do the right thing" (default will still be unlimited length) - added _split_sequence() to support new Maxcommandlength argument - fetch() + use new Maxcommandlength to split a request into multiple subrequests then aggregate results before passing them back to the caller
Subject: Re: [rt.cpan.org #18376] fetch_hash() fails on large folders (command line too long)
Date: Sat, 25 Apr 2009 17:39:28 +0200
To: Phil Lobbes via RT <bug-Mail-IMAPClient [...] rt.cpan.org>
From: Mark Overmeer <solutions [...] overmeer.net>
* Phil Lobbes via RT (bug-Mail-IMAPClient@rt.cpan.org) [090425 04:21]: Show quoted text
> Queue: Mail-IMAPClient > Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=18376 > > > In 3.17_01 (not released) and/or in 3.17 when released we will have the > following changes:
Good to hear Show quoted text
> - *new attribute Maxcommandlength used by fetch() to limit > length of commands sent to a server. This should removes > need for utilities like imapsync to create their own split() > functions and instead allows Mail::IMAPClient to hopefully > "do the right thing" (default will still be unlimited length)
I would use a good large default, which DWIMs more. It will probably not hurt existing applications. -- Regards, MarkOv ------------------------------------------------------------------------ Mark Overmeer MSc MARKOV Solutions Mark@Overmeer.net solutions@overmeer.net http://Mark.Overmeer.net http://solutions.overmeer.net
Subject: Re: [rt.cpan.org #18376] fetch_hash() fails on large folders (command line too long)
Date: Sat, 25 Apr 2009 17:15:25 -0400
To: bug-Mail-IMAPClient [...] rt.cpan.org
From: Phil Lobbes <phil [...] perkpartners.com>
Mark Overmeer via RT <bug-Mail-IMAPClient@rt.cpan.org> wrote: Show quoted text
> > - *new attribute Maxcommandlength used by fetch() to limit > > length of commands sent to a server. This should removes > > need for utilities like imapsync to create their own split() > > functions and instead allows Mail::IMAPClient to hopefully > > "do the right thing" (default will still be unlimited length)
> > I would use a good large default, which DWIMs more. It will probably > not hurt existing applications.
Preferences? 1000B, 1K or maybe 8K... I've seen a 16K limit for courier-imap: #define IT_MAX_ATOM_SIZE 16384 But I wouldn't be surprised if someone else has a lower limit. FWIW, RFC2683 section 3.2.1.5 suggests: "A client should limit the length of the command lines it generates to approximately 1000 octets" ... "For its part, a server should allow for a command line of at least 8000 octets" Maybe we should default to 1000 and be done with it. At least we'll follow one recommendation :-). Phil
Subject: Re: [rt.cpan.org #18376] fetch_hash() fails on large folders (command line too long)
Date: Mon, 27 Apr 2009 10:20:39 +0200
To: "phil [...] perkpartners.com via RT" <bug-Mail-IMAPClient [...] rt.cpan.org>
From: Mark Overmeer <solutions [...] overmeer.net>
* phil@perkpartners.com via RT (bug-Mail-IMAPClient@rt.cpan.org) [090425 21:15]: Show quoted text
> Preferences? 1000B, 1K or maybe 8K... I've seen a 16K limit for > courier-imap: > #define IT_MAX_ATOM_SIZE 16384 > But I wouldn't be surprised if someone else has a lower limit.
Show quoted text
> FWIW, RFC2683 section 3.2.1.5 suggests: > "A client should limit the length of the command lines it generates to > approximately 1000 octets"
If you put the limit low, than the option will get tested continuously ;-) I can imagin that it will have considerable impact on server where the messages are kept in a database, but am unsure whether they benefit or not. Great improvement. -- Regards, MarkOv ------------------------------------------------------------------------ Mark Overmeer MSc MARKOV Solutions Mark@Overmeer.net solutions@overmeer.net http://Mark.Overmeer.net http://solutions.overmeer.net
CC: rct [...] r-t.org, DJKERNEN__NO_SOLICITING__ [...] cpan.org
Subject: Re: [rt.cpan.org #18376] fetch_hash() fails on large folders (command line too long)
Date: Mon, 27 Apr 2009 11:24:29 +0200
To: bug-Mail-IMAPClient [...] rt.cpan.org
From: Kjetil Torgrim Homme <kjetilho [...] ifi.uio.no>
On Mon, 2009-04-27 at 04:21 -0400, Mark Overmeer via RT wrote: Show quoted text
> <URL: http://rt.cpan.org/Ticket/Display.html?id=18376 > > > * phil@perkpartners.com via RT (bug-Mail-IMAPClient@rt.cpan.org) [090425 21:15]:
> > Preferences? 1000B, 1K or maybe 8K... I've seen a 16K limit for > > courier-imap: > > #define IT_MAX_ATOM_SIZE 16384 > > But I wouldn't be surprised if someone else has a lower limit.
>
> > FWIW, RFC2683 section 3.2.1.5 suggests: > > "A client should limit the length of the command lines it generates to > > approximately 1000 octets"
> > If you put the limit low, than the option will get tested > continuously ;-) I can imagin that it will have considerable > impact on server where the messages are kept in a database, but > am unsure whether they benefit or not.
thank you for fixing this! 1000 sounds good, but it would be useful to include the practical information about limits which have surfaced in the documentation, ie. 8000 octets for UW-IMAPD and 16384 octets for Courier and Cyrus so that users can make an informed decision when tweaking the limit. (BTW, although the Courier and Cyrus limit is on the word/atom length, not the total command line length, I don't think it's very useful to go to that kind of detail in the docs.) -- best wishes, Kjetil T.
Next release (3.17) will have Maxcommandlength documented and also includes some notes on the limits seen with UW-IMAPD, Courier and Cyrus.