This queue is for tickets about the MIME-tools CPAN distribution.

Report information
The Basics
Id: 72538
Status: resolved
Priority: 0/
Queue: MIME-tools

People
Owner: dfs+pause [...] roaringpenguin.com
Requestors: Mark.Martinec [...] ijs.si
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: MIME/Parser/Reader thinks a File::Temp object is not capable of native I/O
Date: Fri, 18 Nov 2011 21:04:17 +0100
To: bug-MIME-tools [...] rt.cpan.org
From: Mark Martinec <Mark.Martinec [...] ijs.si>
Using: MIME-Tools 5.502, perl 5.14.1.

While investigating poor performance of MIME::Parser on processing larger mail message with attachments (lots of 77-byte lines), it turns out that read_chunk() chooses a slow branch, thinking that a temporary output file created by File::Temp::new is not capable of native I/O, based on a wrong verdict from native_handle().

From the File::Temp man page:

    Note that there is no method to obtain the filehandle from the
    File::Temp object. The object itself acts as a filehandle. Also,
    the object is configured such that it stringifies to the name of
    the temporary file, and can be compared to a filename directly.
    The object isa "IO::Handle" and isa "IO::Seekable" so all those
    methods are available.

So what is missing is a line like:

    return $fh if $fh->isa('IO::Handle');

in native_handle().

On a side track, printing thousands of 77-byte lines is not terribly efficient. Buffering up helps a little. Rewriting the read_chunk() to do its I/O by chunks instead of line-by-line could provide an impressive speedup (but this is beyond the scope of this PR).

Attached is a small patch: it lets native_handle() recognize a temporary file object as provided by File::Temp as being native, and adds a little buffering to the native+native branch of read_chunk(). This provides some speedup in parsing large mail with B64 attachments (of the order of 10 %). Not too impressive, but every little bit helps.

Btw, the Devel::NYTProf perl module is a tremendous tool for spotting bottlenecks in perl code!

Regards
  Mark
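
To make the report above concrete, here is a minimal sketch of the kind of check being proposed. The function looks_native() is a hypothetical stand-in written for illustration; it is not the actual source of native_handle() in MIME::Parser::Reader, and the real function may test for other handle types as well. It only shows that a File::Temp object passes an isa('IO::Handle') test and can therefore be treated like any other native filehandle.

```perl
use strict;
use warnings;
use File::Temp ();
use Scalar::Util qw(blessed);

# Hypothetical stand-in for the check discussed in this ticket.
# This is NOT the real MIME::Parser::Reader::native_handle().
sub looks_native {
    my ($fh) = @_;
    return $fh if ref($fh) eq 'GLOB';                        # plain glob reference
    return $fh if blessed($fh) && $fh->isa('IO::Handle');    # IO::File, File::Temp, ...
    return;                                                  # anything else: not native
}

my $tmp = File::Temp->new;    # the object itself acts as a filehandle

print "isa IO::Handle:   ", ($tmp->isa('IO::Handle')   ? "yes" : "no"), "\n";
print "isa IO::Seekable: ", ($tmp->isa('IO::Seekable') ? "yes" : "no"), "\n";
print "treated as native: ", (defined(looks_native($tmp)) ? "yes" : "no"), "\n";
```

The blessed() guard is there so that the sketch does not call isa() on an unblessed reference or a plain string; whether the real function needs the same guard depends on what it has already ruled out before reaching that line.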

Message body is not shown because sender requested not to inline it.

Subject: Re: [rt.cpan.org #72538] MIME/Parser/Reader thinks a File::Temp object is not capable of native I/O
Date: Mon, 21 Nov 2011 11:01:59 -0500
To: bug-MIME-tools [...] rt.cpan.org
From: "David F. Skoll" <dfs [...] roaringpenguin.com>
Hi, Mark,

> While investigating poor performance of MIME::Parser on processing
> larger mail message with attachments (lots of 77-byte lines), it
> turns out that read_chunk() chooses a slow branch, thinking that a
> temporary output file created by File::Temp::new is not capable of
> native I/O, based on a wrong verdict from native_handle().

Thanks for your patch. I'll include at least this part:

+    return $fh if $fh->isa('IO::Handle'); # File::Temp obj isa "IO::Handle" !

> On a side track, printing thousands of 77-byte lines is not terribly
> efficient. Buffering up helps a little.

I don't like that part of the patch. Perl I/O should eventually boil down to C stdio, so the writes should be buffered by the C standard I/O library. Unless you can show me strace output showing small write() system calls, I would rather not add another layer of buffering.

Regards,

David.

Subject: Re: [rt.cpan.org #72538] MIME/Parser/Reader thinks a File::Temp object is not capable of native I/O
Date: Mon, 21 Nov 2011 18:48:54 +0100
To: bug-MIME-tools [...] rt.cpan.org
From: Mark Martinec <Mark.Martinec [...] ijs.si>
> <URL: https://rt.cpan.org/Ticket/Display.html?id=72538 >
> > While investigating poor performance of MIME::Parser on processing
> > larger mail message with attachments (lots of 77-byte lines), it
> > turns out that read_chunk() chooses a slow branch, thinking that a
> > temporary output file created by File::Temp::new is not capable of
> > native I/O, based on a wrong verdict from native_handle().
>
> Thanks for your patch. I'll include at least this part:
>
> +    return $fh if $fh->isa('IO::Handle'); # File::Temp obj isa "IO::Handle"

Great, thanks. Just measured it again, calling $parser->parse($fh) for a 3.2 MiB mail message (with /tmp on SSD) the above change cuts the elapsed time by about 25%, from 7.7 ms/msg down to 5.7 ms/msg.

> > On a side track, printing thousands of 77-byte lines is not terribly
> > efficient. Buffering up helps a little.
>
> I don't like that part of the patch. Perl I/O should eventually boil
> down to C stdio, so the writes should be buffered by the C standard I/O
> library. Unless you can show me strace output showing small write() system
> calls, I would rather not add another layer of buffering.

I agree, the tiny additional saving is not worth the complication.

The NYTProf does show most time spent in line-by-line reads and writes (10 messages each 43074 lines, 10x 3.2 MiB):

# spent 439ms making 430680 calls to MIME::Parser::Reader::CORE:readline, avg 1µs/call
# spent 428ms making 430680 calls to MIME::Parser::Reader::CORE:print, avg 993ns/call

in addition to the rest of code in that copying loop, but the true solution lies in processing by larger chunks instead of line-by-line, and that is nontrivial in the given case. If I come up with some better alternative, I'll let you know.

Take care
  Mark

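For readers unfamiliar with the distinction drawn above, the following sketch contrasts a line-by-line copy with a copy in larger chunks. It is only an illustration of the two I/O patterns: the function names and the 64 KiB block size are assumptions made here, and the code deliberately ignores the part that makes the real case nontrivial, namely that read_chunk() has to watch for MIME boundary lines while copying.

```perl
use strict;
use warnings;

# Line-by-line copy: one readline and one print per input line.
sub copy_by_lines {
    my ($in, $out) = @_;
    while (defined(my $line = <$in>)) {
        print {$out} $line;
    }
    return;
}

# Chunked copy: one read and one print per block.
# The default block size of 64 KiB is an arbitrary choice for this sketch.
sub copy_by_chunks {
    my ($in, $out, $size) = @_;
    $size ||= 64 * 1024;
    my $buf;
    while (1) {
        my $n = read($in, $buf, $size);
        die "read failed: $!" unless defined $n;
        last if $n == 0;               # end of input
        print {$out} $buf;
    }
    return;
}

# Demonstration on in-memory handles: 1000 lines of 77 bytes each.
my $data = (('A' x 76) . "\n") x 1000;
for my $copy (\&copy_by_lines, \&copy_by_chunks) {
    open my $in, '<', \$data or die "open in: $!";
    my $result = '';
    open my $out, '>', \$result or die "open out: $!";
    $copy->($in, $out);
    close $out;
    print $result eq $data ? "copy matches\n" : "copy differs\n";
}
```

Both functions produce the same output; the difference is only in how many readline/read and print calls are made for a given input, which is the cost the profile above attributes to the copying loop.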
Hi, I have just uploaded MIME-tools-5.503 to CPAN, which I believe resolves this ticket. Regards, David.