
This queue is for tickets about the autodie CPAN distribution.

Report information
The Basics
Id: 46984
Status: resolved
Priority: 0/
Queue: autodie

People
Owner: PJF [...] cpan.org
Requestors: mschwern [...] cpan.org
Cc: niels [...] thykier.net
AdminCc:

Bug Information
Severity: Wishlist
Broken in: 1.999
Fixed in: 2.22



Subject: Speed up Fatal by reducing calls to eval?
I was looking at how to make perl5i load faster and did a profile run with Devel::NYTProf It revealed that 60ms (out of 160) was being spent in Fatal::_make_fatal(). 60ms might not seem a lot, but if I'm to promote perl5i as being something you slap onto every Perl program to generally improve the language I'm concerned about startup time. Of that 25ms is on a single line: $code = eval("package $pkg; use Carp; $code"); ## no critic # spent 2.56ms making 51 calls to Exporter::import, avg 50µs/call # spent 1.44ms making 63 calls to warnings::unimport, avg 23µs/call Another 10ms is here. $leak_guard = eval $leak_guard; ## no critic If the number of evals() or calls to _make_fatal() can be reduced then the load time for autodie can be drastically reduced. It seems that Fatal::import() is calling Fatal->_make_fatal() with all the same arguments except one, the function being made fatal. Perhaps instead of calling it one by one it can take a list and do them all at once, setting up the environment once and doing an eval once.
G'day Schwern,

On Mon Jun 15 21:12:58 2009, MSCHWERN wrote:
> I was looking at how to make perl5i load faster and did a profile run > with Devel::NYTProf It revealed that 60ms (out of 160) was being spent > in Fatal::_make_fatal(). 60ms might not seem a lot, but if I'm to > promote perl5i as being something you slap onto every Perl program to > generally improve the language I'm concerned about startup time.
I've actually made no attempt to optimise either the start-up or run-time of autodie/Fatal, besides basic common sense. So I'm actually quite glad to hear it's only 60ms. ;)
> Of that 25ms is on a single line:
>
>     $code = eval("package $pkg; use Carp; $code"); ## no critic
>     # spent 2.56ms making 51 calls to Exporter::import, avg 50µs/call
>     # spent 1.44ms making 63 calls to warnings::unimport, avg 23µs/call
That's the line that turns our auto-generated code into beautiful magic. I'm not surprised it takes up a lot of time; $code can potentially be quite complex, although I reckon we can trim the need to import from Carp.

However, I *think* we also do some caching after we've generated our code. The reason that's important is that while you may pay a start-up cost the first time you use autodie/perl5i, using it a second time should be much cheaper. This means that for big projects, the start-up cost should be comparatively smaller than for single-file programs. Looking at the code, I also think we might be able to do the caching earlier.

The caching *may* not work across package boundaries, since many built-ins care about their current package. It could be sped up by enumerating which built-ins care about their package, although that's rather tedious work.
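The caching being described could look something like this. A minimal sketch with hypothetical names (`_compile_cached`, `%cached_code` are mine): compile a wrapper at most once per (package, sub) pair, and hand back the cached code ref on every later use, so only the first `use autodie` in a package pays the string-eval cost.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical cache: package -> sub name -> compiled code ref.
my %cached_code;

sub _compile_cached {
    my ($pkg, $sub_name, $source) = @_;
    return $cached_code{$pkg}{$sub_name} //= do {
        # The expensive step: one string eval per *uncached* wrapper.
        my $code = eval "package $pkg; $source" or die $@;
        $code;
    };
}

my $first  = _compile_cached('Foo', 'hello', q{ sub { __PACKAGE__ } });
my $second = _compile_cached('Foo', 'hello', q{ sub { "never compiled" } });

say $first->();                                   # compiled inside Foo
say $first == $second ? 'cache hit' : 'cache miss';
```

As noted above, a real cache can't simply be keyed on the sub name alone, because built-ins that touch package filehandles or variables compile differently per package — hence keying on (package, sub) here.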
> Another 10ms is here.
>
>     $leak_guard = eval $leak_guard; ## no critic
Caching leak guards is on my todo list. I'll need to check my notes, but I seem to recall that with a modern enough version of Perl, they're not needed at all. However, that 'modern enough' may currently be 5.10.1.
> It seems that Fatal::import() is calling Fatal->_make_fatal() with all > the same arguments except one, the function being made fatal. Perhaps > instead of calling it one by one it can take a list and do them all at > once, setting up the environment once and doing an eval once.
Fatal/autodie needs to compile code for each built-in that it's making autodying. They all have their own prototypes and foibles, so there's no easy way of getting around this. The eval lines are actually compiling the code, so if they're taking a long time because there's a lot of code, then batching them all together may not help... Then again, it might.

Being able to reduce the number of actual subroutine calls to _make_fatal should certainly be possible. That will at least save some time around the edges.

I obviously need to give this more time and thought. Trouble is, in 5 minutes I'm being shown around Saudi Arabia by the locals, then I'm sleeping, then I'm teaching for a day, then I'm collapsing into bed, and then I'm off to Dubai. After all that, getting the hinting interface out the door is still higher on my critical list (since it seems to be working, it should be shipped).

So, potential ways of moving forward:

* Fork my repository at http://github.com/pfenwick/autodie/tree/master , make it faster/better/stronger, and send me a pull request.

* As above, but proof-read the autodie::hints documentation, some of which is now wrong, and correct/query it. I'll be so glad not to look at it anymore that I'll try playing with speed improvements. ;)

* Send me regular reminders, especially just before my 14-hour Dubai -> Melbourne flight on Monday. I do lots of coding in the air, and to minimise jet-lag I'll be awake for much of the flight.

* Catch me in person pre-OSCON. I'm flying over a few days before the conference with an aim to hang out with cool people / head up to Portland / finish my slides. ;)

All the very best,

Paul
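The "own prototypes and foibles" point above is easy to see from Perl itself, with nothing autodie-specific involved: each built-in advertises a different prototype via the standard `prototype` builtin, and any wrapper must mimic its built-in's prototype exactly for calls to keep parsing the same way.

```perl
use strict;
use warnings;
use feature 'say';

# Each CORE sub has its own prototype, which a wrapper must reproduce.
for my $builtin (qw(open rename close)) {
    my $proto = prototype("CORE::$builtin");
    say "$builtin => ", defined $proto ? $proto : '(no prototype)';
}
```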
Subject: [rt.cpan.org #46984] Speed up Fatal by reducing calls to eval?
Date: Tue, 30 Apr 2013 23:55:29 +0200
To: bug-autodie [...] rt.cpan.org
From: Niels Thykier <niels [...] thykier.net>
Hi,

I recently converted a large part of a codebase to using autodie[1]. While it is quite pleasing from a development PoV, it did introduce a quite visible overhead in our start-up time. From what I have measured, our runtime has increased by about 1.5s since our last release[2]. At least 1.3s of that can be traced back to autodie/Fatal[3], and from what I can tell, the majority of the overhead introduced by autodie is indeed in the compilation phase.

If the leak-guard code is removed/commented out, the overhead is reduced by ~0.4-0.5 seconds. It did not cause any issues in my project (but then again, the project does not really use string eval). If these guards are truly redundant (with Perl >= 5.10.1), then I would definitely appreciate it if autodie could optimize them out.

I would also appreciate it if autodie/Fatal could reduce the overhead even further. I did try coming up with a patch, but I have nothing with tangible results so far.

~Niels

[1] "lintian" for the curious, or if it matters.

[2] Originally it was around:

    real 0m5.377s
    user 0m6.088s
    sys  0m0.260s

Now I am seeing runtimes like:

    real 0m6.684s
    user 0m7.404s
    sys  0m0.460s

for a given input file. For reference, the project has about 80 or so Perl modules using autodie atm (most of which are just a single "use autodie;").

[3] By making Fatal's export function return immediately, the runtime was reduced by ~1.3s.
Subject: Re: [rt.cpan.org #46984] Speed up Fatal by reducing calls to eval?
Date: Thu, 09 May 2013 12:04:57 +0200
To: bug-autodie [...] rt.cpan.org
From: Niels Thykier <niels [...] thykier.net>
On 2013-04-30 23:55, Niels Thykier wrote:
> Hi,
>
> [...]
>
> If the leak-guard code is removed/commented out, the overhead is reduced
> by ~0.4-0.5 seconds. It did not cause any issues in my project (but
> then again, the project does not really use string eval).
> If these guards are truly redundant (with Perl >= 5.10.1), then I
> would definitely appreciate it if autodie could optimize them out.
I've been told I misunderstood something and that the leakguards are indeed required.
> I would also appreciate if autodie/Fatal could reduce the overhead even > further. I did try coming up with a patch, but I got nothing with > tangible results so far. >
From Fatal.pm:

"""
# I thought that changing package was a monumental waste of
# time for CORE subs, since they'll always be the same.  However
# that's not the case, since they may refer to package-based
# filehandles (eg, with open).
#
# There is potential to more aggressively cache core subs
# that we know will never want to interact with package variables
# and filehandles.
"""

Attached is a patch that reduces the average load time by caching some CORE subs and avoiding recompiling them into the package. I deliberately picked CORE subs that do not have "*" in their prototype (those will need a bit more "love" than this, as noted above).

AFAICT, the "only" case where a sub "has" to be compiled into the given package is if the wrapped sub uses caller. But in that case, the caller depth will generally be wrong anyway (i.e. the wrapped sub will see the autodie wrapper or its leak guard, rather than what it expects to find)[1]. That said, I don't remember any CORE subs relying on caller (though if there are such, my patch might need updating). Subs using package variables in their own package have never been an issue (since the wrapper does not affect the package of the wrapped sub).

For bareword arguments, these can be qualified (e.g. via Symbol::qualify). Admittedly, it is not entirely trivial to do - especially not with prototypes containing multiple "*". A simplified example version of open (i.e. the 3+ args variant) can be implemented as:

"""
sub checked_open(*$@) {
    if (defined($_[0]) and ref($_[0]) eq '') {
        return open(qualify($_[0], scalar(caller)), $_[1], @_[2..$#_])
            // die("open ...: $!");
    }
    return open($_[0], $_[1], @_[2..$#_]) // die("open ...: $!");
}

checked_open(HANDLE, '<', 'some-file');
# even works with already open handles, like regular open
checked_open(HANDLE, '<', 'some-file-2');
close(HANDLE);

checked_open(my $fd, '<', 'some-other-file');
# even works with already open handles, like regular open
checked_open($fd, '<', 'some-other-file-2');
close($fd);
"""

This version can be compiled into any package and will work as a replacement for 3-arg open. There is a possible issue for people using (e.g.) "ENV" as a name for their filehandles, which Symbol::qualify will always qualify as "main::ENV". That could be worked around by wrapping qualify for CORE subs expecting a filehandle and special-casing those (though at that point I guess you are pretty much re-implementing qualify from scratch anyway).

Oh, one more caveat. In the $lexical case, the call to qualify + caller would have to account for the leak guard as well (i.e. use caller($i) rather than just caller).

Nevertheless, I suspect implementing this would greatly decrease the time spent in "use autodie;" calls.
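The Symbol::qualify behaviour described above, including the ENV caveat, can be checked directly (Symbol is a core module; the package name `My::Pkg` is just for illustration):

```perl
use strict;
use warnings;
use Symbol qw(qualify);
use feature 'say';

# An ordinary bareword is qualified into the package you supply...
say qualify('HANDLE', 'My::Pkg');   # My::Pkg::HANDLE

# ...but names of special globals (ENV, STDOUT, ARGV, SIG, ...) are
# always forced into main::, which is the caveat mentioned above.
say qualify('ENV', 'My::Pkg');      # main::ENV
```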
> [...]
>
> [2] Originally it was around:
>
>     real 0m5.377s
>     user 0m6.088s
>     sys  0m0.260s
>
> Now I am seeing runtimes like:
>
>     real 0m6.684s
>     user 0m7.404s
>     sys  0m0.460s
>
> for a given input file. [...]
With the patch, I am seeing a ~0.200-0.250s reduction in user time.

~Niels

PS: The 0.006s improvement mentioned in the 0001 patch was estimated by running "time perl benchmark.pl" with unpatched and patched autodie.

[1] Though, one could probably have the autodie wrapper work with Carp out of the box by using $Carp::CarpLevel or @CARP_NOT (never really played with those, so I am not 100% sure here).

Message body is not shown because sender requested not to inline it.

Message body is not shown because sender requested not to inline it.

Dude! I've just got back home after watching Star Trek, and this looks awesome! I'm going to be playing with it when I wake up. Thank you so much! Paul
Subject: Re: [rt.cpan.org #46984] Speed up Fatal by reducing calls to eval?
Date: Fri, 10 May 2013 11:55:21 +0200
To: bug-autodie [...] rt.cpan.org
From: Niels Thykier <niels [...] thykier.net>
> Dude! I've just got back home after watching Star Trek, and this
> looks awesome! I'm going to be playing with it when I wake up. Thank
> you so much!
>
> Paul
Hi,

I am glad you like it. :)

I think I might have made a mistake in the patch, though; there should probably be a separate cache for lexical (i.e. autodie) vs non-lexical (i.e. Fatal) uses. I expect it could cause autodie functions to be installed without a leak guard. (Not sure if there is a problem for Fatal subs with leak guards, beyond the runtime overhead.)

Also, I got a mail from travis-ci about the build failing (linking to [1]). As far as I can tell, it was an unrelated failure (tests in (build-)dependencies failing), but I just wanted to confirm I hadn't overlooked something.

~Niels

[1] https://travis-ci.org/pjf/autodie/builds/7019464
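The fix being described — keeping lexical (autodie) wrappers and non-lexical (Fatal) wrappers in separate caches — can be sketched like this. A hypothetical structure, not the actual patch: the point is only that the two kinds of wrapper must never share a cache slot, so a plain Fatal wrapper is never handed out where a leak-guarded autodie one is required.

```perl
use strict;
use warnings;

# Hypothetical two-tier cache, keyed first on wrapper kind.
my %cache = ( lexical => {}, package => {} );

# Return a reference to the cache slot for this wrapper, so callers can
# both test and fill it.
sub _cache_slot {
    my ($lexical, $pkg, $name) = @_;
    return \$cache{ $lexical ? 'lexical' : 'package' }{$pkg}{$name};
}

# Store a (stand-in) wrapper in the lexical cache only...
${ _cache_slot(1, 'Foo', 'open') } = 'guarded-wrapper';

# ...and observe that the non-lexical slot for the same sub is untouched.
print defined ${ _cache_slot(0, 'Foo', 'open') } ? "shared\n" : "separate\n";
```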
On Fri May 10 05:55:46 2013, niels@thykier.net wrote:
> I am glad you like it. :)
Very much so!
> I think I might have made a mistake in the patch though; there should > probably be a separate cache for lexical (i.e. autodie) vs non-lexical > (i.e. Fatal) uses. I expect it could cause autodie functions to be > installed without a leak guard. (Not sure if there is a problem for > Fatal subs with leak guards beyond the runtime overhead).
I'm checking this now. I've only just started to meditate properly on the patch, and I'm doing that in a noisy bar, so we'll see how that goes. :) In any case we can throw more tests at it to find out.
> Also, I got a mail from travis-ci about the build failing (linking to > [1]). As far as I can tell, it was an unrelated failure (tests in > (build-)dependencies failing), but I just wanted to confirm I hadn't > overlooked something.
I'm pretty sure that's unrelated, since *master* is also failing. Then again, I am getting a local test failing on my laptop (on version_tag.t) which isn't happening in master; I'll probably look at that before looking at travis. Paul
On Fri May 10 07:45:28 2013, PJF wrote:
> Then again, I am getting a local test failing on my laptop (on > version_tag.t) which isn't happening in master; I'll probably look at > that before looking at travis.
So, version_tag.t fails if CORE::utime is in the list of reusable built-ins. It passes if it's removed. Its presence there is somehow causing a Fatalised version to either stick around, or be exported, when an older version tag is passed in. Diagnosing now.
Subject: Re: [rt.cpan.org #46984] Speed up Fatal by reducing calls to eval?
Date: Fri, 10 May 2013 14:01:26 +0200
To: bug-autodie [...] rt.cpan.org
From: Niels Thykier <niels [...] thykier.net>
[...]
> > I think I might have made a mistake in the patch though; there > > should probably be a separate cache for lexical (i.e. autodie) vs > > non-lexical (i.e. Fatal) uses. I expect it could cause autodie > > functions to be installed without a leak guard. (Not sure if there > > is a problem for Fatal subs with leak guards beyond the runtime > > overhead).
>
> I'm checking this now. I've only just started to meditate properly on
> the patch, and I'm doing that in a noisy bar, so we'll see how that
> goes. :) In any case we can throw more tests at it to find out.
>
> [...]
I added a couple of patches that may be of interest. In particular, in 0004 I took a stab at fixing fill_protos so it can generate the right call for globs.

That said, I noticed a couple of issues with my patch in the test suite (when replacing "open" with "chmod", since open is still done the "old" way):

* The package name is sometimes Fatal (where it used to be main).
  - I guess it is because there is no "non-Fatal" frame in those call stacks, so autodie::exception just picks a Fatal frame to have a caller (instead of undef).

* Globs will now be identified as "$package::NAME" instead of just "NAME" in error messages. It can possibly be s///'ed out, or alternatively the tests+docs can be updated accordingly.

* I cannot quite figure out the right "caller depth" when the leak guard is in place. (It uses both goto \&$sref and CORE::sub(@args).) It is possible I just need more coffee.

* It might not be possible to cache the wrappers for CORE::open and CORE::sysopen (%Retval_action suggests they use "caller(0)").

On a related topic, perhaps the calls to binmode in %Retval_action should be guarded[1].

~Niels

[1] What happens when only open is checked and binmode is not? E.g.

"""
use autodie qw(open);

open(my $fd, "<:some-layer", 'some-file');
"""

Message body is not shown because sender requested not to inline it.

Message body is not shown because sender requested not to inline it.

Message body is not shown because sender requested not to inline it.

Drat, we're seeing some sort of leakage of lexical subs.

If you pretend that this also updates the plan (which is in the next commit), you can see it in action:

https://github.com/pjf/autodie/commit/a24dbb1ce51af4a0f92cac74d9099a8166d28235

I'm going to have to dash (and I haven't read the email you sent, which just came in), but I'm hoping I can continue to look at this after my social engagements (although at 2am I may not have a full cognitive stack, so we'll see).
Travis is whacked. Something in our test dependency tree is failing to install. I haven't investigated properly. Sent from a Melbourne tram. :)
CC: mschwern [...] cpan.org
Subject: Re: [rt.cpan.org #46984] Speed up Fatal by reducing calls to eval?
Date: Fri, 10 May 2013 14:44:32 +0200
To: bug-autodie [...] rt.cpan.org
From: Niels Thykier <niels [...] thykier.net>
On 2013-05-10 14:10, PJF via RT wrote:
> <URL: https://rt.cpan.org/Ticket/Display.html?id=46984 >
>
> Drat, we're seeing some sort of leakage of lexical subs.
>
> If you pretend that this also updates the plan (which is in the next commit) you can see it in action:
>
> https://github.com/pjf/autodie/commit/a24dbb1ce51af4a0f92cac74d9099a8166d28235
>
> I'm going to have to dash (and I haven't read the email you sent which
> just came in), but I'm hoping I can continue to look at this after my
> social engagements (although at 2am I may not have a full cognitive
> stack, so we'll see).
It seems to be caused by the lack of lexical vs. non-lexical caches (mentioned in the email you haven't read yet). When I apply the patch "Fatal: Add lexical vs non-lexical CORE sub cache", then the problem seems to go away. ~Niels
Patches 1-3 applied, and we have a happy test suite. :) (Travis is also cooperating after tweaking our .travis.yml file, so if you get any more emails, they should be nice ones.)

It's almost 4am, so I'm going to grab a coffee and see if I can get autodie, your email, and these patches into my head. :) In other news, 4am came around *awfully* quickly. o_O

https://github.com/pjf/autodie/tree/cache_improvements is where everything is going.
On Fri May 10 13:55:15 2013, PJF wrote:

I'm so glad this is getting worked on!  However, I'm sorry to report that I tried out that branch and see no load-time improvements for "use autodie", for "use perl5i" (which uses autodie), for perl5i's test suite, or for loading autodie several times.

    perl -Ilib -wle '{ package Foo; use autodie; } {package Bar; use autodie } { package Baz; use autodie }'

Is it at a stage where it should show an improvement?
Subject: Re: [rt.cpan.org #46984] Speed up Fatal by reducing calls to eval?
Date: Fri, 10 May 2013 20:51:39 +0200
To: bug-autodie [...] rt.cpan.org
From: Niels Thykier <niels [...] thykier.net>
On 2013-05-10 20:26, Michael G Schwern via RT wrote:
> <URL: https://rt.cpan.org/Ticket/Display.html?id=46984 > > > On Fri May 10 13:55:15 2013, PJF wrote:
>> https://github.com/pjf/autodie/tree/cache_improvements is where >> everything is going.
>
> I'm so glad this is getting worked on! However, I'm sorry to report that
> I tried out that branch and see no load time improvements either for
> "use autodie" nor "use perl5i" which uses autodie nor perl5i's test
> suite, nor loading autodie several times.
>
>     perl -Ilib -wle '{ package Foo; use autodie; } {package Bar; use autodie } { package Baz; use autodie }'
>
> Is it at a place where it should?
Indeed, my patch introduced a mistake that broke caching and hid the unimport problem. I have included a patch to deal with both of these. Thank you for spotting that silly (double) mistake of mine. ~Niels

Message body is not shown because sender requested not to inline it.

Latest patch applied to git.

On Fri May 10 14:52:00 2013, niels@thykier.net wrote:
> On 2013-05-10 20:26, Michael G Schwern via RT wrote:
> > <URL: https://rt.cpan.org/Ticket/Display.html?id=46984 >
> >
> > On Fri May 10 13:55:15 2013, PJF wrote:
> >> https://github.com/pjf/autodie/tree/cache_improvements is where
> >> everything is going.
> >
> > I'm so glad this is getting worked on! However, I'm sorry to report
> > that I tried out that branch and see no load time improvements either
> > for "use autodie" nor "use perl5i" which uses autodie nor perl5i's
> > test suite, nor loading autodie several times.
> >
> >     perl -Ilib -wle '{ package Foo; use autodie; } {package Bar; use autodie } { package Baz; use autodie }'
> >
> > Is it at a place where it should?
>
> Indeed, my patch introduced a mistake that broke caching and hid the
> unimport problem. I have included a patch to deal with both of these.
>
> Thank you for spotting that silly (double) mistake of mine.
>
> ~Niels
And just to report, the cache_improvements branch takes the benchmark in this thread from 19 seconds down to 12, which is awesome.

Niels's benchmark simulates many packages using autodie, so you may not see a speed improvement with a single perl5i load (we still need to generate all the code), but you should see an improvement with lots of perl5i loads (since a lot of it will be cached).

Paul

On Fri May 10 14:56:22 2013, PJF wrote:
> Latest patch applied to git.
>
> On Fri May 10 14:52:00 2013, niels@thykier.net wrote:
> > On 2013-05-10 20:26, Michael G Schwern via RT wrote:
> > > [...]
> > >
> > > Is it at a place where it should?
> >
> > Indeed, my patch introduced a mistake that broke caching and hid the
> > unimport problem. I have included a patch to deal with both of these.
> >
> > Thank you for spotting that silly (double) mistake of mine.
> >
> > ~Niels
Reloading autodie into my brain now. I suspect I may be able to show the presence of a bug regarding leak guards, but it's easier for me to write the test case first, so I'm doing that. :)
It's almost 6am, so I apologise in advance for any lack of clarity in my post. :)

So, the relevant test is in this commit (which passes under master, but fails in the cache_improvements branch):

https://github.com/pjf/autodie/commit/ab2125219e4b4fbd9423469da460882c60d43f57

```
When we install a leakguard, it includes the file that 'use autodie' was called in.
If that leakguard gets cached, then installing the cached copy in a different file
will result in autodie no longer working there. The leak guard spots a different
filename, and immediately reverts back to using the CORE behaviour.
```

In a nutshell, assuming that Foo.pm and Bar.pm don't define their own packages:

    use autodie;
    use Bar;

then inside Bar.pm:

    # No package line here!
    open(...);  # This would be autodying without a leak guard!
                # With a leak guard, we realise we've leaked, and go back to CORE::open

    package Bar;
    open(...);  # This would never be autodying (unless we 'use autodie' ourselves)

Leak guards need to know the filename that they're supposed to be installed in, otherwise they can't work. At the moment, we're caching leak guards, and while they don't include a package, they *do* include a baked-in filename.

I'm open to creative ideas on how to solve this. We might be able to stuff most of the leak guard into a compiled-once subroutine, and then just eval a mini-leakguard which contains the filename and a `goto &code` for the actual leakguard. Except I suspect that will result in something horrible happening with prototypes, as currently our leakguards mimic the exact same prototypes as the core subroutines they wrap.

We could maybe have a magic (global) variable somewhere; our mini-leakguard with pre-baked filename stuffs the filename into that, and then magic-goto's the pre-compiled leakguard, which looks at that magic variable to see which filename we should be inside. That's reasonably simple, but it still requires an eval for each mini-leakguard¹.

The big gotcha there is that it normally wouldn't be thread-safe, but maybe it is, because perl threads don't share anything unless it's explicitly marked as shared. It's been a while since I've used perl threads for anything more than a page or two of code.

Other suggestions welcome.

Paul

¹ I have no idea how much of the cost of an eval is the *calling* of the eval, and how much is dependent on how much parsing/compiling that eval has to do. If the calling overhead is relatively low, this is still a win.
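The core mechanism under discussion — a guard that compares the calling file against a filename baked in at install time, and falls back to the original sub on a mismatch — can be modelled in a few lines. This is a simplified sketch (the helper name `_make_leak_guard` is mine; the real guards also replicate prototypes and handle the lexical caller depth):

```perl
use strict;
use warnings;

# Simplified model of a leak guard: the filename is fixed when the guard
# is built; at call time we check where the call actually came from.
sub _make_leak_guard {
    my ($filename, $autodying, $original) = @_;
    return sub {
        # (caller 0)[1] is the file containing this call site.
        goto &$autodying if (caller 0)[1] eq $filename;
        goto &$original;   # leaked into another file: restore old behaviour
    };
}

my $guard = _make_leak_guard(__FILE__, sub { 'autodying' }, sub { 'core' });
print $guard->(), "\n";    # called from this file, so the autodying path

my $leaked = _make_leak_guard('Other.pm', sub { 'autodying' }, sub { 'core' });
print $leaked->(), "\n";   # filename mismatch, so the core fallback
```

The caching problem described above falls straight out of this sketch: $filename is captured per guard, so a guard built for one file cannot be reused in another.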
CC: mschwern [...] cpan.org
Subject: Re: [rt.cpan.org #46984] Speed up Fatal by reducing calls to eval?
Date: Fri, 10 May 2013 23:03:51 +0200
To: bug-autodie [...] rt.cpan.org
From: Niels Thykier <niels [...] thykier.net>
On 2013-05-10 21:58, PJF via RT wrote:
> <URL: https://rt.cpan.org/Ticket/Display.html?id=46984 >
>
> It's almost 6am, so I apologise in advance for any lack of clarity in my post. :)
>
> So, the relevant test is in this commit (which passes under master, but fails in the cache_improvements branch):
>
> https://github.com/pjf/autodie/commit/ab2125219e4b4fbd9423469da460882c60d43f57
Thanks for the test.
> ```
> When we install a leakguard, it includes the file that 'use autodie' was called in.
> If that leakguard gets cached, then installing the cached copy in a different file
> will result in autodie no longer working there. The leak guard spots a different
> filename, and immediately reverts back to using the CORE behaviour.
> ```
>
> In a nutshell, assuming that Foo.pm and Bar.pm don't define their own packages:
>
>     use autodie;
>     use Bar;
>
> then inside Bar.pm
>
>     # No package line here!
>     open(...); # This would be autodying without a leak guard!
>     # With a leak-guard, we realise we've leaked, and go back to CORE::open
>
>     package Bar;
>     open(...); # This would never be autodying (unless we 'use autodie' ourselves)
I think this example + the test helped me realize what the deal is with these annoying leak guards. It might be prudent to add (a reference to) an example like that to lib/Fatal.pm.
> Leak guards need to know the filename that they're supposed to be installed in, otherwise they can't work. At the moment, we're caching leak guards, and while they don't include a package, they *do* include a baked-in filename. >
Would it be sufficient to know which filenames had an autodie wrapper installed (for a given sub/core function)? I.e. can we have one more table in Fatal.pm, which could replace the relevant part of the leak guard? Thus, replace

    goto &$code if ((caller $caller_level)[1] eq $filename);

with something like

    goto &$code if exists $table{$call}{ (caller $caller_level)[1] };
> I'm open to creative ideas on how to solve this. We might be able to stuff most of the leak guard into a compiled-once subroutine, and then just eval a mini-leakguard which contains the filename, and a `goto &code` for the actual leakguard. Except I suspect that will result in something horrible happening with prototypes, as currently our leakguards mimic the exact same prototypes as the core subroutines they wrap. >
Even if we cannot, we should be able to avoid having to recompile the actual sub each time (i.e. only recompile the leak guard for each new usage). It should still be worth something.
> We could maybe have a magic (global) variable somewhere, then our mini-leakguard with pre-baked filename stuffs the filename into that, and then magic 'goto's the pre-compiled leakguard, which looks at that magic variable to see which filename we should be inside. That's reasonably simple, but it still requires an eval for each mini-leakguard¹. The big gotcha there is it normally wouldn't be thread-safe, but maybe it is because perl threads don't share anything unless they're explcitly marked as shared. It's been a while since I've used perl threads for anything more than a page or two of code. >
Perhaps; I have never been good at perl threads (even less with "magic"), but if nothing else there is still the CLONE sub which may (or may not) come in handy for the threads.
> Other suggestions welcome. > > Paul > > ¹ I have no idea how much of the cost of an eval is the *calling* of the eval, and how much is depdendent on how much parsing/compiling that eval has to do. If the calling overhead is relatively low, this is still a win. >
Subject: Re: [rt.cpan.org #46984] Speed up Fatal by reducing calls to eval?
Date: Fri, 10 May 2013 14:09:50 -0700
To: bug-autodie [...] rt.cpan.org
From: "Michael G. Schwern" <schwern [...] pobox.com>
I'm now seeing about a 15-20% improvement when loading autodie into multiple packages. Thanks!
Okay, this may be crazy, or brilliant. I'm not sure. Also, I've got another mail from you (Niels) in my inbox, and I haven't read that yet. I'm sending this anyway. :)

Currently:

* User code calls the leakguard, which has a real prototype, and checks the filename is correct.
* If the filename isn't correct, we call the CORE version (or whatever we had in that spot before).
* If the filename *is* correct, we jump (goto &) to the autodying/fatalised version of the code.

There's one leakguard for each filename + sub combination.

However, we *should* be able to do this:

* User calls a micro-guard (with real proto) for each filename + sub combination.
  `-> This just puts the filename into a known place, and jumps to the beefy guard.
* A beefy guard exists for each sub (but *not* each file) and does the heavy lifting (but see below).

The microguards effectively end up looking like:

    sub rename($$) {
        local $autodie::_GUARDFILE = 'yourscript.pl';
        goto $autodie::beefy{rename};
    }

Right now, the beefy guards include an actual call to the core sub, which is *only* called if we leaked. It's *this* call that requires our package declaration. For reusable subs, we only need to generate the beefy guard once (which Niels' patch essentially does), but wouldn't it be great if we could cache all the subs? In fact, wouldn't it be great if we could build only one beefy guard, and use it for everything?

Maybe we can. How often do we *actually* leak? It's when someone loads another file (require/use) and that file doesn't include a package declaration (or declares itself in the same package). In well-written code, that's not very often.

So maybe, just maybe, we could have a single beefy guard. It looks in a known place (local() variables set by the microguard) for the filename it should be checking against, and also the (autodying) subroutine it should jump to if we're in the correct file. The microguard has the prototype that we need to expose to the end user, and by using magic-goto we can preserve the arguments on the stack when we jump to our autodying sub. The autodying sub has the package information encapsulated into it, so the beefy sub doesn't need to know it.

If the beefy guard discovers we *have* leaked, *and* we should be calling a core subroutine, then packages matter again. So we have it look in a third known location (maybe they all live inside a hash/array?) and then evaluate and jump to the code there:

    my $sub = eval $some_magic_location;
    goto &$sub;

Maybe there's some caching going on as well (we should be able to cache on package + subroutine).

Advantages are:

* Only one big guard needs to be eval'd. In fact, it might be possible to make it a static subroutine now.
* Far fewer lines of eval'd code when calling autodie (all microguards).
* A lot of leak-catching code may only get compiled if we actually hit a leak. (Most programs won't.)

A bonus is that our microguard might be able to push/unshift the filename, autodying subroutine reference, and core/leak code onto @_, which removes the need for magic locations. Of course, we'd have to pull them right off again at the start of the beefy guard.
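The "bonus" variant — passing the per-file context on @_ instead of through magic globals — might look like this. A hypothetical sketch (the names `_beefy_guard` and `_microguard_for` are mine), relying on the fact that `goto &sub` replaces the current frame, so the beefy guard sees the microguard's caller and the user's original arguments stay on the stack:

```perl
use strict;
use warnings;

# One shared beefy guard: pulls the baked-in context off the front of @_,
# then dispatches with the user's original arguments still intact.
sub _beefy_guard {
    my $filename  = shift;
    my $autodying = shift;
    my $fallback  = shift;
    # Because we arrived via goto &, caller(0) is the microguard's caller.
    goto &$autodying if (caller 0)[1] eq $filename;
    goto &$fallback;
}

# Per-file microguard: just prepends context and jumps to the shared guard.
sub _microguard_for {
    my ($filename, $autodying, $fallback) = @_;
    return sub {
        unshift @_, $filename, $autodying, $fallback;
        goto &_beefy_guard;
    };
}

my $here = _microguard_for(__FILE__, sub { join '+', @_ }, sub { 'core' });
print $here->(1, 2), "\n";    # right file: autodying sub still sees (1, 2)

my $leaked = _microguard_for('Other.pm', sub { 'auto' }, sub { 'core' });
print $leaked->(42), "\n";    # leaked: falls back to the core behaviour
```

This sketch sidesteps the magic-variable thread-safety worry, though real microguards would still need the wrapped built-in's prototype, which anonymous subs here don't carry.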
On Fri May 10 17:04:10 2013, niels@thykier.net wrote: Show quoted text
> I think this example + the test helped me realize what the deal is > with > these annoying leak guards. It might be prudent to add (a reference > to) > an example like that lib/Fatal.pm.
I agree! :) (Although I'm about to crash into bed, so I may do this when I wake up.) Show quoted text
> Would it be sufficient to know which filenames had an autodie wrapper > installed (for a given sub/core function)? i.e. can we have a table > more in Fatal.pm, which could replace the relevant part of the leak > guard?
Yeah, I'm going to spoil this idea too:

    # main.pl
    use autodie;
    use Foo;

    # Foo.pm
    open(my $fh, '<', ...);  # Leak guard makes sure this *isn't* autodying.
    use autodie;
    ...

In this case, main.pl's autodying features can try to leak into Foo.pm. Foo.pm uses autodie *later on*, but the open() will look inside our file table at run-time, and sure enough, Foo.pm will be there, and it will end up being autodying. So unfortunately the leak guards need to know exactly which filename they're checking for. :/ Show quoted text
> Even if we cannot, we should be able to avoid having to recompile the > actual sub each time (i.e. only recompile the leak guard for each new > usage). It should still be worth something
I agree! (See my other mail for an idea on how to do this.) Paul (Sleeping soon, so apologies for any delays in followups)
CC: mschwern [...] cpan.org
Subject: Re: [rt.cpan.org #46984] Speed up Fatal by reducing calls to eval?
Date: Sat, 11 May 2013 09:04:38 +0200
To: bug-autodie [...] rt.cpan.org
From: Niels Thykier <niels [...] thykier.net>
On 2013-05-11 00:39, PJF via RT wrote: Show quoted text
> <URL: https://rt.cpan.org/Ticket/Display.html?id=46984 > > > Okay, this may be crazy, or brilliant. I'm not sure. Also, I've got another mail from you (Niels) in my inbox, andI haven't read that yet. I'm sending this anyway. :) > > Currently: > > * User code calls the leakguard, which has a real prototype, and checks the filename is correct. > * If the filename isn't correct, we call the CORE version (or whatever we had in that spot before) > * If the filename *is* correct, we jump (goto &) the autodying/fatalised version of the code. > > There's one leakguard for each filename + sub combination. > > However we *should* be able to do this: > > * User calls a micro-guard (with real proto) for each filename + sub combination > `-> This just puts the filename into a known place, and jumps to the beefy guard. > * Beefy guard exists for each sub (but *not* each file) and do the heavy lifting (but see below). > > The microguards effectively end up looking like: > > sub rename($$) { > local $autodie::_GUARDFILE = 'yourscript.pl'; > goto $autodie::beefy{rename}; > } >
If that is going to be the "per-file" leakguard, we wouldn't need an eval for that.

    use Scalar::Util qw(set_prototype);

    [...]

    my $leak_guard = sub {
        local $autodie::_GUARDFILE = $file;
        goto $autodie::beefy{$call};
    };

    return set_prototype($leak_guard, $real_prototype);

So we can still take out some eval overhead here.

But if the beefy leak guard becomes a real sub (and not a per-prototype guard), then we don't have to use the "local $autodie::_GUARDFILE"-trick (instead we can just pass it as an argument). You suggest that later and I think it is doable (see below). Show quoted text
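For reference, Scalar::Util's set_prototype() attaches a prototype to an already-compiled closure and returns the same code reference, so no string eval is involved. A minimal demonstration:

```perl
use strict;
use warnings;
use Scalar::Util qw(set_prototype);

# Build a closure normally, then bolt the prototype on afterwards.
my $leak_guard = set_prototype( sub { $_[0] + $_[1] }, '$$' );

# prototype() confirms the prototype took effect.
print prototype($leak_guard), "\n";    # prints "$$"
```

Calls made through the code reference itself (`$leak_guard->(...)`) bypass the prototype, as all Perl code-ref calls do; the prototype only matters once the sub is installed under a name and called directly.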
> Right now, the beefy guards include an actual call to the core sub, which is *only* called if we leaked. It's *this* call that requires our package declaration. > For reusable subs, we only need generate the beefy guard once (which
Niels' patch essentially does), but wouldn't it be great if we could cache all the subs? Show quoted text
> In fact, wouldn't it be great if we could only build one beefy guard,
and use it for everything? Show quoted text
>
Should be doable. Assuming we don't leak, all leak guards appear to behave the same (namely, goto \&wrapper). On leak, we would have two cases: one for core subs and one for other subs.

For "non-core" subs, we just need a subref to the original one and we can goto it. Fatal must keep a reference to it around, so that should be fairly straightforward.

For "core" subs, we have to generate a function to call the core sub correctly (because, to my knowledge, we cannot get a reference to them nor goto to them). But we can generate these lazily and cache them. So if you never leak a CORE sub, Fatal will never generate the trampoline for it.

We can always throw more information into the micro-guard to make the "follow-up" easier (as you suggested at the end of your mail). Show quoted text
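The lazy, cached trampoline idea can be sketched as below. The names are hypothetical, and the sketch assumes a core sub that takes a plain list (like unlink); a real implementation has to build the call from each core sub's prototype. The string eval only ever runs on the first leak of a given package + core-sub pair, so well-behaved programs never pay for it.

```perl
use strict;
use warnings;

my %trampoline;    # cache keyed on "Package::coresub"

# Hypothetical helper: build (once) and return a sub that calls the
# named core function from inside the given package.
sub trampoline_for {
    my ( $pkg, $call ) = @_;
    $trampoline{"$pkg\::$call"} ||= eval    ## no critic (StringyEval)
        qq{ package $pkg; sub { CORE::$call(\@_) } }
        or die "Can't build trampoline for $call: $@";
    return $trampoline{"$pkg\::$call"};
}
```

A second request for the same package + sub pair comes straight out of the cache with no eval at all.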
> Maybe we can. > > How often do we *actually* leak? It's when someone loads another file (require/use) and that file doesn't include a package declaration (or declares itself in the same package). > In well written code, that's not very often. So maybe, just maybe, we
could have a single beefy guard. It looks in a known place (local() variables set by the microguard) for Show quoted text
> what filename it should be checking against, and also what (autodying)
subroutine it should go jumping to if we're in the correct file. The microguard has the prototpe that we Show quoted text
> need to expose to the end user, and by using magic-goto we can
preserve the arguments on the stack when we jump to our autodying sub. The autodying sub has the package information Show quoted text
> encapsulated into that, so the beefy sub doesn't need to know it. >
I believe we still have to generate the call trampoline for (unchecked) CORE subs (see above). But indeed, if we could do a goto \&CORE::sub, that would be great. Show quoted text
> If that beefy guard discovers we *have* leaked, *and* we should be calling a core subroutine, then packages matter again. So we have it look in a third known location > (maybe they all live inside a hash/array?) and then evaluates and
jumps to the code there: Show quoted text
> > my $sub = eval $some_magic_location; > goto &$sub; >
Why would packages matter here? Ignoring globs for a moment, there is no value in compiling a sub into the package of the leaking callsite.

As we "goto" rather than call the original subref, the leak guard(s) will not be in the call stack. We can include a reference to the original subref in the micro-guard that exposes it to the "beefy" leak guard, so we don't even need a table look-up or anything.

In fact, for user subs, we can create generic wrappers by including the original subref:

    sub {
        my $orig          = shift;
        my $on_error_info = shift;
        if (!wantarray) {
            my $scalar_ret;
            if (@_ == 1) {
                $scalar_ret = $orig->($_[0]);
            } elsif (...) {
                ...
            }
            if (!$scalar_ret) {
                [code-that-throws-and-uses-$on_error_info]
            }
        } else {
            ...
        }
    }

Sure, it would have to be string eval'ed, but we can pretty much re-use it for all user subs having the same prototype (or, in fact, even a "compatible-call prototype")[1]. In my experience, most user subs do not have a prototype at all, so it would be a big win here if we can just make a generic handler for that case.

For CORE subs, I think the package name only matters if we cannot qualify globs correctly. But sadly, we cannot generate generic checking wrapper functions for these (though we can lazily generate reusable ones for them). Show quoted text
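For the common no-prototype case, the generic handler needs no string eval at all. make_checked below is a hypothetical helper, not part of Fatal.pm: one compiled wrapper serves every prototype-less user sub, because the original code ref and the error information ride along as arguments.

```perl
use strict;
use warnings;

# Hypothetical generic checker for user subs with no prototype: the
# wrapper honours calling context and dies on a false/empty return.
sub make_checked {
    my ( $orig, $name ) = @_;
    return sub {
        if (wantarray) {
            my @ret = $orig->(@_);
            die "$name failed\n" unless @ret;
            return @ret;
        }
        my $ret = $orig->(@_);
        die "$name failed\n" unless $ret;
        return $ret;
    };
}
```

A real version would also thread through the richer error object autodie throws; the context-handling skeleton stays the same.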
> Maybe there's some caching going on as well (we should be able to cache on package + subroutine). > > Advantages are: > > * Only one big guard needs to be eval'd. In fact, it might be possible to make it a static subroutine now. > * Many less lines of eval'd code when calling autodie (all microguards). > * A lot of leak-catching code may only get compiled if we actually hit a leak. (Most programs won't.) >
\o/ Show quoted text
> Bonus is that our microguard might be able to push/unshift the filename, autodying subroutine reference, > and core/leak code onto @_, which removes the need for magic
locations. Of coure, we'd have to pull them Show quoted text
> right off again at the start of the beefy guard. >
That, or pull it via the "local $autodie::_EXTRA_ARG"-trick; either way should be doable. Admittedly, I would prefer using @_ as it seems "less magical".

~Niels

[1] Example:

    sub a($) { ... }
    sub b(&) { ... }
    sub c(_) { ... }

Those three subs will have the same "generated call" (see fill_protos). So we could do a single

    sub checked_X { ... }

that could handle any of those 3 subs (instead of creating 1 checking sub for each).
CC: mschwern [...] cpan.org
Subject: Re: [rt.cpan.org #46984] Speed up Fatal by reducing calls to eval?
Date: Sat, 11 May 2013 10:32:15 +0200
To: bug-autodie [...] rt.cpan.org
From: Niels Thykier <niels [...] thykier.net>
On 2013-05-11 09:04, Niels Thykier wrote: Show quoted text
> On 2013-05-11 00:39, PJF via RT wrote:
>> <URL: https://rt.cpan.org/Ticket/Display.html?id=46984 > >> >> Okay, this may be crazy, or brilliant. I'm not sure. Also, I've got another mail from you (Niels) in my inbox, andI haven't read that yet. I'm sending this anyway. :) >> >> Currently: >> >> * User code calls the leakguard, which has a real prototype, and checks the filename is correct. >> * If the filename isn't correct, we call the CORE version (or whatever we had in that spot before) >> * If the filename *is* correct, we jump (goto &) the autodying/fatalised version of the code. >> >> There's one leakguard for each filename + sub combination. >> >> However we *should* be able to do this: >> >> * User calls a micro-guard (with real proto) for each filename + sub combination >> `-> This just puts the filename into a known place, and jumps to the beefy guard. >> * Beefy guard exists for each sub (but *not* each file) and do the heavy lifting (but see below). >> >> The microguards effectively end up looking like: >> >> sub rename($$) { >> local $autodie::_GUARDFILE = 'yourscript.pl'; >> goto $autodie::beefy{rename}; >> } >>
> > If that is going to be the "per-file" leakguard, we wouldn't need an > eval for that. > > [...] > > So we can still take out some eval overhead here. > > But if the beefy leak guard becomes a real sub (and not a per-prototype > guard), then don't have to use the "local $autodie::_GUARDFILE"-trick > (instead just pass it as argument). You suggest that later and I think > it is doable (see below). >
Attached is a prototype patch that passes all tests (i.e. fixes the leak). Sadly, it comes with a slight speed regression (still faster than the original code, though). I am not entirely sure where this extra loss comes from... Also, we probably want to cache the trampolines for CORE subs so they are not eval'ed once per leak at runtime. (I admit to being too lazy to do that.) Show quoted text
> [...] > > For "core" subs, we have to generate a function to call the core sub > correctly (because to knowledge, we cannot get a reference to them nor > goto to them). But we can generate these lazily and cache them. So if > you never leak a CORE sub, Fatal will never generate the trampoline for it. > > [...]
We can actually do better than this. We can pass an extra argument to the checking wrapper functions for CORE subs that tells us whether we leaked or not. Using this extra argument, we can skip return value checking if we leaked. With this we have one string eval per CORE sub even if we leak. (At least for "reusable" CORE subs.)

~Niels

Message body is not shown because sender requested not to inline it.

So, straight up, you're amazing. I'm loving your ideas. :) Also, I agree that using argument passing rather than magic variables is *way* nicer. So for the rest of this discussion I'll just assume that's what we're doing. :) On Sat May 11 03:04:57 2013, niels@thykier.net wrote: Show quoted text
> return set_prototype($leak_guard, $real_prototype); > > So we can still take out some eval overhead here.
Dude! Does that work? If so, AWESOME. :) Show quoted text
> Should be doable. Assuming we don't leak, all leak guards appear to > behave the same (namely, goto \&wrapper).
Confirming this is exactly correct. :) Show quoted text
> For "core" subs, we have to generate a function to call the core sub > correctly (because to knowledge, we cannot get a reference to them nor > goto to them). But we can generate these lazily and cache them. So > if you never leak a CORE sub, Fatal will never generate the trampoline > for it.
Also exactly correct, to the limits of my knowledge. If we can find a way to magic-goto a core function, that would be amazing, but as far as I know we can't.

An added side-effect of generated-on-demand trampolines is that we can actually spot leaks happening at run-time. Which means if we wanted to (possibly as part of a future change) we could warn about them. Show quoted text
> Why would packages matter here? Ignoring globs for a moment, there is > no value in compiling a sub into the package of the leaking callsite.
Because of globs. :) I can't think of any built-ins that use package variables or formats that autodie will replace. So AFAIK it's just globs. (And I know I've got a patch from you to fully qualify globs, which may well let us do away with much package rubbish.) Show quoted text
> As we "goto" rather than call the original subref, both the leak > guard(s) will not be in the call stack. We can include a reference to > the original subref in the micro-guard that exposes it to the "beefy" > leak guard, so we don't even need a table-look up or anything.
+1! :) Show quoted text
> Sure, it would have to be string eval'ed but we can pretty much re-use > it for all usersubs having the same prototype (or in fact, even a > "compatible-call prototype")[1]. In my experience, most user subs do > not have a prototype at all so it would be a big win here if we can > just make a generic handler for that case.
That would be awesome. And I agree with being able to re-use or compatible-call prototypes. :) Show quoted text
> For CORE subs, I think the package name only matters if we cannot > qualify globs correctly. But sadly, we cannot generate generic > checking wrapper functions for these (but we can lazily generate reusable ones > for them).
And with well-behaved code that doesn't leak, that lazy generation is going to be a big win. :) Show quoted text
> That or pull it via the "local $autodie::_EXTRA_ARG"-trick either way > should be doable. Admittedly, I would prefer using @_ as it seems > "less magical".
I concur with using @_. autodie is already too magical as it is. :)

So, roadmap from here:

= Essential =

1) Write code that generates micro-handlers. These are closures which possess: (a) the desired prototype, (b) the filename we expect to be inside, (c) the original subroutine reference or core function name, (d) the autodying version of the same, (e) the package (for now). All microhandlers do is add information to @_ and then magic-goto our beefy handler.

2) Write the beefy handler. It removes the information from @_ added by the microhandler, checks to see if we're in the file we expect, and then either magic-gotos the autodying subroutine (typical case), the original subroutine (for leaked user subroutines), or does a trampoline dance (for leaked cores).

= Bonus Points =

3) Reduce evals further by having only a single wrapper per compatible prototype for wrapping user subs. (Not in essential, because user subs get wrapped *way* less often than core functions.)

4) Reduce evals by fully qualifying globs (can potentially make all core wrappers reusable, as well as making trampolines more cacheable).

5) Any others? I'm sure I've missed something.

= Gotchas =

* There's a risk of a slower run-time, since we've now got an *extra* guard in place. Total cost is probably that of a push, a pop, and a goto. This is probably insignificant, but we should probably benchmark a bunch of calls anyway.

= Crazy Bonus Idea =

    use autodie qw(:danger_mode);

* Installs autodie without any leak guards at all. Requires that the author promises that all use(), require(), do(), etc. calls are either non-existent, or well-behaved.
* Extra bonus points: Come up with a good Klingon name for that pragma. I suggest 'batlh' (honour).
* Super bonus points: Update Perl::Critic to detect poorly behaved code via static analysis when danger mode is on. :)
* Super duper bonus points: Convince p5p that they really want a Klingon-named pragma in core. :)

Have I missed anything?
On Sat May 11 04:32:35 2013, niels@thykier.net wrote: Show quoted text
> Attached is a prototype patch that passes all tests (i.e. fixes the > leak). Sadly it does come at a slight speed regression (still faster > than the original code though). I am not entirely sure where this > extra loss comes in...
As in slower using your benchmark? Weird. I wonder if set_prototype() is particularly slow, because everything looks like it should take the same time, but with fewer evals. Going to apply this now. :)

Out of curiosity, are you doing any of your work on github? I'm totally happy with patches, but I'm also happy with pull requests. :) Show quoted text
> We can actually do better than this. We can pass an extra argument to > the checking wrapper functions for CORE subs that tells us whether we > leaked or not. Using this extra argument, we can skip return value > checking if we leaked. With this we have one string eval per CORE sub > even if we leak. (At least for "resuable" CORE subs).
Have I mentioned how brilliant you are recently? Because that's a friggin' *awesome* idea. :) Although it might slow the run-time for typical cases, since we'll need an extra check for the flag every time, even in non-leaky code. (I'm totally happy to speed things up for the typical cases at the expense of leaking cases.) Paul
Patch applied and pushed to github. It certainly looks good from here. :) https://github.com/pjf/autodie/tree/cache_improvements I've removed truncate() and chdir() from the list of reusable subs, since they can both take filehandles and/or directory handles. If you think I've removed these by mistake, do let me know. :) Now going to run benchmarks. Paul
CC: mschwern [...] cpan.org
Subject: Re: [rt.cpan.org #46984] Speed up Fatal by reducing calls to eval?
Date: Sat, 11 May 2013 11:16:35 +0200
To: bug-autodie [...] rt.cpan.org
From: Niels Thykier <niels [...] thykier.net>
On 2013-05-11 11:08, PJF via RT wrote: Show quoted text
> <URL: https://rt.cpan.org/Ticket/Display.html?id=46984 > > > Patch applied and pushed to github. It certainly looks good from here. :) > > https://github.com/pjf/autodie/tree/cache_improvements > > I've removed truncate() and chdir() from the list of reusable subs, since they can both take filehandles and/or directory handles. If you think I've removed these by mistake, do let me know. :) > > Now going to run benchmarks. > > Paul >
We don't need the leak guard[1] for >= 5.10.x. The following patch demonstrates that (for the test suite) in_effect and the caller dance in _leak_guard produce the same result.

~Niels

[1] http://perldoc.perl.org/perlpragma.html — see the "in_effect" example.

Message body is not shown because sender requested not to inline it.

A few benchmarking notes.

* Vanilla 2.17 runs the benchmark in about 19s (real).
* b80dda1cb5b963c0311c8646ac584a258c81a6c3 (micro + beefy) runs the benchmark in about 13.7s (real).
* ab2125219e4b4fbd9423469da460882c60d43f57 (before micro + beefy) runs in ~11.75s (real).

So yes, I can confirm the slowdown, but it's still a big improvement over vanilla. :)
Show quoted text
> We don't need the leak guard[1] for >= 5.10.X. The following patch > demonstrates that (for the test suite) in_effect and the caller dance > in _leak_guard produces the same result.
Oh my! Actually using %^H for its intended purpose? We've come full circle!

The original autodie wrapped everything and then checked %^H each time a sub was called, since that was the Right Way To Do Things. But it had a penalty on the run-time performance of all built-ins, and it didn't work for 5.8.x (still very common at the time).

Then someone (chocolateboy?) discovered you could stuff objects into %^H (which the docs say you should never ever do), and they'd be stringified at the end of lexical scope, triggering their destroy handlers, which allowed us to remove our hacks from the symbol table, meaning no run-time overhead for areas where autodie was out of scope. It worked in 5.8.x, and there's no way I wanted to maintain two ways of doing autodie, so it stuck around.

Anyway, somehow I was supposed to segue into ruining people's fun, but I'm not sure how to get there. So I'll just do this thought experiment (I haven't tested it):

    use autodie qw(unlink);
    use Foo;

    # Inside Foo.pm

    # Right now, $^H{'autodie/active'} should be false.
    # But now it will be true!
    use autodie qw(link);

    unlink(...);   # Leaks if we only depend upon %^H

Of course, the way to protect against this is by having $^H{'autodie/CORE::open'}, $^H{'autodie/CORE::unlink'}, etc., so turning on autodie for one thing doesn't accidentally result in it being turned on for everything else.

I have no idea if that will be faster than what we're currently doing. :)

Paul
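The per-function %^H keys described above follow the in_effect pattern documented in perlpragma. A minimal sketch, with a hypothetical pragma name (myautodie); the hint hash is element 10 of caller(), which needs Perl >= 5.10:

```perl
use strict;
use warnings;

package myautodie;    # hypothetical pragma, per-function %^H keys

# import/unimport run at compile time of the "use"/"no" statement and
# flip a hint for the currently compiling lexical scope.
sub import   { $^H{'myautodie/CORE::unlink'} = 1 }
sub unimport { delete $^H{'myautodie/CORE::unlink'} }

# At run time, read the compile-time hints of the caller's call site.
sub in_effect {
    my ( $level, $key ) = @_;
    $level = 0 unless defined $level;
    my $hinthash = ( caller($level) )[10];
    return $hinthash && $hinthash->{$key};
}

package main;
```

A wrapper installed for CORE::unlink would call in_effect(1, 'myautodie/CORE::unlink') to decide whether its call site had the pragma in scope, without any filename bookkeeping.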
CC: mschwern [...] cpan.org
Subject: Re: [rt.cpan.org #46984] Speed up Fatal by reducing calls to eval?
Date: Sat, 11 May 2013 11:42:03 +0200
To: bug-autodie [...] rt.cpan.org
From: Niels Thykier <niels [...] thykier.net>
On 2013-05-11 11:16, Niels Thykier wrote: Show quoted text
> On 2013-05-11 11:08, PJF via RT wrote:
>> <URL: https://rt.cpan.org/Ticket/Display.html?id=46984 > >> >> Patch applied and pushed to github. It certainly looks good from here. :) >> >> https://github.com/pjf/autodie/tree/cache_improvements >> >> I've removed truncate() and chdir() from the list of reusable subs, since they can both take filehandles and/or directory handles. If you think I've removed these by mistake, do let me know. :) >> >> Now going to run benchmarks. >> >> Paul >>
> > We don't need the leak guard[1] for >= 5.10.X. The following patch > demonstrates that (for the test suite) in_effect and the caller dance in > _leak_guard produces the same result. > > ~Niels > > > [1] http://perldoc.perl.org/perlpragma.html > > See the "in_effect" > >
Leak-guardless calls to user subs, enjoy[1]!

~Niels

[1] I implemented those because they were easiest. :)

Message body is not shown because sender requested not to inline it.

CC: mschwern [...] cpan.org
Subject: Re: [rt.cpan.org #46984] Speed up Fatal by reducing calls to eval?
Date: Sat, 11 May 2013 11:51:13 +0200
To: bug-autodie [...] rt.cpan.org
From: Niels Thykier <niels [...] thykier.net>
On 2013-05-11 11:40, PJF via RT wrote: Show quoted text
> <URL: https://rt.cpan.org/Ticket/Display.html?id=46984 > > > >
>> We don't need the leak guard[1] for >= 5.10.X. The following patch >> demonstrates that (for the test suite) in_effect and the caller dance >> in _leak_guard produces the same result.
> > Oh my! Actually using %^H for its intended purpose? We've come full circle! > > The original autodie wrapped everything and then checked %^H each time a sub was called, since that was the Right Way To Do Things. > But it had a penalty on the run-time performance of all built-ins, and
it didn't work for 5.8.x (still very common at the time). Show quoted text
>
Hmm, now if only there had been a comment about that in lib/Fatal.pm. :P I wasn't aware of any perf issues with %^H (never used it before), so I admit I assumed it was a regular Perl hash. Maybe it has a more expensive look-up than O(avg(1))? Show quoted text
> Then someone (chocolateboy?) discovered you could stuff objects into %^H (which the docs say you should never ever do), and they'd be stringified at the end of lexical scope, > triggering their destroy handlers, which allowed us to remove our
hacks from the symbol table, meaning no run-time overhead for areas where autodie was out of scope. Show quoted text
> It worked in 5.8.x, and there's no way I wanted to maintain two ways
of doing autodie, so it stuck around. Show quoted text
>
Personally, I would probably be afraid that Perl might change and break the behaviour of %^H in a future version. Show quoted text
> Anyway, somehow I was supposed to segue into ruining people's fun, but I'm not sure how to get there. So I'll just do this thought experiment (I haven't tested it): > > [...] > > Of course, the way to protect against this is by having $^H{'autodie/CORE::open'}, $^H{'autodie/CORE::unlink'}, so turning on autodie for one thing doesn't accidentally > result in it being turned on for everything else. >
Indeed, my original patch was a quick proof-of-concept. The patch I sent in a separate mail should include this. Also, unimported stuff should probably be "deleted" instead of just set to 0 (I forgot that in my patch, but oh well). Show quoted text
> I have no idea if that will be faster than what we're currently doing. :) > > Paul >
There is only one way of finding out. :) ~Niels
On Sat May 11 05:51:32 2013, niels@thykier.net wrote: Show quoted text
> I wasn't aware of any perf issues with %^H (never used it before), so > I admit I assumed it was a regular Perl hash. Maybe it has a more > expensive look up than O(avg(1)) ?
Oh! It was more a case of:

    {
        use autodie;  # handful of lines that use autodie
        ...
    }

    # millions of lines of code that still:
    #  * Have the expense of calling a subroutine.
    #  * Have that subroutine check %^H to see if autodie is in effect.
    #  * Don't work in Perl 5.8.x, which was deployed everywhere ;)

AFAIK it's the same cost as any other hash look-up. From an architecture standpoint, using %^H makes perfect sense for leak checking.

One gotcha is that up until this point, everything in autodie works just great in 5.8.x, including everything in the cache_improvements branch right now. I'd like to still support 5.8.x if I can. (Although if any changes are 5.10+ only and result in big performance gains, I'm totally happy to reconsider that position.) Show quoted text
> Personally, I would probably be afraid that Perl might change and > break the behaviour of %^H in a future version.
There's a reason I then submitted autodie to be included in the Perl core. :) Bwahaha. Show quoted text
> Indeed, my original patch was a quick proof-of-concept. The patch I > sent in a mail separate to this should include this. Also, unimported > stuff should probable be "deleted" instead of just set to 0 (I forgot > that in my patch, but oh well).
At a glance, it looks good. :)

For what it's worth, I'd love to see if we can make a new release with what we have right now. My criteria for releasing software is simply "better than the last version", and what we have right now definitely qualifies (by about a 20-30% speed improvement on subsequent loads of autodie).

This isn't to say that you shouldn't keep working on awesome new refactorings and optimisations, just that I want to squeeze what we've got properly into my head and to test it like crazy.

I am really curious as to why the set_prototype() code is slower, though. A benchmark reveals that set_prototype is *waaaaay* faster than a string eval:

    use strict;
    use warnings;

    use Benchmark qw(:all);
    use Scalar::Util qw(set_prototype);

    cmpthese( -2, {
        eval => sub {
            my $x = 1;
            eval 'my $foo = sub ($$) { return $_[0] + $_[1] + $x; };';
        },
        proto => sub {
            my $x = 1;
            my $foo = set_prototype( sub { return $_[0] + $_[1] + $x; }, '$$' );
        },
    });

    __END__

              Rate   eval proto
    eval   56006/s     --  -94%
    proto 862852/s  1441%    --

I'm going to go find something for dinner and then come back for more investigation work. :)

Paul
CC: mschwern [...] cpan.org
Subject: Re: [rt.cpan.org #46984] Speed up Fatal by reducing calls to eval?
Date: Sat, 11 May 2013 12:36:14 +0200
To: bug-autodie [...] rt.cpan.org
From: Niels Thykier <niels [...] thykier.net>
On 2013-05-11 11:08, PJF via RT wrote: Show quoted text
> <URL: https://rt.cpan.org/Ticket/Display.html?id=46984 > > > [...] > I've removed truncate() and chdir() from the list of reusable subs, since they can both take filehandles and/or directory handles. > If you think I've removed these by mistake, do let me know. :) > > [...] > > Paul >
In general, yes, they do accept handles, but the autodie variant has the wrong prototype for that:

    # (with $Debug = 1; in lib/Fatal.pm)
    $ perl -I lib -Mautodie -e '' 2>&1 | \
        grep -A2 -e main::truncate -e main::chdir
    # _make_fatal: sub=main::truncate pkg=main name=truncate void=0
    sub ($$) {
    --
    # _make_fatal: sub=main::chdir pkg=main name=chdir void=0
    sub (;$) {

And accordingly:

    $ perl -I lib -Mautodie=opendir -e 'opendir(FOO, "."); chdir(FOO);'
    $ perl -I lib -Mautodie=opendir,chdir \
        -e 'opendir(FOO, "."); chdir(FOO);'
    Can't chdir('FOO'): No such file or directory at -e line 1

So either we need to rewrite the prototype for those subs, or you can make them reusable again. :)

~Niels
CC: mschwern [...] cpan.org
Subject: Re: [rt.cpan.org #46984] Speed up Fatal by reducing calls to eval?
Date: Sat, 11 May 2013 14:15:42 +0200
To: bug-autodie [...] rt.cpan.org
From: Niels Thykier <niels [...] thykier.net>
On 2013-05-11 12:32, PJF via RT wrote: Show quoted text
> <URL: https://rt.cpan.org/Ticket/Display.html?id=46984 > > > On Sat May 11 05:51:32 2013, niels@thykier.net wrote: >
>> I wasn't aware of any perf issues with %^H (never used it before), so >> I admit I assumed it was a regular Perl hash. Maybe it has a more >> expensive look up than O(avg(1)) ?
> > Oh! It was more a case of: > > { > use autodie; # handful of lines that use autodie > ... > } > > # millions of lines of code that still: > # * Have the expense of calling a subroutine. > # * Have that subroutine check %^H to see if autodie is in effect.
But currently we have the expense of doing the while (caller) dance, so the question would be whether the while(caller) dance is faster than the %^H look up. Show quoted text
> # * Don't work in Perl 5.8.x , which was deployed everywhere ;) >
True, though with the leak guards no longer being "eval $str"'ed, maintaining both solutions might be more feasible (assuming the %^H approach actually turns out to be useful). Show quoted text
> AFAIK it's the same cost as any other hash look-up. > > From an archiecture standpoint, using %^H makes perfect sense for leak checking. > > One gotcha is that up until this point, everything in autodie works just great in 5.8.x, including everything in the cache_improvements branch right now. > I'd like to still support 5.8.x if I can. (Although if any changes are
5.10+ only and result in big performance gains, I'm totally happy to reconsider Show quoted text
> that position.) >
Atm., my experiments have not been successful (compile-time performance regressed back to near the 2.17 state). I guess the extra pieces of code cause the "eval $code" to be correspondingly more expensive. :-/ Show quoted text
>> Personally, I would probably be afraid that Perl might change and >> break the behaviour of %^H in a future version.
> > There's a reason I then submitted autodie to be included in the Perl core. :) Bwahaha. >
Ohohoho, a cunning move! Show quoted text
>> Indeed, my original patch was a quick proof-of-concept. The patch I >> sent in a mail separate to this should include this. Also, unimported >> stuff should probable be "deleted" instead of just set to 0 (I forgot >> that in my patch, but oh well).
> > At a glance, it looks good. :) > > For what it's worth, I'd love to see if we can make a new release with what we have right now. My crtieria for > releasing software is simply "better than the last version", and what
we have right now definitely qualifies Show quoted text
> (by about a 20-30% speed improvement on subsequent loads of autodie). >
Sounds like a good idea, but I made a silly mistake in the code (see below). Show quoted text
> This isn't to say that you shouldn't keep working on awesome new refactorings and optimisations, just that I > want to squeeze what we've got properly into my head and to test it
like crazy. Show quoted text
> > I am really curious as to why the set_prototype() code is slower, though. A benchmark reveals that set_prototype is *waaaaay* faster than a string eval: > > [...] >
Well, because "Yours Truly" is a klutz and failed to use the opportunities available. I wasn't using the lexical cache any longer, so they were recompiled each time. Please see attached patch. Show quoted text
> I'm going to go find something for dinner and then come back for more investigation work. :)
>
> Paul
~Niels

Message body is not shown because sender requested not to inline it.

> True, though with the leak guards no longer being "eval $str"'ed, maintaining both solutions might be more feasible (assuming the %^H approach actually turns out to be useful).
This is an excellent argument. We already have code that's *only* compiled if you're running in 5.10+. In fact, if you do:

    if (PERL510) {
        # Here's some modern code.
    }
    else {
        # Here's some less modern code.
    }

then perl cheerfully optimises away the conditional and the branch that doesn't apply at compile-time. So there's very little penalty in having a faster 5.10 routine and a slower 5.8 routine in the same file.
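The constant-folding behaviour described above can be observed directly with B::Deparse. This is a small standalone sketch (not autodie's actual code; `guard` is an illustrative name), assuming a 5.10+ perl:

```perl
use strict;
use warnings;
use B::Deparse;

# Same idiom as autodie's PERL510 guard: a compile-time constant.
use constant PERL510 => ( $] >= 5.010 );

sub guard {
    if (PERL510) {
        return 'modern';    # 5.10+ implementation
    }
    else {
        return 'legacy';    # 5.8.x fallback
    }
}

# Deparsing the compiled sub shows the dead branch was folded away.
my $body = B::Deparse->new->coderef2text( \&guard );
print $body, "\n";
```

On a modern perl the deparsed body mentions only the 'modern' branch; the 'legacy' branch never survives compilation, which is why keeping both code paths in one file costs essentially nothing at run-time.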
> At the moment, my experiments have not been successful (compile-time performance regressed back to near the 2.17 state). I guess the extra pieces of code make the "eval $code" correspondingly more expensive. :-/
D'oh!
> Well, because "Yours Truly" is a klutz and failed to use the opportunities available. I wasn't using the lexical cache any longer, so they were recompiled each time. Please see attached patch.
Oh, sweet! Suspending my behavioural economics lectures to apply this now. :D Paul
10 seconds on my machine. That's the benchmark running almost twice as fast as what's on CPAN now! Awesome job! Thank you! :)
On Sat May 11 06:36:30 2013, niels@thykier.net wrote:
> In general, yes they do accept handles, but the autodie variant has the wrong prototype for that:
Hooray!
> So either we need to rewrite the prototype for those subs or you can make them reusable again. :)
I guess we should document somewhere that they've got more restrictive prototypes, if we haven't already done so. (AFAIK nobody's ever complained about this.) I'll go add them back in as reusable. :)
CC: mschwern [...] cpan.org
Subject: Re: [rt.cpan.org #46984] Speed up Fatal by reducing calls to eval?
Date: Sat, 11 May 2013 16:33:33 +0200
To: bug-autodie [...] rt.cpan.org
From: Niels Thykier <niels [...] thykier.net>
On 2013-05-11 14:55, PJF via RT wrote:
> <URL: https://rt.cpan.org/Ticket/Display.html?id=46984 >
>
> 10 seconds on my machine. That's the benchmark running almost twice as fast as what's on CPAN now!
>
> Awesome job! Thank you! :)
You are welcome. :)

If you want to do a new release for this, now might be a good time.

Without adding support for qualifying globs, we might be able to reduce the compile-time overhead a bit more by refactoring a tiny bit. That said, I think the next big (compile-time) improvement will be with qualifying the globs (and thus adding more subs to the reusable_builtins cache).

In the meantime, I managed to get the "leak-guardless" approach to work again (unsurprisingly, I was not using my cache ... this is becoming a recurring problem >.>). In load-time performance I see no difference between the two approaches now. That said, I haven't looked at optimising out all the (under 5.10) redundant code (like the scope guards)[1].

~Niels

[1] Though profiling suggests it is not going to be a big win...
CC: mschwern [...] cpan.org
Subject: Re: [rt.cpan.org #46984] Speed up Fatal by reducing calls to eval?
Date: Sat, 11 May 2013 17:35:26 +0200
To: bug-autodie [...] rt.cpan.org
From: Niels Thykier <niels [...] thykier.net>
On 2013-05-11 16:33, Niels Thykier wrote:
> [...]
>
> If you want to do a new release for this, now might be a good time.
Oh yeah, it might be prudent to mention that leaked subs now have a "per call" overhead, since we never made the trampoline cache. Alternatively, we should look at adding that cache before you release 2.18.
> Without adding support for qualifying globs, we might be able to reduce the compile-time overhead a bit more by refactoring a tiny bit. That said, I think the next big (compile-time) improvement will be with qualifying the globs (and thus adding more subs to the reusable_builtins cache).
>
> [...]
I pushed some minor changes to github[1]. I added a patch (at [2]) for skipping the "code-generation" step when we have a cached sub. It takes off about 1s on my benchmark.pl test (i.e. 11.5s -> 10.5s). Though I doubt the average user will notice this improvement. :)

~Niels

[1] https://github.com/nthykier/autodie/commits/refactor
[2] https://github.com/nthykier/autodie/commit/538f886e424de6a9d229463fa7bbb94d30043b72
On Sat May 11 11:35:48 2013, niels@thykier.net wrote:
> Alternatively, we should look at adding that cache before you release 2.18.
Yeah, we should cache them before 2.18. Otherwise we're generating a whole subroutine and eval'ing it every time there's a leak, which seems excessive. Luckily they should be easy to cache; package + call seems to be all that's required. (With potential for package + call + reusable check if we want.)
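A sketch of what such a cache could look like, keyed on package + call as suggested above. All names here (`%leak_guard_cache`, `_get_leak_guard`) are hypothetical; autodie's real generated code is considerably more involved:

```perl
use strict;
use warnings;

# Hypothetical cache of compiled leak guards, keyed on package + call,
# so each guard body is eval'd at most once instead of on every leak.
my %leak_guard_cache;

sub _get_leak_guard {
    my ( $pkg, $call, $code ) = @_;
    my $key = join "\0", $pkg, $call;

    # Only string-eval on a cache miss; later lookups reuse the coderef.
    $leak_guard_cache{$key} ||= eval "package $pkg; sub { $code }"
        or die "Leak guard for $pkg\::$call failed to compile: $@";

    return $leak_guard_cache{$key};
}
```

The second lookup for the same package + call returns the identical coderef, which is exactly the property that makes the repeated-leak case cheap.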
> I pushed some minor changes to github[1]. I added a patch (at [2]) for skipping the "code-generation" step when we have a cached sub. It takes off about 1s on my benchmark.pl test (i.e. 11.5s -> 10.5s). Though I doubt the average user will notice this improvement. :)
Awesome! Your code for skipping code-generation when all we need is a new leak guard looks great, and I've merged it into cache_improvements. Many thanks! Paul
Using benchmarks/leak.pl (which is now in cache_improvements):

    Vanilla autodie:          0.6s
    Before trampoline cache:  8.3s
    After trampoline cache:   0.4s

:D

https://github.com/pjf/autodie/commit/b94f36b913c4e271ee260cd7a434bfa3a4562c16
Although regular calls (measured with benchmarks/call.pl) now seem to be slower (6s -> 8s on my machine). This could be a side-effect of:

* Adding and removing arguments to @_
* Having an extra layer of subroutines to bounce through (microguard + beefy guard, as opposed to all-in-one)
* Something else.

Let's see if I can profile this.
On Sun May 12 00:16:47 2013, PJF wrote:
> * Having an extra layer of subroutines to bounce through (microguard + beefy guard, as opposed to all-in-one)
And it's this. Apparently goto & is expensive (although presumably faster than actually calling a sub), and it appears to be taking up the extra time. I've managed to get rid of a call to caller(), giving us back 0.5s. Still, autodying calls are about 25% slower than they used to be. Having said that, I can still make 135,000 calls/sec on my laptop (as opposed to 161,000/sec), so I suspect they're still fast enough.
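For reference, the two-layer shape under discussion looks roughly like this (illustrative names and bookkeeping only, not autodie's real guards): a tiny per-call trampoline that unshifts its bookkeeping onto @_ and then uses goto & to jump into the real guard. goto & replaces the current frame rather than adding one, so it's cheaper than a normal call, but it's still work done on every autodying call:

```perl
use strict;
use warnings;

# Illustrative only: the "beefy" guard does the real checking work.
sub _beefy_guard {
    my $info = shift;     # bookkeeping prepended by the trampoline
    my ($arg) = @_;
    return "$info->{call}($arg)";
}

# The "micro" guard is the per-call trampoline: prepend bookkeeping
# to @_, then tail-call via goto &, replacing this frame entirely.
sub _micro_guard {
    unshift @_, { call => 'open' };
    goto \&_beefy_guard;
}

print _micro_guard('some_file.txt'), "\n";   # open(some_file.txt)
```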
Also, if you're doing something a bazillion times and need every ounce of speed, one can always replace the open in

    for (1..ONE_BILLION_SYSCALLS) { open(...); ...; }

with `CORE::open` and do the check manually. Autodie is about a squillion times slower than calling the built-in directly, but it's still so fast that nobody cares.

So right now I'm not stressed that calls are a little slower. I'm *delighted* that loading times are so much faster, since that's what everyone complains about.
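A sketch of that escape hatch: inside a scope where autodie is active, the CORE:: prefix bypasses the wrapper entirely, at the cost of checking the builtin's return value yourself. The sub name and filename below are just examples:

```perl
use strict;
use warnings;
use autodie;    # open() in this lexical scope is now the checking wrapper

sub read_first_line {
    my ($file) = @_;

    # Hot path: call the builtins directly and check by hand,
    # skipping autodie's (comparatively) slow wrappers.
    CORE::open( my $fh, '<', $file )
        or die "Can't open '$file' for reading: $!";
    my $line = <$fh>;
    CORE::close($fh);
    return $line;
}
```

Elsewhere in the same scope, a plain open() still gets autodie's checking for free; only the hand-audited hot spot opts out.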
CC: mschwern [...] cpan.org
Subject: Re: [rt.cpan.org #46984] Speed up Fatal by reducing calls to eval?
Date: Sun, 12 May 2013 09:44:01 +0200
To: bug-autodie [...] rt.cpan.org
From: Niels Thykier <niels [...] thykier.net>
On 2013-05-12 06:55, PJF via RT wrote:
> <URL: https://rt.cpan.org/Ticket/Display.html?id=46984 >
>
> On Sun May 12 00:16:47 2013, PJF wrote:
>> * Having an extra layer of subroutines to bounce through (microguard + beefy guard, as opposed to all-in-one)
> And it's this. Apparently goto & is expensive (although presumably faster than actually calling a sub), and it appears to be taking up the extra time.
Hi,

NYTProf suggests that the 3 most expensive things (exclusive call time) are:

    10.9s  _leak_guard
    6.91s  <ANON sub - the checking wrapper>
    3.25s  <ANON sub - micro leakguard>

In the checking wrapper, the three most expensive lines are (time on line):

    1.32s  local($", $!) = (', ', 0);  # TODO - Why do we do this?
    2.97s  my $retval = CORE::binmode($_[0]);
    2.91s  return $retval || $die;

(listed in order of appearance, i.e. not sorted by cost).

Having looked at the resulting wrapper code, I have no idea why $" needs to be localized or altered. (Not sure about $! either, but at least it appeared to be used in the sub, unlike $".)

For _leak_guard, we spent time on:

    6.15s  while ( ($caller = (caller $caller_level)[1]) =~ m{[...]} ) {
    3.46s  goto $wrapped_sub;

For the while-loop, 549ms is claimed to be spent in the regex. The rest of the time is unaccounted for.

Finally, for the micro leak guard we spent:

    1.53s  unshift @_, [$filename, $code, $sref, $call, \@protos, $pkg];
    3.18s  goto \&_leak_guard;

By the looks of it, a leak-guardless approach would save us 6 seconds of goto and 6 seconds on the while loop. That accounts for about 12s out of 28s during my profile, but it remains to be seen whether the leak-guardless approach has a comparable overhead of its own.
> I've managed to get rid of a call to caller(), giving us back 0.5s. Still, autodying calls are about 25% slower than they used to be.
Great :) FTR, my profile data from above includes that commit.
> Having said that, I can still make 135,000 calls/sec on my laptop (as opposed to 161,000/sec), so I suspect they're still fast enough.
I am a bit surprised to see that - I'd expect the difference to be higher. For me, benchmarks/call.pl goes from 7.6s (with autodie) to 0.1-0.2s (without autodie).

~Niels
So, I can't find any reason not to release what we've got. I've been over the code lots of times. I've written new tests, which still pass. Everything runs pretty snappily. So I'm in the process of doing the last bits and pieces to push a new release. :)
> 6.15s while ( ($caller = (caller $caller_level)[1]) =~ m{[...]} ) {
> For the while-loop, 549ms is claimed to be spent in the regex. The rest of the time is unaccounted for.
Pretty sure a bunch of it is from caller(). That's why I managed to find an easy time improvement by removing one call to it. :)
>> Having said that, I can still make 135,000 calls/sec on my laptop (as opposed to 161,000/sec), so I suspect they're still fast enough.
>
> I am a bit surprised to see that - I'd expect the difference to be higher. For me, benchmarks/call.pl goes from 7.6s (with autodie) to 0.1-0.2s (without autodie).
Oh! That's "new" autodie compared with "old" autodie. Remove autodie entirely and I can make six bajillion calls per second. :) Which is why I'm not concerned about the modest speed decrease. If you're *really* needing the extra speed, then using CORE::whatever and your own checking is the way to go in the places where it really counts.
CC: mschwern [...] cpan.org
Subject: Re: [rt.cpan.org #46984] Speed up Fatal by reducing calls to eval?
Date: Sun, 12 May 2013 10:44:42 +0200
To: bug-autodie [...] rt.cpan.org
From: Niels Thykier <niels [...] thykier.net>
On 2013-05-12 09:44, Niels Thykier wrote:
> 1.32s  local($", $!) = (', ', 0);  # TODO - Why do we do this?
>
> [...]
>
> (listed in order of appearance, i.e. not sorted by cost).
>
> Having looked at the resulting wrapper code, I have no idea why $" needs to be localized or altered. (Not sure about $! either, but at least it appeared to be used in the sub, unlike $".)
Removing the $" bit causes t/backcompat.t to fail, so I guess we rely on it for the non-exception-based way of dying.

~Niels
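That would fit: the legacy Fatal-style message interpolates the failed call's argument list into a string, and $" is the separator perl uses whenever an array is interpolated into double quotes. A minimal illustration of the mechanism (not autodie's actual generated code):

```perl
use strict;
use warnings;

# $" is the list separator for array interpolation; a Fatal-style
# message that embeds "@args" depends on its value.
my @args = ( 'my_file.txt', 'r' );

my $default = do { local $" = ' ';  "Can't open(@args)" };
my $fatal   = do { local $" = ', '; "Can't open(@args)" };

print "$default\n";   # Can't open(my_file.txt r)
print "$fatal\n";     # Can't open(my_file.txt, r)
```

If a caller had set $" to something unusual, an un-localized interpolation would produce a differently formatted error message, which is plausibly what t/backcompat.t is checking.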
Hi,

With the improvements that were included in v2.18 and v2.22, I believe we have taken out the majority of the overhead. In particular, we are looking at a factor-of-10 improvement from v2.17 to v2.22:

    v2.17: 0m24.996s
    v2.18: 0m12.218s
    v2.22: 0m2.144s

At this point, I doubt there is much more we can do without involving XS code. Using XS subs is currently tracked in github issue #32[1].

With that, I am taking the liberty of marking this issue as resolved in v2.22.

~Niels

[1] https://github.com/pjf/autodie/issues/32