This queue is for tickets about the PPI CPAN distribution.

Report information
The Basics
Id: 16952
Status: open
Priority: 0/
Queue: PPI

People
Owner: adamk [...] cpan.org
Requestors: nospam-abuse [...] bloodgate.com
Cc:
AdminCc:

Bug Information
Severity: Wishlist
Broken in: (no value)
Fixed in: (no value)



From: Tels <nospam-abuse [...] bloodgate.com>
To: bug-PPI [...] rt.cpan.org
Subject: [PATCH] Speed up tokenizer char-by-char
Date: Sat, 7 Jan 2006 13:24:36 +0100
-----BEGIN PGP SIGNED MESSAGE-----

Moin,

profiling PPI showed that lines which are not recognized completely are processed char-by-char. Unfortunately, this happened in an empty while loop that called a subroutine for each character. :)

The attached patch moves the loop inside the subroutine, which lets us bypass the per-character calls, the empty while body, and the repeated checks for a valid cursor position. I also eliminated duplicate code inside the loop.

As a side effect, the patch also fixes a bug: the process_next_char() routine did not localize $_. I have not attempted to add a test for that, though.

The speedup is a few percent; it depends heavily on how often lines need to be processed char-by-char and how long they are.

Example: parsing Graph::Easy.pm 5 times (to keep start-up overhead from skewing the results; the results are still skewed by the DESTROY, i.e. the parsing itself is sped up more than shown here).

Lowest of three runs:

te@linux:~/perl/PPI> time perl d.pl
real 0m5.376s
user 0m5.288s
sys 0m0.066s

te@linux:~/perl/PPI> time perl -IPPI-1.109.e/lib/ d.pl
real 0m5.181s
user 0m5.110s
sys 0m0.054s

On this particular data, PPI is now about 3..4% faster. All tests still pass.

Also attached are two profile runs. The .pm file has 2489 lines, and the test parses 12490 lines in 5.18 seconds, which puts PPI at about 2400 lines/s on my 2.0 GHz AMD Athlon. Not bad :)

Further ideas are to:

* recognize more things entirely, so char-by-char overhead is reduced
* use fewer subroutines (to concentrate code hot spots)
* find out what calls __ANON__ (which smells like something is triggering an overload, needlessly)

Hope you like this work,

Tels

- --
Signed on Sat Jan 7 12:51:00 2006 with key 0x93B84C15.
Visit my photo gallery at http://bloodgate.com/photos/
PGP key on http://bloodgate.com/tels.asc or per email.

If you are bald, and comb some of your hair over the bald spot, you are violating US Patent #4,022,227: <http://tinyurl.com/6qxl7>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)

iQEVAwUBQ7+zBHcLPEOTuEwVAQHFrQf+Oxf3qwPF+OPi8ZMgPSa+h2oTCbS11B2Y
IivSO3SuOp3okljWi8eEmLEdJa1tVYIw+kcXp+7/TUhS8XOKhy1LPHUAV6fKUbHE
MtXu+EJ5/zYk3Xh2GwWRuK7IG7KiggnoteuonjGwVW2Ry5mMn+9wxAoN9bjRo/cf
Jo6JKVdKss/Asq2yFL4p66YiK6FxPcohq8EhEkBUFYNoGaBzMxemNDDcha5Zhg/s
pbc1u2tJjtxJU/tyR0T112i4Ay+H7gyuag3ah4j97Ltjasd7qPOMEiZ3TvTmws8A
U1dDiH9tQzPRIbIAJAi9py2yEf4dBALn1bp13OppbYlm+vErUYwxgw==
=LRYP
-----END PGP SIGNATURE-----
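
The change described in the message above (moving the per-character loop from the caller into the routine itself) can be illustrated with a minimal sketch. This is not PPI's tokenizer code and not the attached patch: it is written in Python rather than Perl, and every name in it (`process_next_char`, `tokenize_per_call`, `tokenize_inner_loop`, the `state` dictionary) is a hypothetical stand-in chosen for illustration. "Processing" a character here just means appending it to a list.

```python
# Illustrative sketch only; hypothetical names, not PPI's actual code.

def process_next_char(state):
    """Consume one character; return False once the line is exhausted."""
    cursor = state["cursor"]
    line = state["line"]
    if cursor >= len(line):  # cursor validity is re-checked on every call
        return False
    state["tokens"].append(line[cursor])
    state["cursor"] = cursor + 1
    return True


def tokenize_per_call(line):
    """Before: the caller spins an empty loop, one call per character."""
    state = {"line": line, "cursor": 0, "tokens": []}
    while process_next_char(state):
        pass  # empty body; all the work happens in the call
    return state["tokens"]


def tokenize_inner_loop(line):
    """After: a single call per line, with the loop inside the routine."""
    tokens = []
    for ch in line:  # no per-character call, no repeated cursor check
        tokens.append(ch)
    return tokens
```

Both functions produce the same result for a given line; the second avoids one function call and one bounds check per character, which is the kind of overhead the message attributes the few-percent speedup to.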

Message body is not shown because sender requested not to inline it.

Message body is not shown because sender requested not to inline it.

Message body is not shown because sender requested not to inline it.

On Sat Jan 07 07:18:10 2006, nospam-abuse@bloodgate.com wrote:
> The patch also fixes a bug as a side-effect, the process_next_char()
> routine did not localize $_. I have not attempted to add a test for that,
> though.

This part of the patch is fixed in SVN revision 1052.

-- Chris
Subject: Re: [rt.cpan.org #16952] [PATCH] Speed up tokenizer char-by-char
Date: Fri, 22 Sep 2006 00:22:32 +0200
To: bug-PPI [...] rt.cpan.org
From: Tels <nospam-abuse [...] bloodgate.com>
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Moin,

On Thursday 21 September 2006 17:19, via RT wrote:
> <URL: http://rt.cpan.org/Ticket/Display.html?id=16952 >
>
> On Sat Jan 07 07:18:10 2006, nospam-abuse@bloodgate.com wrote:
> > The patch also fixes a bug as a side-effect, the process_next_char()
> > routine did not localize $_. I have not attempted to add a test for
> > that, though.
>
> This part of the patch is fixed in SVN revision 1052.
> -- Chris

Heya Chris,

nice to meet you :-) So did my patch get applied?

best wishes from holiday,

Tels

- --
Signed on Fri Sep 22 00:22:02 2006 with key 0x93B84C15.
Visit my photo gallery at http://bloodgate.com/photos/
PGP key on http://bloodgate.com/tels.asc or per email.

"Den wahren Wert dieser Software werden vermutlich nur Fach Läute und Firmen erkennen." -- "So isst es. Ein gewißer Standart muss schon gewart beiben!" -- Kabe (http://tinyurl.com/3kucx)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (GNU/Linux)

iQEVAwUBRRMQqHcLPEOTuEwVAQLH4Af/QYopP/QJ9mOBk3wrSUrGlk7TZUnPKOta
GYwou4U0IeUmy7OM7lTjbB86LmkqejxGqdRfD0OcNIfBHknjWEx2RjIrYyxs/kRf
aM8XqdkyMQHekw3DRTFn3HSOawhdoVLa+Tk7nxZQQ0xCKnKLKtUDEKLo5Vd0CkG3
ajFpssLZt7shRSSQFjV4fdUd0al9Pw32gqFJElPyduInHFloa+XCrLFa7eF60k8B
u0yCCK+toPasHgSV73GQIuGI8HmYf7OQsZ4Ih3xuNMgS3ZE4FacgEgR4oUxYVRYg
pDuZc2eBklzPIWXXmXoKwHnxP3e1v3uwXhc8XxeryQIk09px93dcGQ==
=vRHr
-----END PGP SIGNATURE-----