Bug #67785 for Regexp-Grammars: how to stop parsing after <t> seconds

Wed Apr 27 02:35:06 2011 gruzjs.dan [...] gmail.com - Ticket created

Subject:

how to stop parsing after <t> seconds

Dear Damian, It would be great if there were an easy way to stop parsing after $t seconds. When the grammar is complex (many backtracking paths to try) and the input does not obey the grammar, parsing takes a long time, and the user may want to stop it (or else the computer gets stuck). I tried to use the $SIG{ALRM} mechanism, but it often crashes the system. If it is not difficult, then it would be great to add a clean return once $t seconds pass. Thank you, -Dan G.

Sun May 01 22:17:32 2011 damian [...] conway.org - Correspondence added

Subject:	Re: [rt.cpan.org #67785] how to stop parsing after <t> seconds
Date:	Mon, 2 May 2011 12:16:43 +1000
To:	bug-Regexp-Grammars [...] rt.cpan.org
From:	Damian Conway <damian [...] conway.org>

Hi Dan,

> If it is not difficult, then it would be great to add a clean return
> once $t seconds pass.

I agree that it would be an excellent idea. Unfortunately, I'm not aware of any safe and reliable way to achieve that. :-( Damian

Sun May 01 22:17:33 2011 The RT System itself - Status changed from 'new' to 'open'

Mon Jun 27 09:26:15 2011 cpan [...] bbkr.org - Correspondence added

From:

cpan [...] bbkr.org

I simply use ALRM within eval:

------------------------------------------------------
sub parse {
    my $content = shift;

    local $SIG{'ALRM'} = sub { die 'Parse timeout' };
    alarm 4;

    my $parsed;
    eval {
        $parsed = $/{'TOP'} if $content =~ $DC;
        alarm 0;
    };

    return $parsed;
}
------------------------------------------------------

And it works fine.

Mon Jun 27 10:58:11 2011 cpan [...] bbkr.org - Correspondence added

From:

cpan [...] bbkr.org

I was wrong, this is not the way to go...

I've tested my example with really huge data and randomly got:
* Segmentation fault
* Modification of a read-only value attempted
* something complaining about regexp stack

I'm thinking of a workaround (pseudocode):

1. set "$parsing_started = time" variable somewhere
2. use "<require: (?{ now - $parsing_started > 4 })>" in tokens which are often backtracked to get a 4s timeout

This should cause the grammar to fail at the regexp level instead of being brutally interrupted by the ALRM signal.

I'll get back to you with results...
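The idea of checking the clock from inside the pattern can be sketched in plain core Perl, without Regexp::Grammars. This is only an illustration under stated assumptions: the names `match_with_deadline`, `expired`, `$deadline` and `$timed_out` are hypothetical, the toy pattern stands in for a real grammar, and `(*COMMIT)(*FAIL)` is one possible way to abandon the match rather than the `<require:...>` form proposed above. It is not a description of how any Regexp::Grammars feature is implemented.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);    # core module; gives a sub-second clock

# All names below are hypothetical, chosen for this sketch.
our ($deadline, $timed_out);

# True once the deadline has passed; also records that a timeout happened.
sub expired {
    return 0 if time() <= $deadline;
    $timed_out = 1;
    return 1;
}

# Match $input, giving up once $limit seconds have elapsed.
# Returns (matched, timed_out).
sub match_with_deadline {
    my ($input, $limit) = @_;
    $deadline  = time() + $limit;
    $timed_out = 0;

    my $matched = $input =~ m{
        \A
        (?:
            # Runs on every pass through the loop, including passes
            # reached by backtracking.
            (?(?{ expired() }) (*COMMIT) (*FAIL) )
            a+
        )+
        b
        \z
    }x;

    return ($matched ? 1 : 0, $timed_out);
}
```

Because the check runs inside the regex engine, no signal handler interrupts the match: the pattern simply fails once the deadline has passed. A caller can tell a timeout from an ordinary mismatch through the second return value, for example `my ($ok, $late) = match_with_deadline($text, 4);`.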

Tue Jun 28 13:52:46 2011 gruzjs.dan [...] gmail.com - Correspondence added

Subject:	Re: [rt.cpan.org #67785] how to stop parsing after <t> seconds
Date:	Mon, 27 Jun 2011 18:07:31 +0300
To:	bug-Regexp-Grammars [...] rt.cpan.org
From:	Dan Gruzjs <gruzjs.dan [...] gmail.com>

Hi Pawel,

Thank you for responding to my question. Actually, Damian (the guy who wrote Regexp::Grammars) helped me a bit by adding a "timeout:" directive to the grammar. He sent me a beta version to check and it seems OK, but I don't know if he has already published it online. If this is useful for you, you can either check online or contact Damian directly to get the beta.

Thanks, -Dan.

On Mon, Jun 27, 2011 at 5:58 PM, Pawel Pabian via RT <bug-Regexp-Grammars@rt.cpan.org> wrote:

> <URL: https://rt.cpan.org/Ticket/Display.html?id=67785 >
>
> I was wrong, this is not the way to go...
>
> I've tested my example with really huge data and randomly got:
> * Segmentation fault
> * Modification of a read-only value attempted
> * something complaining about regexp stack
>
> I'm thinking of a workaround (pseudocode):
>
> 1. set "$parsing_started = time" variable somewhere
> 2. use "<require: (?{ now - $parsing_started > 4 })>" in tokens which
> are often backtracked to get a 4s timeout
>
> This should cause the grammar to fail at the regexp level instead of being
> brutally interrupted by the ALRM signal.
>
> I'll get back to you with results...

Tue Jun 28 13:53:01 2011 damian [...] conway.org - Correspondence added

Subject:	Re: [rt.cpan.org #67785] how to stop parsing after <t> seconds
Date:	Tue, 28 Jun 2011 10:00:11 +1000
To:	bug-Regexp-Grammars [...] rt.cpan.org
From:	Damian Conway <damian [...] conway.org>

Pawel wrote:

> I was wrong, this is not the way to go...

Yes. I had found that out the hard way too. :-(

> I'm thinking of a workaround (pseudocode):
>
> 1. set "$parsing_started = time" variable somewhere
> 2. use "<require: (?{ now - $parsing_started > 4 })>" in tokens which
> are often backtracked to get a 4s timeout

This is exactly what the new <timeout:...> directive does. Damian