
This queue is for tickets about the Regexp-Grammars CPAN distribution.

Report information
The Basics
Id: 67785
Status: open
Priority: 0
Queue: Regexp-Grammars

People
Owner: Nobody in particular
Requestors: gruzjs.dan [...] gmail.com
Cc:
AdminCc:

Bug Information
Severity: Normal
Broken in: 1.012
Fixed in: (no value)



Subject: how to stop parsing after <t> seconds
Dear Damian,

It would be great if there were an easy way to stop parsing after $t seconds. When the grammar is complex (with many backtracking paths to try) and the input does not obey the grammar, parsing takes a long time, and the user may want to stop it (otherwise the computer gets stuck). I tried to use the $SIG{ALRM} mechanism, but it often crashes the system. If it is not difficult, it would be great to add a clean return once $t seconds have passed.

Thank you,
-Dan G.
Subject: Re: [rt.cpan.org #67785] how to stop parsing after <t> seconds
Date: Mon, 2 May 2011 12:16:43 +1000
To: bug-Regexp-Grammars [...] rt.cpan.org
From: Damian Conway <damian [...] conway.org>
Hi Dan,

> If it is not difficult, then it would be great to add a clean return
> once $t seconds pass.

I agree that it would be an excellent idea. Unfortunately, I'm not aware of any safe and reliable way to achieve that. :-(

Damian
From: cpan [...] bbkr.org
I simply use ALRM within eval:

------------------------------------------------------
sub parse {
    my $content = shift;

    local $SIG{'ALRM'} = sub { die 'Parse timeout' };
    alarm 4;

    my $parsed;
    eval {
        $parsed = $/{'TOP'} if $content =~ $DC;
        alarm 0;
    };

    return $parsed;
}
------------------------------------------------------

And it works fine.
From: cpan [...] bbkr.org
I was wrong, this is not the way to go...

I've tested my example with really huge data and randomly got:
* Segmentation fault
* Modification of a read-only value attempted
* something complaining about regexp stack

I'm thinking of a workaround (pseudocode):

1. set a "$parsing_started = time" variable somewhere
2. use "<require: (?{ now - $parsing_started > 4 })>" in tokens which are often backtracked to get a 4s timeout

This should cause the grammar to fail at the regexp level instead of being brutally interrupted by the ALRM signal.

I'll get back to you with results...
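The workaround above can be sketched in plain Perl, without Regexp::Grammars, as a zero-width guard that checks the clock each time the engine passes through a heavily backtracked token. This is only an illustration under stated assumptions: the names $deadline, $timed_out, $guard, $pattern and match_with_timeout are hypothetical, the pattern is a stand-in for a real grammar, and nothing here is taken from the module's own implementation. Note that the guard is written so that it fails once the limit has been exceeded and succeeds (matching nothing) while time remains.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical package variables (not part of Regexp::Grammars).
our $deadline;     # wall-clock time after which matching should give up
our $timed_out;    # set to 1 by the guard once the deadline has passed

# Zero-width guard. While time remains, the condition is false and the
# conditional matches the empty string. Once the deadline has passed, the
# "yes" branch (?!) is taken; it always fails, so the engine backtracks.
# The flag is sticky, so every later visit to the guard fails as well.
my $guard = qr/(?(?{ $timed_out = 1 if time() >= $deadline; $timed_out })(?!))/;

# Stand-in pattern with nested quantifiers. The guard sits inside the loop
# that is re-entered on every backtrack, as the pseudocode suggests.
my $pattern = qr/\A(?:a+$guard)+b\z/;

sub match_with_timeout {
    my ($text, $seconds) = @_;
    local $timed_out = 0;
    local $deadline  = time() + $seconds;
    my $matched = $text =~ $pattern;
    return $timed_out ? 'timeout' : $matched ? 'match' : 'no match';
}

print match_with_timeout('aaab', 4), "\n";
```

Because the match fails inside the regex engine rather than being interrupted by a signal handler, the engine unwinds normally. Passing a negative limit, for example match_with_timeout('aaab', -1), places the deadline in the past and is a quick way to exercise the timeout path.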
Subject: Re: [rt.cpan.org #67785] how to stop parsing after <t> seconds
Date: Mon, 27 Jun 2011 18:07:31 +0300
To: bug-Regexp-Grammars [...] rt.cpan.org
From: Dan Gruzjs <gruzjs.dan [...] gmail.com>
Hi Pawel,

Thank you for responding to my question. Actually, Damian (the author of Regexp/Grammars.pm) helped me a bit by adding a "timeout:" directive to the grammar. He sent me a beta version to check and it seems OK, but I don't know whether he has already published it online. If this is useful for you, you can either check online or contact Damian directly to get the beta.

Thanks,
-Dan.

On Mon, Jun 27, 2011 at 5:58 PM, Pawel Pabian via RT <bug-Regexp-Grammars@rt.cpan.org> wrote:

> <URL: https://rt.cpan.org/Ticket/Display.html?id=67785 >
>
> I was wrong, this is not the way to go...
>
> I've tested my example with really huge data and got randomly:
> * Segmentation fault
> * Modification of a read-only value attempted
> * something complaining about regexp stack
>
> I'm thinking of workaround (pseudocode):
>
> 1. set "$parsing_started = time" variable somewhere
> 2. use "<require: (?{ now - $parsing_started > 4 })>" in tokens which
>    are often backtracked to get 4s timeout
>
> This should cause grammar to fail on regexp level instead of being
> brutally interrupted by ALRM signal.
>
> I'll get back to you with results...
Subject: Re: [rt.cpan.org #67785] how to stop parsing after <t> seconds
Date: Tue, 28 Jun 2011 10:00:11 +1000
To: bug-Regexp-Grammars [...] rt.cpan.org
From: Damian Conway <damian [...] conway.org>
Pawel wrote:

> I was wrong, this is not the way to go...

Yes. I had found that out the hard way too. :-(

> I'm thinking of workaround (pseudocode):
>
> 1. set "$parsing_started = time" variable somewhere
> 2. use "<require: (?{ now - $parsing_started > 4 })>" in tokens which
>    are often backtracked to get 4s timeout

This is exactly what the new <timeout:...> directive does.

Damian