Bug #125105 for Regexp-Grammars: Segmentation fault at 2 GB of memory

Tue Apr 17 16:34:13 2018 alexchandel [...] gmail.com - Ticket created

Subject:

Segmentation fault at 2 GB of memory

Parsing a large file (about 317 lines, about 19670 bytes) results in a segmentation fault as soon as Perl hits ~2 GB of memory usage. It's hard to tell, but memory allocation seemed to accelerate the further along the parser got in the file.

Is there any way to force Regexp::Grammars to treat a list-like subrule as non-backtracking or atomic, like "<[statements=block_stmt]>*+" instead of "<[statements=block_stmt]>*"? Wouldn't that save on memory if Regexp::Grammars knew to fail instead of saving backtracking locations?

Tested in Perl 5.26.1, macOS 10.13.4, Regexp::Grammars 1.048.

Tue Apr 17 19:49:54 2018 alexchandel [...] gmail.com - Correspondence added

Note that this happens *even with* <nocontext:>, plus the liberal use of possessive & atomic expressions in tokens to limit backtracking at the small scale.

Wed Apr 18 08:34:14 2018 damian [...] conway.org - Correspondence added

Subject:	Re: [rt.cpan.org #125105] Segmentation fault at 2 GB of memory
Date:	Wed, 18 Apr 2018 22:33:22 +1000
To:	bug-Regexp-Grammars [...] rt.cpan.org
From:	Damian Conway <damian [...] conway.org>

Hi Alex,

I agree that it would certainly be better if non-backtracking repetitions could be applied to subrule calls. It would indeed reduce the parser's memory footprint in some cases.

Unfortunately, I am not aware of any way to make something like <subrule>*+ work correctly.

There is a long-standing issue with how in-regex variable localizations are implemented which means that they do not unwind correctly when a call to an independent subpattern is part of a non-backtracking repetition. And because R::G uses localized variables to implement its parse-tree stack, this issue makes it impossible to implement <subrule>*+ properly.

I raised this issue with P5P several years ago but have been unable to convince them to change the current behaviour in such a way as to ensure that repeated localizations unwind correctly.

I certainly understand your frustration, and I am very sorry that I have no better option to offer you. :-(

Damian
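
The localization mechanism described above can be seen in isolation with core Perl. The following is a minimal sketch adapted from the `local` example in the perlre documentation; it is not code from Regexp::Grammars, and the variable names are illustrative only. It shows the case that works: with an ordinary backtracking quantifier, each `local` inside `(?{ ... })` is unwound when the engine backtracks. The failure discussed in this ticket concerns the non-backtracking variants wrapped around a subpattern call such as `(?&A)`.

```perl
use strict;
use warnings;

# Package variables, because local() only applies to those.
our $cnt;
our $res;

my $text = 'a' x 8;

$text =~ m{
    (?{ $cnt = 0 })                     # reset the counter
    ( a (?{ local $cnt = $cnt + 1 }) )* # count each 'a'; undone on backtracking
    aaaa                                # forces the * to give back four 'a's
    (?{ $res = $cnt })                  # copy the surviving count out
}x;

# perlre documents this value as 4: the count climbs to 8,
# then is unwound back to 4 as the engine backtracks.
print "res = $res\n";
```

The count has to be copied into a second variable because every localization is undone once the match finishes.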

Wed Apr 18 08:34:15 2018 The RT System itself - Status changed from 'new' to 'open'

Wed Apr 18 09:16:34 2018 demerphq [...] gmail.com - Correspondence added

Subject:	Re: [rt.cpan.org #125105] Segmentation fault at 2 GB of memory
Date:	Wed, 18 Apr 2018 15:16:19 +0200
To:	bug-Regexp-Grammars [...] rt.cpan.org
From:	demerphq <demerphq [...] gmail.com>

Can you point me at any of this discussion?

Yves

-- 
perl -Mre=debug -e "/just|another|perl|hacker/"

Wed Apr 18 09:36:44 2018 damian [...] conway.org - Correspondence added

Subject:	Re: [rt.cpan.org #125105] Segmentation fault at 2 GB of memory
Date:	Wed, 18 Apr 2018 23:34:32 +1000
To:	bug-Regexp-Grammars [...] rt.cpan.org
From:	Damian Conway <damian [...] conway.org>

Hi Yves,

> Can you point me at any of this discussion?

Most of it was in person between Rik Signes (who was Pumpking at the time) and myself.

The formal report was much more recent, via Hugo van der Sanden:
https://rt.perl.org/Public/Bug/Display.html?id=132277

Damian

Wed Apr 18 09:45:33 2018 demerphq [...] gmail.com - Correspondence added

Subject:	Re: [rt.cpan.org #125105] Segmentation fault at 2 GB of memory
Date:	Wed, 18 Apr 2018 13:44:47 +0000
To:	bug-Regexp-Grammars [...] rt.cpan.org
From:	demerphq <demerphq [...] gmail.com>

This is the first I have heard of this.
Perhaps there is something we can do.

Yves

Wed Apr 18 09:49:36 2018 damian [...] conway.org - Correspondence added

Subject:	Re: [rt.cpan.org #125105] Segmentation fault at 2 GB of memory
Date:	Wed, 18 Apr 2018 23:48:45 +1000
To:	bug-Regexp-Grammars [...] rt.cpan.org
From:	Damian Conway <damian [...] conway.org>

> This is the first I have heard of this.
> Perhaps there is something we can do.

It would be awesome if there were.

Hugo’s sample code demonstrates one manifestation of the problem, but I’ll be happy to provide further examples of the issue if you need them.

Thanks, Yves.

Damian

Wed Apr 18 11:30:45 2018 demerphq [...] gmail.com - Correspondence added

Subject:	Re: [rt.cpan.org #125105] Segmentation fault at 2 GB of memory
Date:	Wed, 18 Apr 2018 15:30:15 +0000
To:	bug-Regexp-Grammars [...] rt.cpan.org
From:	demerphq <demerphq [...] gmail.com>

Yes, some tests would be very helpful.

Wed Apr 18 12:24:40 2018 damian [...] conway.org - Correspondence added

Subject:	Re: [rt.cpan.org #125105] Segmentation fault at 2 GB of memory
Date:	Thu, 19 Apr 2018 02:23:47 +1000
To:	bug-Regexp-Grammars [...] rt.cpan.org
From:	Damian Conway <damian [...] conway.org>

Yves,

Here is the simplest test suite I could devise that demonstrates the problem. In my testing, it fails identically under every release from 5.10 to 5.26.

Damian

-----cut----------cut----------cut----------cut----------cut----------cut----------cut-----

#! /usr/bin/env perl
use warnings;

#
# The pattern matching code in each pair of subtests is identical,
# except that the second subtest replaces a backtracking quantifier
# with its non-backtracking equivalent.
#
# In each case, the non-backtracking version should pass, but it fails
#

use Test::More;
plan tests => 12;

our $count;

################

subtest 'Backtracking *' => sub {
    $count = 0;
    ok "aaa" =~ m{
        \A
        (?&A)*         # This quantifier is backtracking
        \z
        (?{ is $count, 3 => "Found 3 a's" })

        (?(DEFINE)
            (?<A>  a  (?{ local $count = $count + 1 })  )
        )
    }x => 'Regex matched';
};

subtest 'Non-backtracking *+' => sub {
    $count = 0;
    ok "aaa" =~ m{
        \A
        (?&A)*+        # This quantifier is non-backtracking
        \z
        (?{ is $count, 3 => "Found 3 a's" })

        (?(DEFINE)
            (?<A>  a  (?{ local $count = $count + 1 })  )
        )
    }x => 'Regex matched';
};

################

subtest 'Backtracking (?:*)' => sub {
    $count = 0;
    ok "aaa" =~ m{
        \A
        (?: (?&A)* )   # This quantifier is backtracking
        \z
        (?{ is $count, 3 => "Found 3 a's" })

        (?(DEFINE)
            (?<A>  a  (?{ local $count = $count + 1 })  )
        )
    }x => 'Regex matched';
};

subtest 'Non-backtracking (?>*)' => sub {
    $count = 0;
    ok "aaa" =~ m{
        \A
        (?> (?&A)* )   # This quantifier is non-backtracking
        \z
        (?{ is $count, 3 => "Found 3 a's" })

        (?(DEFINE)
            (?<A>  a  (?{ local $count = $count + 1 })  )
        )
    }x => 'Regex matched';
};

################

subtest 'Backtracking +' => sub {
    $count = 0;
    ok "aaa" =~ m{
        \A
        (?&A)+         # This quantifier is backtracking
        \z
        (?{ is $count, 3 => "Found 3 a's" })

        (?(DEFINE)
            (?<A>  a  (?{ local $count = $count + 1 })  )
        )
    }x => 'Regex matched';
};

subtest 'Non-backtracking ++' => sub {
    $count = 0;
    ok "aaa" =~ m{
        \A
        (?&A)++        # This quantifier is non-backtracking
        \z
        (?{ is $count, 3 => "Found 3 a's" })

        (?(DEFINE)
            (?<A>  a  (?{ local $count = $count + 1 })  )
        )
    }x => 'Regex matched';
};

################

subtest 'Backtracking {3}' => sub {
    $count = 0;
    ok "aaa" =~ m{
        \A
        (?&A){3}       # This quantifier is backtracking
        \z
        (?{ is $count, 3 => "Found 3 a's" })

        (?(DEFINE)
            (?<A>  a  (?{ local $count = $count + 1 })  )
        )
    }x => 'Regex matched';
};

subtest 'Non-backtracking {3}+' => sub {
    $count = 0;
    ok "aaa" =~ m{
        \A
        (?&A){3}+      # This quantifier is non-backtracking
        \z
        (?{ is $count, 3 => "Found 3 a's" })

        (?(DEFINE)
            (?<A>  a  (?{ local $count = $count + 1 })  )
        )
    }x => 'Regex matched';
};

################

subtest 'Backtracking {1,3}' => sub {
    $count = 0;
    ok "aaa" =~ m{
        \A
        (?&A){1,3}     # This quantifier is backtracking
        \z
        (?{ is $count, 3 => "Found 3 a's" })

        (?(DEFINE)
            (?<A>  a  (?{ local $count = $count + 1 })  )
        )
    }x => 'Regex matched';
};

subtest 'Non-backtracking {1,3}+' => sub {
    $count = 0;
    ok "aaa" =~ m{
        \A
        (?&A){1,3}+    # This quantifier is non-backtracking
        \z
        (?{ is $count, 3 => "Found 3 a's" })

        (?(DEFINE)
            (?<A>  a  (?{ local $count = $count + 1 })  )
        )
    }x => 'Regex matched';
};

################

subtest 'Backtracking ?' => sub {
    $count = 0;
    ok "a" =~ m{
        \A
        (?&A)?         # This quantifier is backtracking
        \z
        (?{ is $count, 1 => "Found 1 a" })

        (?(DEFINE)
            (?<A>  a  (?{ local $count = $count + 1 })  )
        )
    }x => 'Regex matched';
};

subtest 'Non-backtracking ?+' => sub {
    $count = 0;
    ok "a" =~ m{
        \A
        (?&A)?+        # This quantifier is non-backtracking
        \z
        (?{ is $count, 1 => "Found 1 a" })

        (?(DEFINE)
            (?<A>  a  (?{ local $count = $count + 1 })  )
        )
    }x => 'Regex matched';
};

################

Wed Apr 18 12:50:58 2018 demerphq [...] gmail.com - Correspondence added

Subject:	Re: [rt.cpan.org #125105] Segmentation fault at 2 GB of memory
Date:	Wed, 18 Apr 2018 18:50:25 +0200
To:	bug-Regexp-Grammars [...] rt.cpan.org
From:	demerphq <demerphq [...] gmail.com>

That is awesome. Thanks Damian.

Sorry for the terse replies earlier, was on a dratted phone.

Hope you well!
Yves

-- 
perl -Mre=debug -e "/just|another|perl|hacker/"

Wed Apr 18 14:22:44 2018 damian [...] conway.org - Correspondence added

Subject:	Re: [rt.cpan.org #125105] Segmentation fault at 2 GB of memory
Date:	Thu, 19 Apr 2018 04:21:52 +1000
To:	bug-Regexp-Grammars [...] rt.cpan.org
From:	Damian Conway <damian [...] conway.org>

> Sorry for the terse replies earlier, was on a dratted phone.

No problem at all. Thanks for your interest in the problem!

> Hope you well!

Very well. Hope you are likewise. :-)

Damian

Tue Apr 24 17:19:30 2018 alexchandel [...] gmail.com - Correspondence added

Possessive quantifiers and atomic grouping would be nice. But in the meantime, is there any way I can cut down on Regexp::Grammars' explosive memory usage?

The grammar that triggers this has binary expression trees, with different rules used to encode operator precedence, and each rule is an objtoken. The object tree often looks something like or(xor(and(compare(add(mult(factor(power(2))), 2))))). Would "optimizing" it down to add(2,2), by deleting unnecessary children in the constructor that Regexp::Grammars calls, save significant memory? Or does Regexp::Grammars keep references to these objects?

Where do these 100s of megabytes of memory come from, in matching a couple hundred short lines?
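
The tree "optimization" asked about above can be sketched in plain Perl. This is only an illustration under assumed data shapes: the node layout (`[ $type, @children ]` with plain scalars as leaves) and the helper names `collapse` and `show` are hypothetical, and do not reflect how Regexp::Grammars or the reporter's objtoken classes store their results. Later replies in this thread indicate that collapsing the tree did not avoid the segfault.

```perl
use strict;
use warnings;

# Hypothetical node shape: [ $type, @children ]; leaves are plain scalars.
# Any node with exactly one child is replaced by that child.
sub collapse {
    my ($node) = @_;
    return $node unless ref($node) eq 'ARRAY';
    my ($type, @kids) = @$node;
    @kids = map { collapse($_) } @kids;
    return $kids[0] if @kids == 1;
    return [ $type, @kids ];
}

# Render a tree in the type(child,child) notation used in the ticket.
sub show {
    my ($node) = @_;
    return $node unless ref($node) eq 'ARRAY';
    my ($type, @kids) = @$node;
    return $type . '(' . join(',', map { show($_) } @kids) . ')';
}

my $tree =
    ['or', ['xor', ['and', ['compare',
        ['add', ['mult', ['factor', ['power', 2]]], 2]]]]];

print show($tree), "\n";            # the deep single-child chain
print show(collapse($tree)), "\n";  # add(2,2)
```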

Wed Apr 25 08:44:37 2018 damian [...] conway.org - Correspondence added

Subject:	Re: [rt.cpan.org #125105] Segmentation fault at 2 GB of memory
Date:	Wed, 25 Apr 2018 22:43:25 +1000
To:	bug-Regexp-Grammars [...] rt.cpan.org
From:	Damian Conway <damian [...] conway.org>

Hi Alex,

> Where do these 100s of megabytes of memory come from,
> in matching a couple hundred short lines?

That's the huge mystery here. I simply don't know.

You reported falling over at 2GB, which strongly implies a serious memory leak, not a huge object tree.

Each node object is likely to be less than 1kB, so that would imply a tree of well over a million nodes, which is absurd.

Optimizing the tree would, of course, reduce the memory usage, but I doubt it will solve the problem, because I doubt a big tree is causing the problem in the first place.

If you can send me a self-contained example of the grammar and the data that is causing this issue, I'll take a look at it myself and see if I can see anything obviously amiss.

But it's more likely, I fear, that we're tripping some internal issue that's resulting in a huge memory leak. So I can't promise I'll be able to solve this for you.

Damian
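
The node estimate above can be checked with a few lines of arithmetic. This is a rough sketch: the 1 kB per node figure is the upper bound assumed in the message, and the 317-line input size comes from the original report.

```perl
use strict;
use warnings;

my $limit_bytes = 2 * 2**30;   # ~2 GB, where the crash was reported
my $node_bytes  = 1024;        # assumed upper bound per node (1 kB)
my $input_lines = 317;         # size of the input file in the report

my $nodes          = $limit_bytes / $node_bytes;
my $nodes_per_line = $nodes / $input_lines;

printf "%d nodes in total, about %d per input line\n",
    $nodes, $nodes_per_line;
```

That works out to roughly two million nodes, or several thousand per line of input, which is why a leak looks more plausible than a genuinely large parse tree.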

Thu Apr 26 14:31:55 2018 alexchandel [...] gmail.com - Correspondence added

Damian,

I've sent you a simplified grammar and a test file that result in this issue. Running it invariably results in an error similar to:

    [1] 12343 segmentation fault ./simple.pl test.cl

Alex

Fri Apr 27 09:54:15 2018 damian [...] conway.org - Correspondence added

Subject:	Re: [rt.cpan.org #125105] Segmentation fault at 2 GB of memory
Date:	Fri, 27 Apr 2018 23:53:23 +1000
To:	bug-Regexp-Grammars [...] rt.cpan.org
From:	Damian Conway <damian [...] conway.org>

Thanks for the example file, Alex.

I've confirmed that it segfaults under any version of Perl >= 5.18 and runs perfectly on any version of Perl between 5.10 and 5.16. So it's an internal issue of some kind in the revamped regex engine (the revamping process began in 5.18).

The bad news is that this is not something I can fix in the module's code.

The less bad news is that this is definitely something I can report as a Perl bug, and which someone else may eventually be able to fix. I will do this as soon as I can reduce the problem to something small enough to make an actionable bug report.

For the moment, the only workaround seems to be to run it under Perl 5.16 or earlier (via perlbrew, for example).

The alternative would be to look at porting your grammar to Marpa::R2 (https://metacpan.org/pod/distribution/Marpa-R2/pod/Marpa_R2.pod). Or, less painfully, to use one of the Marpa helper modules:

    https://metacpan.org/pod/MarpaX::Simple
    https://metacpan.org/pod/Grammar::Marpa

I'm sorry I don't have a better answer for you than this, Alex. If I happen to come across a simpler workaround whilst I'm creating the bug report, I'll certainly post it here.

Damian

Fri Apr 27 10:41:41 2018 damian [...] conway.org - Correspondence added

Subject:	Re: [rt.cpan.org #125105] Segmentation fault at 2 GB of memory
Date:	Sat, 28 Apr 2018 00:40:50 +1000
To:	bug-Regexp-Grammars [...] rt.cpan.org
From:	Damian Conway <damian [...] conway.org>

On further investigation, this grammar segfaults under pre-5.18 versions of Perl as well, if it is fed longer input.

Therefore the problem is inherent in either the way the grammar is written (the tail-recursions might be an issue, maybe try <[subrule]> % SEPARATOR instead?) or else it is intrinsic to the regex engine itself, but not able to be bisected (which means it almost certainly can't be fixed).

Once again, the only Perl 5 solution I can currently suggest is to try Marpa::R2 (via one of the helper modules).

Alternatively, you could perhaps look at converting it to a Perl 6 grammar.

I freely admit that neither of these is a particularly easy option, and I apologize again that Regexp::Grammars doesn't seem to be able to handle this task as we would both wish.

Damian

Fri Apr 27 14:16:08 2018 alexchandel [...] gmail.com - Correspondence added

I don't know of a public debugging interface, but is there any way you could step through Regexp::Grammars' matching to see what triggers the segfault?

Even if it's inherent in how the program is written, memory shouldn't be a problem, as my computer (& address space) have far more than 2GB of memory.

I can't use separators in most cases, because I have multiple separators of equal precedence that I need to preserve. For example, <[factor]> % [-+] doesn't preserve whether the tail factors were added or subtracted, and I don't see anything in the documentation on how to preserve them.

But I'm not sure this is the problem. First, the trees are generally narrow. A value might parse to or(xor(and(comp(add(mult(factor(power(value(number(42)))))))))). However, more importantly, in code removed from the simplified case I sent you, I optimize the entire tree to just number(42).

You can test this yourself by modifying the new() sub in the Foo class to delete $self->{head} and $self->{tail} if they're present. The same segmentation fault occurs.

Sat Apr 28 05:16:27 2018 demerphq [...] gmail.com - Correspondence added

Subject:	Re: [rt.cpan.org #125105] Segmentation fault at 2 GB of memory
Date:	Sat, 28 Apr 2018 09:16:05 +0000
To:	bug-Regexp-Grammars [...] rt.cpan.org
From:	demerphq <demerphq [...] gmail.com>

The 2gb thing is suspicious, it suggests some 32bit counter, possibly signed.

But the problem you are seeing probably has to do with when temporaries are freed. It may simply be we are missing a call to free things at the right time.

Yves
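
The link between "2 GB" and a signed 32-bit counter is simple arithmetic, shown here as a small sketch. It only demonstrates where the number comes from; it does not identify which counter inside perl, if any, is involved.

```perl
use strict;
use warnings;

# Largest value a signed 32-bit integer can hold.
my $max_i32 = 2**31 - 1;

# Expressed in GiB (2**30 bytes), this sits just under 2.
printf "%d bytes = %.2f GiB\n", $max_i32, $max_i32 / 2**30;
```

A byte count or offset held in such a field would overflow at about the point where the reported crash occurs.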

Sat Apr 28 11:14:32 2018 damian [...] conway.org - Correspondence added

Subject:	Re: [rt.cpan.org #125105] Segmentation fault at 2 GB of memory
Date:	Sun, 29 Apr 2018 01:13:40 +1000
To:	bug-Regexp-Grammars [...] rt.cpan.org
From:	Damian Conway <damian [...] conway.org>

> I don't know of a public debugging interface, but is there any way you
> could step through Regexp::Grammars' matching to see what triggers the
> segfault?

As Yves subsequently suggested, it's likely this issue is down in the guts of Perl and not a response to any particular component of Regexp::Grammars. So stepping through the grammar parse is unlikely to help determine the problem, unless someone stepped through it with gdb or some other interpreter-level debugger.

> Even if it's inherent in how the program is written, memory shouldn't
> be a problem, as my computer (& address space) have far more than 2GB
> of memory.

Agreed. As does mine. But, again as Yves suggested, something is hitting the 32-bit limit even if it's not malloc.

> I can't use separators in most cases, because I have multiple
> separators of equal precedence that I need to preserve. For example,
> <[factor]> % [-+] doesn't preserve whether the tail factors were added
> or subtracted, and I don't see anything in the documentation on how to
> preserve them.

You can remove the recursion by named-capturing the separators as well (as a list):

    <[factor]>+ % <[operator=([-+])]>

But, yes, this will only defer the problem, not prevent it. It will cause fewer subrule calls, but eventually the 2GB limit will still be reached and the segfault will occur.

Damian
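
Once operands and separators are captured as two parallel lists, the original left-to-right structure can be rebuilt afterwards. The sketch below uses plain Perl and a hand-written hash standing in for the match result; the `factor` and `operator` keys mirror the capture names in the pattern above, but the exact shape of what Regexp::Grammars returns is an assumption here, and `fold_left` and `evaluate` are hypothetical helpers.

```perl
use strict;
use warnings;

# Assumed result shape for the input "10 - 4 + 3".
my %expr = (
    factor   => [ 10, 4, 3 ],
    operator => [ '-', '+' ],
);

# Combine the parallel lists into a left-associative tree.
sub fold_left {
    my ($operands, $operators) = @_;
    my @rest = @$operands;
    my $tree = shift @rest;
    for my $i (0 .. $#rest) {
        $tree = {
            op    => $operators->[$i],
            left  => $tree,
            right => $rest[$i],
        };
    }
    return $tree;
}

# Evaluate the tree to confirm each operator was kept in place.
sub evaluate {
    my ($node) = @_;
    return $node unless ref $node;
    my $l = evaluate($node->{left});
    my $r = evaluate($node->{right});
    return $node->{op} eq '+' ? $l + $r : $l - $r;
}

my $tree = fold_left($expr{factor}, $expr{operator});
print evaluate($tree), "\n";   # (10 - 4) + 3 = 9
```

The operators stay aligned with their operands by index, so the information about which tail factors were added and which were subtracted is not lost.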

Sat Apr 28 11:17:38 2018 damian [...] conway.org - Correspondence added

Subject:	Re: [rt.cpan.org #125105] Segmentation fault at 2 GB of memory
Date:	Sun, 29 Apr 2018 01:16:48 +1000
To:	bug-Regexp-Grammars [...] rt.cpan.org
From:	Damian Conway <damian [...] conway.org>

Thanks for the insights, Yves.

Would you like me to send you Alex's example for you to explore?

I completely understand that you may have no interest in doing so, but I thought I should at least beg^H^H^Hoffer ;-)

Damian

Sat Apr 28 12:47:13 2018 demerphq [...] gmail.com - Correspondence added

Subject:	Re: [rt.cpan.org #125105] Segmentation fault at 2 GB of memory
Date:	Sat, 28 Apr 2018 16:46:52 +0000
To:	bug-Regexp-Grammars [...] rt.cpan.org
From:	demerphq <demerphq [...] gmail.com>

Can't hurt.

But it'll be some days before I get to it. On a laptop free holiday in Rome just now....

Yves

Sat Apr 28 13:13:16 2018 damian [...] conway.org - Correspondence added

Subject:	Re: [rt.cpan.org #125105] Segmentation fault at 2 GB of memory
Date:	Sun, 29 Apr 2018 03:12:26 +1000
To:	bug-Regexp-Grammars [...] rt.cpan.org
From:	Damian Conway <damian [...] conway.org>

> Can't hurt.

Much obliged! I'll send it directly to you.

> But it'll be some days before I get to it. On a laptop free
> holiday in Rome just now....

Excellent. Have a great time and forget all about those annoying regexes. :-)

Damian