
This queue is for tickets about the Regexp-Assemble CPAN distribution.

Report information
The Basics
Id: 20847
Status: resolved
Priority: 3/
Queue: Regexp-Assemble

People
Owner: dland [...] cpan.org
Requestors: book [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Unimportant
Broken in: (no value)
Fixed in: 0.32



Subject: optimise (?:a(?:...)|a+) as (?:a(?:a*|...))
Here's a short description:

16:15 <@BooK> assemble a ab acd abd a+ ab+c
16:15 < assemble> BooK: (?:a(?:b+c|bd?|cd)?|a+)
16:16 <@BooK> hey, that can be optimised into (?:a(?:b+c|bd?|cd|a*)), no?

Actually, if you make this kind of optimisation, b+c|bd? becomes b(?:b*c|d?) or b(?:b*c|d)?, so that should probably produce (?:a(?:a*|b(?:b*c|d?)|cd)) or (?:a(?:a*|b(?:b*c|d)?|cd)).

Does that make any sense?
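One way to check that the suggested rewrites are safe is to compare each of them against the assembled pattern on every short string. The sketch below does this in Python rather than Perl. It assumes that Python's `re` module treats these constructs (non-capturing groups, alternation, `?`, `*`, `+`) the same way Perl does. The helper `same_language` and its alphabet and length bounds are illustrative choices, not part of Regexp::Assemble. A bounded check like this is evidence of equivalence, not a proof.

```python
import itertools
import re

# Pattern produced by Regexp::Assemble for: a ab acd abd a+ ab+c
ASSEMBLED = r"(?:a(?:b+c|bd?|cd)?|a+)"

# Candidate optimisations suggested in the ticket
CANDIDATES = [
    r"(?:a(?:b+c|bd?|cd|a*))",
    r"(?:a(?:a*|b(?:b*c|d?)|cd))",
    r"(?:a(?:a*|b(?:b*c|d)?|cd))",
]


def same_language(p, q, alphabet="abcd", max_len=6):
    """Return True if p and q fully match exactly the same strings over
    `alphabet` up to length `max_len` (a bounded check, not a proof)."""
    rp, rq = re.compile(p), re.compile(q)
    for n in range(max_len + 1):
        for chars in itertools.product(alphabet, repeat=n):
            s = "".join(chars)
            if bool(rp.fullmatch(s)) != bool(rq.fullmatch(s)):
                return False
    return True


for cand in CANDIDATES:
    print(cand, same_language(ASSEMBLED, cand))
```

With the default bounds, all three candidates report True: they accept exactly the same strings as the assembled pattern, up to length 6.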
On Fri Aug 04 10:30:35 2006, BOOK wrote:
> b+c|bd? becomes
> b(?:b*c|d?) or b(?:b*c|d)?, so that should probably produce
> (?:a(?:a*|b(?:b*c|d?)|cd)) or (?:a(?:a*|b(?:b*c|d)?|cd))
>
> Does that make any sense?
Hi Book, yes, this makes perfect sense. What needs to happen is an 'unfold_plus' step during lexing, which would map a+ -> a a*. Then everything would proceed as before. A matching 'refold_plus' step during output would look for a a* and coalesce it back into a+. Otherwise, patterns that make heavy use of + quantifiers would bulk up whenever none of the reductions you outline occur. Refolding could perhaps be optional, too. I'll have to benchmark the differences. Thanks for the idea.
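The unfold/refold idea can be sketched as a pair of token-list transforms. This is a minimal Python illustration, not Regexp::Assemble's actual Perl code. It assumes a hypothetical token representation of `(atom, quantifier)` tuples, handles only a plain `+` quantifier (no lazy or possessive forms), and borrows the function names from the step names used in the comment. The helper `render` is likewise hypothetical.

```python
from typing import List, Tuple

# Hypothetical token format: (atom, quantifier), e.g. ("a", "+"), ("b", "")
Token = Tuple[str, str]


def unfold_plus(tokens: List[Token]) -> List[Token]:
    """Lexing step: map every x+ to the pair x x*, so the leading literal x
    can be shared with other patterns that start with x."""
    out: List[Token] = []
    for atom, quant in tokens:
        if quant == "+":
            out.extend([(atom, ""), (atom, "*")])
        else:
            out.append((atom, quant))
    return out


def refold_plus(tokens: List[Token]) -> List[Token]:
    """Output step: coalesce an adjacent x, x* pair back into x+."""
    out: List[Token] = []
    i = 0
    while i < len(tokens):
        atom, quant = tokens[i]
        if quant == "" and i + 1 < len(tokens) and tokens[i + 1] == (atom, "*"):
            out.append((atom, "+"))
            i += 2
        else:
            out.append((atom, quant))
            i += 1
    return out


def render(tokens: List[Token]) -> str:
    """Turn a token list back into pattern text."""
    return "".join(atom + quant for atom, quant in tokens)


ab_plus_c = [("a", ""), ("b", "+"), ("c", "")]
print(render(unfold_plus(ab_plus_c)))               # abb*c
print(render(refold_plus(unfold_plus(ab_plus_c))))  # ab+c
```

After unfolding, `a+` becomes `a a*`, so it shares its leading `a` with `ab`, `acd` and the other inputs during assembly. Refolding afterwards keeps patterns compact when no such sharing happened.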
This feature request has been implemented in release 0.32, now available on CPAN. It is not complete, in that there is room for improvement, but it's a start.

Thanks,
David