Subject: | limited support for look-ahead and \G |
One typical use of look-ahead is to split a string into parts for
independent processing, like this:
while ($html =~ /<h2>(.*?)(?=<h2|$)/imsg) {
    my $part = $1;
    while ($part =~ /.../g) {}
}
With re2, which has no look-ahead, this can be rewritten as follows:
while ($html =~ /<h2>(.*?)(<h2|$)/imsg) {
    pos($html) = pos($html) - length($2);
    my $part = $1;
    while ($part =~ /.../g) {}
}
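To check that the two loops above really split the input the same way,
here is a minimal sketch that runs both under Perl's built-in engine (no
re2 involved). The sample string and the sub names split_with_lookahead
and split_with_capture are made up for illustration and are not part of
any module.

```perl
use strict;
use warnings;

# Hypothetical sample input, not taken from a real page.
my $html = "<h2>One</h2>alpha<h2>Two</h2>beta";

# Variant 1: the look-ahead leaves the next "<h2" unconsumed.
sub split_with_lookahead {
    my ($text) = @_;
    my @parts;
    while ($text =~ /<h2>(.*?)(?=<h2|$)/imsg) {
        push @parts, $1;
    }
    return @parts;
}

# Variant 2: capture the delimiter, then move pos() back over it
# so the next iteration starts at the same "<h2".
sub split_with_capture {
    my ($text) = @_;
    my @parts;
    while ($text =~ /<h2>(.*?)(<h2|$)/imsg) {
        pos($text) = pos($text) - length($2);
        push @parts, $1;
    }
    return @parts;
}

print join("|", split_with_lookahead($html)), "\n";  # One</h2>alpha|Two</h2>beta
print join("|", split_with_capture($html)), "\n";    # One</h2>alpha|Two</h2>beta
```

Only the second variant avoids look-ahead, so it is the one that should
also be acceptable to re2.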
I think this "special" case could be handled by the re2 module internally:
if re2 detects exactly _one_ look-ahead at the _end_ of the regex, it could
replace it with ordinary capturing parentheses and, after executing the
regexp, update pos() and remove the extra $n variable (or leave the extra
variable in place if removing it is too complex, and just mention this
behavior in the docs).
As for \G, I'm not 100% sure, but I remember there being some re2-specific
features that can be used to tie a match to a given position in the
string. If this is true, then, again as a special case, the re2 module
could replace a \G at the _beginning_ of the regex with a call to the
re2-specific function that ties the match to the current pos() value.
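To make the intended \G semantics concrete, here is a rough sketch in
plain Perl (built-in engine, no re2) that compares a leading \G with an
emulation that anchors the pattern at a given offset. The helper match_at
is hypothetical; it only illustrates the "tie the match to pos()" idea and
says nothing about how re2 would do it internally. Matching against a
substring also changes what look-behind and \b can see at the start, so
this is an approximation.

```perl
use strict;
use warnings;

# Hypothetical helper: match $re anchored at offset $start, without \G.
# Returns the end offset of the match, or undef if nothing matches there.
sub match_at {
    my ($text, $start, $re) = @_;
    my $tail = substr($text, $start);
    return $tail =~ /\A(?:$re)/ ? $start + $+[0] : undef;
}

my $s = "aaabbbccc";

# Reference behaviour: \G ties the match to the current pos().
pos($s) = 3;
my $end_with_G = $s =~ /\Gb+/g ? pos($s) : undef;

# Emulation: same pattern, anchored at the same offset.
my $end_emulated = match_at($s, 3, qr/b+/);

print "$end_with_G $end_emulated\n";    # prints "6 6"
```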