Subject: | Problems w/ string literal scanning |
I'm using JE::Parser version "0.035";
Problem #1:
I'm trying to use the WWW::Scripter module with
WWW::Scripter::Plugin::JavaScript to automtically parse a website. The
plugin uses JE.
The website gives me a very long string literal (>50000 chars) to supply
data for a graph in the javascript section. This causes
JE::Parser::str() to fail with this internal perl bug:
Complex regular subexpression recursion limit (32766) exceeded
I traced the failure to this regex:
/\G (?: '((?>(?:[^'\\] | \\.)*))'
|
"((?>(?:[^"\\] | \\.)*))" )/xcgs or return;
I know nothing about the internal structure of perl's regex handler but
I don't see why this regex should cause a deep recursion. It seems to
me that a reasonable implementation whould only recurse to a level of
maybe 5-10. Nonetheless, after trying many different versions of the
regex, I couldn't find an alternative that produced the same result but
didn't hit the recursion limit.
As a workaround, I've created a version of the rexex that doesn't hit
the recusion limit but requires more perl code to parse strings:
# Use a simpler pattern (but more code) to break strings up into
extents bounded by the quote or escape
my $yarn="";
my $qt = substr($_,pos($_),1);
$qt =~ /['"]/ or return; # not a string literal if first char
not a quote
pos($_)++;
my $done = 0;
while (defined(substr($_,pos($_),1))) {
my ($part) = /\G([^\\$qt]*)/xcgs;
defined($part) or $part = "";
$yarn .= $part;
my $next = substr($_,pos($_)++,1);
if ($next eq "\\") {
#pass on any escaped char
$next = substr($_,pos($_)++,1);
$yarn .= "\\$next";
} else {
# handle end quote
$done = 1;
last;
}
}
Problem #2:
The latest ECMA language reference (Edition 5) allows line breaks in the
source only when escaped with '\'. I've confirmed that at least the
Firefox javascript parser follows this rule. JE::Parser::str was
allowing linefeeds anywhere between quotes. I fixed str() to disallow
unescaped line feeds by default.
Problem #3:
Per the ECMA spec, the usage of "\" at the end of a line within a string
literal is intended as a continuation of the string. For example, the
following assertion should be true:
"AB" === "A\
B";
JE::Parser::str was retaining the line feed after "\" which would render
the above assertion false. I've changed the code to remove the escape
and line feed. It now also removes escaped DOS-stype line breaks "\\\r\n".
Problem #4:
There is one test file "t/10.02-entering-execution-contexts.t" which
assumes the illegal inclusion of unescaped line feeds mentioned in
Problem #2. Rather than fix the test, I figured that there may be some
other environments in the world that need it. Therefore, I added a
variable "$JE::Parser::allow_unescaped_lf" which can be set if desired.
I'm not sure this is the best solution. Perhaps it should be set in
some kind of "set_options" call or maybe it shouldn't be allowed at all.
"t/10.02-entering-execution-contexts.t" passes with this flag set.
I've created a new test file "t/parse00.1-long_strings.t" to test the
above issues.
I've included the modified files in hopes they can be integrated into
the next release.
Thanks for your attention to this.
Kevin Cameron
San Jose, CA
Subject: | Parser.pm |
Message body is not shown because it is too large.
Subject: | 10.02-entering-execution-contexts.t |
#!perl -T
# WARNING: This is a brilliant example of how not to write code. (Though
# I must admit that the sextuple backslash was fun.)
BEGIN { require './t/test.pl' }
use Test::More tests => 30;
use Scalar::Util 'refaddr';
use strict;
use utf8;
# Test 1: See if the module loads
BEGIN { use_ok('JE') };
my $j = new JE;
# allow non-spec compliant line feeds in string literals below
$JE::Parser::allow_unescaped_lf = 1;
# Tests 2-3: Bind the ok and diag functions
isa_ok( $j->new_function( ok => \&ok ), 'JE::Object::Function' );
isa_ok( $j->new_function( diag => \&diag ), 'JE::Object::Function' );
# Test 4: Bind Perl's eval to JS so we can do naughty stuff.
isa_ok( $j->new_function( peval => sub { eval shift } ),
'JE::Object::Function' );
# Run JS tests
defined $j->eval( <<'--end--' ) or die;
// ---------------------------------------------------
/* Tests 5-9: global code */
// These line breaks would cause a syntax error according to spec.
// JE supports line breaks in string literals. Don't tell anyone.
peval('
my @scope = @$JE::Code::scope; # Don\'t do this at
# home--I mean in pro-
# duction code! These
# internal details are sub-
# ject to change.
ok @scope == 1, "scope chain in global code contains one object";
ok refaddr $scope[0] == refaddr $j,
"that object is the global object";
');
var thing;
ok(this.hasOwnProperty('thing'), 'global object is the variable object')
ok(!delete thing, 'vars declared in global code are undeletable')
peval('ok refaddr shift @_ == refaddr $j,
"the \'this\' value is the global object"', this)
// ---------------------------------------------------
/* Tests 10-17: function code */
var obj = Object();
with(obj) {
obj.f = function(){
peval('
my @scope = @$JE::Code::scope;
my @fscope = @{$${$j->eval("obj.f")->get}{scope}};
ok @scope == 3, q/@scope == 3/;
ok @fscope == 2, q/@fscope == 2/;
ok refaddr $scope[0] == refaddr $fscope[0] &&
refaddr $scope[0] == refaddr $j,
"first object in function\'s scope chain" .
" is the global obj";
ok refaddr $scope[1] == refaddr $fscope[1] &&
refaddr $scope[1] == refaddr $j->prop("obj"),
"\'with\' object in scope chain";
ok ref $scope[2] eq \'JE::Object::Function::Call\',
"\\\$scope has a call object";
');
var thing;
ok(peval('exists $JE::Code::scope->[-1]{thing}
'), "activation object is the variable object")
ok(!delete thing,
'vars declared in function code are undeletable')
ok(this === obj, '"this" value in function code')
}
}
obj.f()
// ---------------------------------------------------
/* Tests 18-22: 'global-eval code' (eval code called from global code) */
eval("
peval('
my @scope = @$JE::Code::scope;
ok @scope == 1,
\"scope chain in global-eval code contains one object\";
ok refaddr $scope[0] == refaddr $j,
\"object in global-eval scope chain is the global object\";
');
var thing2;
ok(this.hasOwnProperty('thing2'),
'global object is the variable object in global-eval code')
ok(delete thing2, 'vars declared in global-eval code are deletable')
peval('ok refaddr shift(@_) == refaddr $j,
\"the \\'this\\' value in global-eval code is the global object\"', this)
")
// ---------------------------------------------------
/* Tests 23-30: function-eval code */
var obj2 = Object();
with(obj2) {
obj2.f = function(){
eval("
peval('
my @scope = @$JE::Code::scope;
my @fscope =
@{$${$j->eval(\"obj2.f\")->get}{scope}};
ok @scope == 3, q/@scope == 3 (function-eval)/;
ok @fscope == 2, q/@fscope == 2 (function-eval)/;
ok refaddr $scope[0] == refaddr $fscope[0] &&
refaddr $scope[0] == refaddr $j,
\"first object in function\\'s scope \" .
\" chain is the global obj (in eval code)\";
ok refaddr $scope[1] == refaddr $fscope[1] &&
refaddr $scope[1] == refaddr $j->prop(\"obj2\"),
\"\\'with\\' object in function-eval \" .
\"scope chain\";
ok ref $scope[2] eq
\\'JE::Object::Function::Call\\',
\"\\\\\\$scope has a call object\";
');
var thing;
ok(peval('exists $JE::Code::scope->[-1]{thing}
'), \"activation object is the variable object \" +
\"in function-eval code\")
ok(delete thing,
'vars declared in function-eval code are deletable')
ok(this === obj2, '\"this\" value in function-eval code')
") // end of eval
}
}
obj2.f()
--end--
Subject: | parse00.1-long_strings.t |
#!perl
use Test::More tests => 33;
use strict;
no warnings 'utf8';
# build long strings for tests
# has imbedded \n and different kinds of quotes
# parts from which to build long strings
my $part_plain = "x" x 50;
my $part_quotes = ("y" x 8 . "\\\"" . "z" x 8 . "\\\'") x 3;
my $part_nl = "a" x 49 . "\n";
my $part_cont_nl = "b" x 49 . "\\\n";
my $part_dosnl = "a" x 49 . "\r\n";
my $part_cont_dosnl = "b" x 49 . "\\\r\n";
my $part_quote_nl = $part_quotes . "\n";
my $part_quote_cont_nl = $part_quotes . "\\\n";
my $part_quote_dosnl = $part_quotes . "\r\n";
my $part_quote_cont_dosnl = $part_quotes . "\\\r\n";
# legal test strings
my $long_legal_nl = ($part_plain . $part_cont_nl . $part_quotes . $part_cont_dosnl . $part_plain . $part_quote_cont_nl . $part_quote_cont_dosnl) x 100;
my $long_legal_nonl = ($part_plain . $part_quotes) x 500;
my $long_legal_noqt_nonl = ($part_plain) x 10000;
my $short_legal_cont_nl = ($part_cont_nl . $part_quote_cont_nl . $part_cont_dosnl . $part_quote_cont_dosnl) x 3;
my $short_legal_nonl = ($part_plain . $part_quotes);
# illegal test strings (line continuations w/o \)
my $short_illegal = ($part_nl . $part_quote_nl . $part_dosnl . $part_quote_dosnl) x 3;
my $long_illegal = ($part_plain . $part_cont_nl . $part_quotes . $part_nl . $part_plain . $part_quote_nl) x 200;
my $single = "\'";
my $double = "\"";
my %qname = ($single, 'single', $double, 'double');
my $j;
# add arbitrary quotes
sub add_quote {
my ($qt, $str) = @_;
return $qt . $str . $qt;
}
# compute expected values
# only works for cases in this test
sub get_exp {
my ($qt, $in) = @_;
$in =~ s/\\\r?\n//gx; # concatenate legal line continuations
$in =~ s/\\(.)/$1/gx;
$in;
}
# test legal string literal
# try both single and double quoted versions
sub test_legal {
my ($str, $msg) = @_;
test_legal_1 ($single, $str, $msg);
test_legal_1 ($double, $str, $msg);
}
sub test_legal_1 {
my ($qt, $str, $msg) = @_;
my $code;
my $got = "";
my $qmsg = "$qname{$qt}-quoted $msg";
# test for legal syntax
ok ($code = $j->parse ("var a = " . add_quote($qt, $str) . ";\n"), "Parse $qmsg");
# test for correct value
$code->execute if $code;
$got = $j->prop("a") if $code;
my $exp = get_exp ($qt, $str);
if (length ($str) < 1000) {
is ($got , $exp, "Value of $qmsg"); # more descriptive error message
} else {
ok ($got eq $exp, "Value of $qmsg"); # shorter error message for gigantic strings
}
}
# test illegal string litteral
sub test_illegal {
my ($str, $msg) = @_;
# test for catch of illegal syntax (parse should fail)
ok (!$j->parse ("var a = " . add_quote($single, $str) . ";\n"), "single-quoted $msg");
ok (!$j->parse ("var a = " . add_quote($double, $str) . ";\n"), "double-quoted $msg");
}
#--------------------------------------------------------------------#
# Test 1: See if the module loads
BEGIN { use_ok('JE') };
$j = new JE;
#--------------------------------------------------------------------#
# Tests 2-21: Parse legal strings
test_legal ($long_legal_nl , 'long string with escaped quotes and legal line continuations');
test_legal ($long_legal_nonl , 'long string with escaped quotes and no line continuations' );
test_legal ($long_legal_noqt_nonl, 'long string with no escaped quotes or line continuations' );
test_legal ($short_legal_cont_nl , 'short string with legal line continuations' );
test_legal ($short_legal_nonl , 'short string with escaped quotes and no line continuations' );
#--------------------------------------------------------------------#
# Tests 22-25: Parse illegal strings (must fail)
test_illegal ($long_illegal , 'long string with illegal line continuations' );
test_illegal ($short_illegal , 'short string with illegal line continuations');
#--------------------------------------------------------------------#
# Tests 26-33: Illegal line feeds can be allowed by setting a parser option
$JE::Parser::allow_unescaped_lf = 1;
test_legal ($long_illegal , 'long string with illegal line continuations (allow unescape line feeds)' );
test_legal ($short_illegal , 'short string with illegal line continuations (allow unescape line feeds)');