Subject: linestr reallocation in large program
Date: Mon, 18 Jul 2011 17:01:50 +0100
To: bug-Devel-Declare [...] rt.cpan.org
From: Zefram <zefram [...] fysh.org>

When the tokeniser reads in a new line, if this requires the next block
of input to be read from the file then PL_linestr will be reallocated if
necessary, to accommodate its old contents plus the entire next block
(even though it ends up only containing the old contents plus the next
line). New line reading can be triggered through Devel::Declare via
toke_skipspace(). Any reallocation of PL_linestr at this point breaks
D:D, by making PL_bufptr invalid. It follows that D:D needs to ensure
that PL_linestr is already large enough to accommodate the largest line
expected to be seen *plus one I/O block*.
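
To make the sizing rule above concrete, here is a minimal sketch in plain C. It does not use the perl API: struct linebuf and the linebuf_* functions are hypothetical stand-ins for PL_linestr and the tokeniser's refill step, assumed to behave as described above (room is reserved for the old contents plus one whole I/O block before reading).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the tokeniser's line buffer (not the perl API). */
struct linebuf {
    char  *base;  /* start of the allocation, in the role of SvPVX(PL_linestr) */
    size_t len;   /* bytes currently in use (the "old contents") */
    size_t cap;   /* bytes allocated */
};

/* Smallest capacity that never needs to grow during a refill:
 * the longest line expected plus one full I/O block. */
static size_t linebuf_min_capacity(size_t max_line, size_t io_block)
{
    return max_line + io_block;
}

/* Allocate the buffer up front. Returns 0 on success, -1 on failure. */
static int linebuf_init(struct linebuf *lb, size_t cap)
{
    assert(lb != NULL);
    lb->base = malloc(cap);
    if (lb->base == NULL)
        return -1;
    lb->len = 0;
    lb->cap = cap;
    return 0;
}

/* Reserve room for `incoming` more bytes on top of the current contents,
 * as a refill would before reading a block.
 * Returns 0 if the existing allocation was large enough, 1 if realloc()
 * was called (every old pointer into the buffer is then invalid),
 * or -1 if allocation failed. */
static int linebuf_reserve(struct linebuf *lb, size_t incoming)
{
    assert(lb != NULL && lb->base != NULL);
    size_t need = lb->len + incoming;
    if (need <= lb->cap)
        return 0;
    char *p = realloc(lb->base, need);
    if (p == NULL)
        return -1;
    lb->base = p;
    lb->cap = need;
    return 1;
}
```

With an 8192-byte buffer holding 5000 bytes, linebuf_reserve(&lb, 8192) has to reallocate. A buffer sized by linebuf_min_capacity(8192, 8192) can hold a full 8192-byte line and still reserve another block without moving.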
Historically I/O blocks have been small enough that D:D's preferred
PL_linestr size of 8192 was sufficient. But in Perl 5.14 the default I/O
buffer size has been increased to 8192. So if Devel::Declare is used on
a large source file (tens of kilobytes) on 5.14, it will eventually have
PL_linestr reallocated under it. I ran into this in work code, where we
require D:D to parse multi-line custom syntax and we have a large number
of large source files. The failure mode amounted to memory corruption,
resulting from PL_bufptr being in freed space.
Attached patch increases D:D's preferred PL_linestr size to 16384, which
fixes the problem where the I/O block size is 8192. Possibly should
be cleverer about that; look at PERLIOBUF_DEFAULT_BUFSIZ in the core
if you want to work on that. The patch also makes toke_skipspace()
check whether reallocation has actually occurred, and bomb out if it has.
So if the problem recurs, it'll at least be reported cleanly. Patch also
adds a test script, which is necessarily a bit large, but compresses well.
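
For illustration only, the reallocation check described above might look like the following sketch in plain C. This is not the code from the patch: checked_refill, refill_fn and the two example refill functions are hypothetical names, and where the patch bombs out the sketch merely prints a message and returns -1.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical refill callback: may replace *buf with a larger allocation,
 * standing in for the tokeniser reading the next line. Returns 0 on success. */
typedef int (*refill_fn)(char **buf, size_t *cap);

/* Run a refill and report whether the buffer moved.
 * Returns 0 if the buffer stayed put, -1 if it moved or the refill failed. */
static int checked_refill(char **buf, size_t *cap, refill_fn refill)
{
    assert(buf != NULL && *buf != NULL && cap != NULL);
    /* Keep the old address as an integer: once the old block has been
     * freed, the old pointer value must not be used as a pointer. */
    uintptr_t before = (uintptr_t)*buf;
    if (refill(buf, cap) != 0)
        return -1;
    if ((uintptr_t)*buf != before) {
        fprintf(stderr, "line buffer reallocated during refill; "
                        "saved cursor is stale\n");
        return -1;
    }
    return 0;
}

/* Example refill that fits in the existing allocation. */
static int refill_in_place(char **buf, size_t *cap)
{
    (void)buf;
    (void)cap;
    return 0;
}

/* Example refill that always moves the buffer: the new block is allocated
 * before the old one is freed, so the address is guaranteed to differ. */
static int refill_moving(char **buf, size_t *cap)
{
    char *p = malloc(*cap * 2);
    if (p == NULL)
        return -1;
    memcpy(p, *buf, *cap);
    free(*buf);
    *buf = p;
    *cap *= 2;
    return 0;
}
```

Comparing addresses only detects a buffer that actually moved. That is enough to turn silent memory corruption into a clean error, but the up-front sizing is still what prevents the problem.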
-zefram