Subject: XML::Parser::Expat crashes on utf8 stream
I encountered a UTF-8-related bug in the Expat library wrapper.
The symptom of this bug is a Perl interpreter crash with the following
error message:
*** glibc detected *** double free or corruption (!prev): 0x081e2c00 ***
This error is caused by heap corruption from a buffer overflow in
Expat.xs, line 388:
Copy(tb, buffer, br, char)
This buffer overflow happens because the code assumes that the number of
bytes copied (br) will never exceed the number of characters read from
the input (buffsize). This assumption is invalid if the input stream is
in utf8 mode, where a single character can occupy several bytes.
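To illustrate the mismatch outside of XML::Parser, here is a minimal
standalone sketch. It is not taken from Expat.xs; the in-memory
filehandle, the variable names, and the request size of 4 are my own
choices for the demonstration. It reads the same data once through a
:utf8 layer and once through a :raw layer, and compares how many bytes
come back for the same request size.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# Twelve bytes of UTF-8: four euro signs (U+20AC), three bytes each.
my $data = "\xE2\x82\xAC" x 4;

# Illustrative request size; stands in for buffsize in the report.
my $request = 4;

# Read through a :utf8 layer: read() counts characters, not bytes.
open(my $utf8_fh, '<:utf8', \$data) or die "open failed: $!";
my $got_chars = read($utf8_fh, my $decoded, $request);
my $byte_len  = length(encode_utf8($decoded));
close($utf8_fh);

# Read through a :raw layer: read() counts bytes.
open(my $raw_fh, '<:raw', \$data) or die "open failed: $!";
my $got_bytes = read($raw_fh, my $raw, $request);
close($raw_fh);

printf "utf8 layer: asked for %d, got %d characters = %d bytes\n",
    $request, $got_chars, $byte_len;
printf "raw layer:  asked for %d, got %d bytes\n",
    $request, $got_bytes;
```

With the :utf8 layer, a request for 4 units yields 12 bytes of data, so
a destination buffer sized for 4 bytes would be overrun by a plain copy.
With the :raw layer, the byte count never exceeds the request.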
The best solution is to have the Perl programmer set the stream to raw
mode, since this is also what libexpat expects. I think, however, that
the internal buffer overflow should be fixed anyway. The encoding issues
could also be documented more clearly.
Sample program which triggers the bug on certain input files:
---
use strict;
use encoding 'utf8';
use XML::Parser;
# If I uncomment the next line, the bug disappears.
# binmode(STDIN, ':bytes');
my $parser = XML::Parser->new( Style => 'Debug' );
$parser->parse(\*STDIN);
---
If the package maintainer agrees that this bug should be fixed, I am
willing to provide a patch and do some testing. Just let me know if this
is appreciated.
Package: XML-Parser-2.34
Perl version: v5.8.5 built for i386-linux-thread-multi
OS: Fedora Core release 3
Bye,
Joris.