Subject: | segfault when parsing invalid multi-line HTML |
Parsing the following 8 lines of invalid HTML by calling
HTML::Strip::parse() on each line (without calling eof()) causes Perl to
crash with "free(): invalid next size (fast)".
LINES THAT CAUSE CRASH:
<option value="">123456 78912345 678901</option>
</select>
</div>
</li>
<input type="hidden" name="LPrice" value="0" />
<li >
<div class="real_left">
ABC. DEFGH:
Running the attached Perl script reproduces the bug. The output of the
crash is included below.
Note: Calling eof() between lines does prevent the crash. However, this
bug is still significant because HTML files often come from untrusted
sources. Furthermore, the documentation never states that the HTML must
be valid. Finally, the result of the bug, a crash of the entire Perl
process, is severe.
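For anyone needing a stopgap, the workaround from the note above can be sketched as follows. This is only a sketch based on the observation that calling eof() between lines avoids the crash; it is not a fix for the underlying bug. The helper name strip_lines is hypothetical and not part of HTML::Strip. Because eof() resets the parser state, a tag that spans two lines would presumably not be recognized with this approach.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Strip;

# strip_lines() is a hypothetical helper, not part of HTML::Strip.
# It strips each line separately and resets the parser after every line.
sub strip_lines {
    my @lines = @_;
    my $hs    = HTML::Strip->new();
    my @text;
    for my $line (@lines) {
        push @text, scalar $hs->parse($line);
        $hs->eof();    # reset parser state before the next line
    }
    return @text;
}

my @stripped = strip_lines(
    '<li >',
    '<div class="real_left">',
    'ABC. DEFGH:',
);
print "$_\n" for @stripped;
```

The trade-off is that every line is treated as an independent block of HTML, so this only suits input where tags do not cross line boundaries.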
This bug has been verified on Ubuntu Linux machines with both Perl 5.8
and Perl 5.10.
drl@dev:~$ ./html_strip_crasher.pl
*** glibc detected *** /usr/bin/perl: free(): invalid next size (fast): 0x086f7228 ***
======= Backtrace: =========
/lib/tls/i686/nosegneg/libc.so.6[0x99f495]
/lib/tls/i686/nosegneg/libc.so.6(cfree+0x90)[0x9a2f70]
/usr/local/lib/perl/5.8.8/auto/HTML/Strip/Strip.so(XS_HTML__Strip_strip_html+0x40f)[0x49055f]
/usr/bin/perl(Perl_pp_entersub+0x313)[0x80c22d3]
/usr/bin/perl(Perl_runops_standard+0x1b)[0x80c0cab]
/usr/bin/perl(perl_run+0x2db)[0x806727b]
/usr/bin/perl(main+0x112)[0x8063792]
/lib/tls/i686/nosegneg/libc.so.6(__libc_start_main+0xe0)[0x949450]
/usr/bin/perl[0x8063611]
======= Memory map: ========
00110000-0011a000 r-xp 00000000 08:01 639266 /lib/libgcc_s.so.1
0011a000-0011b000 rw-p 0000a000 08:01 639266 /lib/libgcc_s.so.1
00213000-0021c000 r-xp 00000000 08:01 639273 /lib/tls/i686/nosegneg/libcrypt-2.7.so
0021c000-0021e000 rw-p 00008000 08:01 639273 /lib/tls/i686/nosegneg/libcrypt-2.7.so
0021e000-00245000 rw-p 0021e000 00:00 0
0048d000-00492000 r-xp 00000000 08:01 517558 /usr/local/lib/perl/5.8.8/auto/HTML/Strip/Strip.so
00492000-00493000 rw-p 00004000 08:01 517558 /usr/local/lib/perl/5.8.8/auto/HTML/Strip/Strip.so
006b6000-006d9000 r-xp 00000000 08:01 639282 /lib/tls/i686/nosegneg/libm-2.7.so
006d9000-006db000 rw-p 00023000 08:01 639282 /lib/tls/i686/nosegneg/libm-2.7.so
00898000-00899000 r-xp 00898000 00:00 0 [vdso]
00924000-0092e000 r-xp 00000000 08:01 444248 /usr/lib/perl5/auto/HTML/Parser/Parser.so
0092e000-0092f000 rw-p 00009000 08:01 444248 /usr/lib/perl5/auto/HTML/Parser/Parser.so
00933000-00a7f000 r-xp 00000000 08:01 639248 /lib/tls/i686/nosegneg/libc-2.7.so
00a7f000-00a80000 r--p 0014c000 08:01 639248 /lib/tls/i686/nosegneg/libc-2.7.so
00a80000-00a82000 rw-p 0014d000 08:01 639248 /lib/tls/i686/nosegneg/libc-2.7.so
00a82000-00a85000 rw-p 00a82000 00:00 0
00ae7000-00ae9000 r-xp 00000000 08:01 639280 /lib/tls/i686/nosegneg/libdl-2.7.so
00ae9000-00aeb000 rw-p 00001000 08:01 639280 /lib/tls/i686/nosegneg/libdl-2.7.so
00b73000-00b87000 r-xp 00000000 08:01 641518 /lib/tls/i686/nosegneg/libpthread-2.7.so
00b87000-00b89000 rw-p 00013000 08:01 641518 /lib/tls/i686/nosegneg/libpthread-2.7.so
00b89000-00b8b000 rw-p 00b89000 00:00 0
00e8b000-00ea5000 r-xp 00000000 08:01 641487 /lib/ld-2.7.so
00ea5000-00ea7000 rw-p 00019000 08:01 641487 /lib/ld-2.7.so
08048000-0814d000 r-xp 00000000 08:01 443032 /usr/bin/perl
0814d000-08151000 rw-p 00104000 08:01 443032 /usr/bin/perl
08151000-08153000 rw-p 08151000 00:00 0
086c6000-0878b000 rw-p 086c6000 00:00 0
b7d00000-b7d21000 rw-p b7d00000 00:00 0
b7d21000-b7e00000 ---p b7d21000 00:00 0
b7e2b000-b7e4c000 rw-p b7e2b000 00:00 0
b7e4c000-b7f82000 r--p 00000000 08:01 443652 /usr/lib/locale/locale-archive
b7f82000-b7f84000 rw-p b7f82000 00:00 0
b7f8b000-b7f8e000 rw-p b7f8b000 00:00 0
bfd28000-bfd3d000 rw-p bfd28000 00:00 0 [stack]
Aborted
drl@dev:~$
Subject: | html_strip_crasher.pl |
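#!/usr/bin/perl
# Reproduces the HTML::Strip crash: parse() is called once per line
# on the same object, and eof() is never called.
use strict;
use warnings;
use HTML::Strip;

# The 8 lines of invalid HTML that trigger the crash.
my @lines = (
    '<option value="">123456 78912345 678901</option>',
    '</select>',
    '</div>',
    '</li>',
    '<input type="hidden" name="LPrice" value="0" />',
    '<li >',
    '<div class="real_left">',
    'ABC. DEFGH:'
);

sub main {
    my $hs_loc = HTML::Strip->new();

    # Feed every line to the same parser object; deliberately no eof().
    for my $line (@lines) {
        $hs_loc->parse($line);
    }
}

main();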