This queue is for tickets about the HTML-Strip CPAN distribution.

Report information
The Basics
Id: 41035
Status: resolved
Priority: 0/
Queue: HTML-Strip

People
Owner: Nobody in particular
Requestors: bitcard [...] larochelle.name
Cc:
AdminCc:

Bug Information
Severity: Critical
Broken in: (no value)
Fixed in: 2.01



Subject: segfault when parsing invalid multi-line HTML
Parsing the following 8 lines of invalid html by calling HTML::Strip::parse() on each line (and not calling eof()) causes perl to crash with "free(): invalid next size (fast)".

LINES THAT CAUSE CRASH:
<option value="">123456 78912345 678901</option>
</select>
</div>
</li>
<input type="hidden" name="LPrice" value="0" />
<li >
<div class="real_left">
ABC. DEFGH:

Running the attached perl script demonstrates this bug. Output of the crash is included below.

Note: Calling eof between lines does prevent the crash. However, this bug is still significant because HTML files often come from untrusted sources. Furthermore, the documentation never states that the HTML must be valid. Finally, the result of the bug -- a crash of the perl process -- is extremely catastrophic.

This bug has been verified on Ubuntu Linux machines for both Perl 5.8 and 5.10.

drl@dev:~$ ./html_strip_crasher.pl
*** glibc detected *** /usr/bin/perl: free(): invalid next size (fast): 0x086f7228 ***
======= Backtrace: =========
/lib/tls/i686/nosegneg/libc.so.6[0x99f495]
/lib/tls/i686/nosegneg/libc.so.6(cfree+0x90)[0x9a2f70]
/usr/local/lib/perl/5.8.8/auto/HTML/Strip/Strip.so(XS_HTML__Strip_strip_html+0x40f)[0x49055f]
/usr/bin/perl(Perl_pp_entersub+0x313)[0x80c22d3]
/usr/bin/perl(Perl_runops_standard+0x1b)[0x80c0cab]
/usr/bin/perl(perl_run+0x2db)[0x806727b]
/usr/bin/perl(main+0x112)[0x8063792]
/lib/tls/i686/nosegneg/libc.so.6(__libc_start_main+0xe0)[0x949450]
/usr/bin/perl[0x8063611]
======= Memory map: ========
00110000-0011a000 r-xp 00000000 08:01 639266 /lib/libgcc_s.so.1
0011a000-0011b000 rw-p 0000a000 08:01 639266 /lib/libgcc_s.so.1
00213000-0021c000 r-xp 00000000 08:01 639273 /lib/tls/i686/nosegneg/libcrypt-2.7.so
0021c000-0021e000 rw-p 00008000 08:01 639273 /lib/tls/i686/nosegneg/libcrypt-2.7.so
0021e000-00245000 rw-p 0021e000 00:00 0
0048d000-00492000 r-xp 00000000 08:01 517558 /usr/local/lib/perl/5.8.8/auto/HTML/Strip/Strip.so
00492000-00493000 rw-p 00004000 08:01 517558 /usr/local/lib/perl/5.8.8/auto/HTML/Strip/Strip.so
006b6000-006d9000 r-xp 00000000 08:01 639282 /lib/tls/i686/nosegneg/libm-2.7.so
006d9000-006db000 rw-p 00023000 08:01 639282 /lib/tls/i686/nosegneg/libm-2.7.so
00898000-00899000 r-xp 00898000 00:00 0 [vdso]
00924000-0092e000 r-xp 00000000 08:01 444248 /usr/lib/perl5/auto/HTML/Parser/Parser.so
0092e000-0092f000 rw-p 00009000 08:01 444248 /usr/lib/perl5/auto/HTML/Parser/Parser.so
00933000-00a7f000 r-xp 00000000 08:01 639248 /lib/tls/i686/nosegneg/libc-2.7.so
00a7f000-00a80000 r--p 0014c000 08:01 639248 /lib/tls/i686/nosegneg/libc-2.7.so
00a80000-00a82000 rw-p 0014d000 08:01 639248 /lib/tls/i686/nosegneg/libc-2.7.so
00a82000-00a85000 rw-p 00a82000 00:00 0
00ae7000-00ae9000 r-xp 00000000 08:01 639280 /lib/tls/i686/nosegneg/libdl-2.7.so
00ae9000-00aeb000 rw-p 00001000 08:01 639280 /lib/tls/i686/nosegneg/libdl-2.7.so
00b73000-00b87000 r-xp 00000000 08:01 641518 /lib/tls/i686/nosegneg/libpthread-2.7.so
00b87000-00b89000 rw-p 00013000 08:01 641518 /lib/tls/i686/nosegneg/libpthread-2.7.so
00b89000-00b8b000 rw-p 00b89000 00:00 0
00e8b000-00ea5000 r-xp 00000000 08:01 641487 /lib/ld-2.7.so
00ea5000-00ea7000 rw-p 00019000 08:01 641487 /lib/ld-2.7.so
08048000-0814d000 r-xp 00000000 08:01 443032 /usr/bin/perl
0814d000-08151000 rw-p 00104000 08:01 443032 /usr/bin/perl
08151000-08153000 rw-p 08151000 00:00 0
086c6000-0878b000 rw-p 086c6000 00:00 0
b7d00000-b7d21000 rw-p b7d00000 00:00 0
b7d21000-b7e00000 ---p b7d21000 00:00 0
b7e2b000-b7e4c000 rw-p b7e2b000 00:00 0
b7e4c000-b7f82000 r--p 00000000 08:01 443652 /usr/lib/locale/locale-archive
b7f82000-b7f84000 rw-p b7f82000 00:00 0
b7f8b000-b7f8e000 rw-p b7f8b000 00:00 0
bfd28000-bfd3d000 rw-p bfd28000 00:00 0 [stack]
Aborted
drl@dev:~$
Subject: html_strip_crasher.pl
#!/usr/bin/perl
use strict;
use HTML::Strip;

my @lines = (
    '<option value="">123456 78912345 678901</option>',
    '</select>',
    '</div>',
    '</li>',
    '<input type="hidden" name="LPrice" value="0" />',
    '<li >',
    '<div class="real_left">',
    'ABC. DEFGH:'
);

sub main {
    my $hs_loc = HTML::Strip->new();
    for ( my $i = 0 ; $i < @lines ; $i++ ) {
        my $line = $lines[$i];
        $hs_loc->parse($line);
    }
}

main();
I've got the test case down to two lines; see attached.
Subject: html_strip_crasher.pl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Strip;

my @lines = (
    '<b>1</b><li>',
    'ABC. DEFGH:'
);

my $hs = HTML::Strip->new();
for my $line ( @lines ) {
    $hs->parse($line);
}
On Tue Apr 17 09:48:23 2012, DIOCLES wrote:
> I've got the test case down to two lines; see attached.
This was tested with the HTML::Strip (1.06) in Fedora 16, but neither test case has crashed for me when I built HTML::Strip from source to investigate further. :(
From: michisteiner [...] verizon.net
I am also running into this problem. After debugging a bit, I found the problem (or at least one of them).

In Strip.xs, the size of the output buffer (= clean) is set via

    int size = strlen(raw) + 1;

so it is the same size as the input buffer (= raw). However, if the state indicates in_tag, emit_spaces is true, and the raw input contains nothing that gets removed or shrunk, the output can become larger than the input, because a space is prepended.

Counter-measures:
- Without changing the module, just adding ``emit_spaces => 0'' to new() helped for me.
- Changing the line mentioned above in Strip.xs to

    int size = strlen(raw) + 2;

  also seemed to work. However, I didn't go through all the code in strip_html to verify that only a single additional space can be added to the output, so there might be other circumstances where the output is larger than the input.
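To make the off-by-one in the comment above concrete, here is a minimal sketch in C. It is not the code from Strip.xs: the ToyStripper struct, its field names, and the toy_strip function are all hypothetical, and the model assumes a single space is emitted before the first text character that follows a closed tag. It never writes past the buffer it is given; it only reports how many bytes the result would need, so the required size can be compared with strlen(raw) + 1.

```c
#include <stddef.h>

/* Hypothetical, simplified state kept between parse() calls.
 * These field names are illustrative and not taken from Strip.xs. */
typedef struct {
    int in_tag;        /* currently between '<' and '>' */
    int pending_space; /* a tag just closed; next text gets a leading space */
    int emit_spaces;   /* models the emit_spaces option */
} ToyStripper;

/* Strip tags from `raw`, writing at most `cap` bytes into `out`.
 * Returns the number of bytes the full result needs, including the
 * trailing NUL, so callers can compare it with their buffer size. */
static size_t toy_strip(ToyStripper *s, const char *raw, char *out, size_t cap)
{
    size_t n = 0;
    for (const char *p = raw; *p != '\0'; p++) {
        if (s->in_tag) {
            if (*p == '>') {
                s->in_tag = 0;
                s->pending_space = 1;
            }
        } else if (*p == '<') {
            s->in_tag = 1;
        } else {
            /* Text character: emit the carried-over space first. */
            if (s->pending_space && s->emit_spaces) {
                if (n < cap) out[n] = ' ';
                n++;
            }
            s->pending_space = 0;
            if (n < cap) out[n] = *p;
            n++;
        }
    }
    /* Always NUL-terminate within the buffer, truncating if needed. */
    if (cap > 0) out[n < cap ? n : cap - 1] = '\0';
    return n + 1;
}
```

Feeding the two-line test case through this model with emit_spaces set, the second call ('ABC. DEFGH:') reports that it needs strlen(raw) + 2 bytes, one more than a buffer sized strlen(raw) + 1 provides. With emit_spaces cleared, it needs exactly strlen(raw) + 1.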
I can't replicate this bug - all of the test cases listed run fine for me. I am using perl v5.14.2 on Ubuntu Precise.
RT-Send-CC: bitcard [...] larochelle.name, tim [...] retout.co.uk
On Tue Sep 23 12:40:02 2014, KILINRAX wrote:
> I can't replicate this bug - all of the test cases listed run fine for me.
>
> I am using perl v5.14.2 on Ubuntu Precise.
Can one of you rerun the breaking test cases and determine whether they're still an issue? If they are, I'll need to know more about your system and perl version to proceed.
Michi's patch applied in v2.01.