
This queue is for tickets about the HTML-Parser CPAN distribution.

Report information
The Basics
Id: 55629
Status: resolved
Priority: 0/
Queue: HTML-Parser

People
Owner: Nobody in particular
Requestors: NIKOLAS [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: 3.64
Fixed in: (no value)



Subject: Wrong parse
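HTML:

    <iframe/**/src="http://mail.ru" name="poc iframe jacking" width="100%" height="100%" scrolling="auto" frameborder="no"></iframe>

Code:

    use strict;
    use warnings;
    use HTML::Parser;

    my $html = '<iframe/**/src="http://mail.ru" name="poc iframe jacking" width="100%" height="100%" scrolling="auto" frameborder="no"></iframe>';

    my $parser = HTML::Parser->new(
        api_version => 3,
        start_h     => [ sub {
            my ($Self, $Text, $Tag, $Attr) = @_;
            print "Tag is: " . $Tag . "\n";
        }, "self, text, tagname, attr" ],
    );
    # The iframe should be ignored entirely, so start_h should never fire for it.
    $parser->ignore_elements( qw( iframe ) );
    $parser->ignore_tags( qw( iframe ) );
    $parser->parse($html);
    $parser->eof;

Output:

    Tag is: iframe/**/src="http://mail.ru"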
On Tue Mar 16 11:09:51 2010, NIKOLAS wrote:
The same wrong parse happens with this HTML: <script/src="ya.ru">
I don't understand what rules you propose HTML::Parser should follow to parse this kind of bogus HTML. Do you think it should treat "/**/" and "/" as whitespace?
Applying these three regular expressions to the input text corrects these problems:

    s{(/\*)}{ $1}g;
    s{(\*/)}{$1 }g;
    s{(<[^/\s<>]+)/}{$1 /}g;

You will probably find a more correct architectural solution.
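A minimal sketch of the workaround from the previous comment, assuming the three substitutions are run over the raw HTML string before it is handed to HTML::Parser. The helper name preprocess_html is hypothetical, and the iframe sample is the one from the report, shortened. Because the substitutions apply to the whole document, any "/*" or "*/" inside text or script content will also gain spaces.

```perl
use strict;
use warnings;

# Hypothetical helper: applies the three substitutions proposed in the
# ticket so that "/**/" and a "/" glued to a tag name are set apart by spaces.
sub preprocess_html {
    my ($html) = @_;
    $html =~ s{(/\*)}{ $1}g;            # insert a space before every "/*"
    $html =~ s{(\*/)}{$1 }g;            # insert a space after every "*/"
    $html =~ s{(<[^/\s<>]+)/}{$1 /}g;   # "<tag/" becomes "<tag /" (end tags like "</x" are untouched)
    return $html;
}

my $iframe = '<iframe/**/src="http://mail.ru" name="poc iframe jacking"></iframe>';
my $script = '<script/src="ya.ru">';

# prints: <iframe /**/ src="http://mail.ru" name="poc iframe jacking"></iframe>
print preprocess_html($iframe), "\n";
# prints: <script /src="ya.ru">
print preprocess_html($script), "\n";
```

The sketch only rewrites the string so that the tag name is followed by whitespace; how HTML::Parser then tokenizes the leftover "/**/" or "/src" is not shown here.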