
This queue is for tickets about the HTML-Scrubber CPAN distribution.

Report information
The Basics
Id: 69947
Status: open
Priority: 0/
Queue: HTML-Scrubber

People
Owner: Nobody in particular
Requestors: sangeeth2k [...] gmail.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: HTML scrubber validation fails for string 'a>b'
Date: Mon, 1 Aug 2011 14:33:40 -0700
To: bug-HTML-Scrubber [...] rt.cpan.org
From: sangeetha Madangopal <sangeeth2k [...] gmail.com>
Hello,

I am validating HTML::Scrubber to see if it can be used in the project that I am working on. HTML scrubber does not allow use of string like 'a<b' it returns only 'a' after doing the scrub, it truncates '<b'. I looked at the HTML::Parser documentation to see if I can fix this issue, but no luck.

Please help configure Scrubber to allow strings like 'a<b' pass the Scrubber validation. Your help is much appreciated!

Thank you,
Sangeetha

Sample script:

    use strict;
    use warnings;
    use Test::More qw( no_plan );
    use_ok('HTML::Scrubber');
    use Data::Dump qw(dump);

    my @rules = (
        script => 0,
        img    => {
            src   => qr{^(?!(?:(java|vb))?script)}i,
            alt   => 1,
            align => 1,
            '*'   => ,
        },
    );

    my @default = (
        0 =>    # default rule, deny all tags
        {
            '*'          => 1,    # default rule, allow all attributes
            'href'       => qr{^(?!(?:(java|vb))?script)}i,
            'src'        => qr{^(?!(?:(java|vb))?script)}i,
            'data'       => qr{^(?!(?:(java|vb))?script)}i,
            'background' => qr{^(?!(?:(java|vb))?script)}i,
            'style'      => 0,
            'data'       => qr{^(?!http://)}i,
            'cite'       => '(?i-xsm:^(?!(?:(java|vb))?script))',
            'language'   => 0,
            'name'       => 1,    # could be sneaky, but hey ;)
            'onblur'      => 0,
            'onchange'    => 0,
            'onclick'     => 0,
            'ondblclick'  => 0,
            'onerror'     => 0,
            'onfocus'     => 0,
            'onkeydown'   => 0,
            'onkeypress'  => 0,
            'onkeyup'     => 0,
            'onload'      => 0,
            'onmousedown' => 0,
            'onmousemove' => 0,
            'onmouseout'  => 0,
            'onmouseover' => 0,
            'onmouseup'   => 0,
            'onreset'     => 0,
            'onselect'    => 0,
            'onsubmit'    => 0,
            'onunload'    => 0,
            'src'         => 0,
            'type'        => 0,
            'allowscriptaccess' => 0,
        }
    );

    my $scrubber = HTML::Scrubber->new( rules => \@rules, default => \@default );
    $scrubber->default(1);

    my $scrubbed_string;
    my $orig_string;
    my @positive_case_strings = (
        'a<b',
        '>x<',
    );

    foreach my $line (@positive_case_strings) {
        $scrubbed_string = $scrubber->scrub($line);
        is(lc($scrubbed_string),lc($line),"XSS controlled \n Orig:$line \n Scrubbed:$scrubbed_string\n");
    }

Result:

    perl s.t
    ok 1 - use HTML::Scrubber;
    Odd number of elements in anonymous hash at s.t line 7.
    not ok 2 - XSS controlled
    #  Orig:a<b
    #  Scrubbed:a
    #
    #   Failed test 'XSS controlled
    #  Orig:a<b
    #  Scrubbed:a
    # '
    #   at s.t line 71.
    #          got: 'a'
    #     expected: 'a<b'
    not ok 3 - XSS controlled
    #  Orig:>x<
    #  Scrubbed:&gt;x
    #
    #   Failed test 'XSS controlled
    #  Orig:>x<
    #  Scrubbed:&gt;x
    # '
    #   at s.t line 71.
    #          got: '&gt;x'
    #     expected: '>x<'
    1..3
    # Looks like you failed 2 tests of 3.

Unconvinced this is a bug - a bare '<' in text is invalid HTML (the character should be escaped as &lt;), so all bets are off.
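
If the strings in question are plain text rather than HTML, one possible workaround consistent with the comment above is to escape them before they reach the scrubber. The following is only a minimal sketch under that assumption: `escape_text` is a hypothetical helper name, not part of HTML::Scrubber, and it relies on `encode_entities` from HTML::Entities (shipped with HTML-Parser, which HTML::Scrubber already depends on). Comparing the scrubbed result against the escaped string, rather than the raw one, would be the matching change on the test side.

```perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities);
use HTML::Scrubber;

# Hypothetical helper (not part of HTML::Scrubber): treat the input as
# plain text and escape the characters that are special in HTML.
sub escape_text {
    my ($text) = @_;
    return encode_entities( $text, '<>&"' );
}

# With no rules given, HTML::Scrubber denies all tags by default.
my $scrubber = HTML::Scrubber->new;

my %scrubbed;
for my $text ( 'a<b', '>x<' ) {
    my $escaped = escape_text($text);    # e.g. 'a<b' becomes 'a&lt;b'
    $scrubbed{$text} = $scrubber->scrub($escaped);
    print "orig:     $text\n";
    print "escaped:  $escaped\n";
    print "scrubbed: $scrubbed{$text}\n\n";
}
```

This only makes sense for input that is known to be text. Input that is meant to contain markup cannot be escaped wholesale this way, since the tags themselves would be escaped too.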