
This queue is for tickets about the HTML-StripScripts-Parser CPAN distribution.

Report information
The Basics
Id: 87747
Status: open
Priority: 0/
Queue: HTML-StripScripts-Parser

People
Owner: Nobody in particular
Requestors: test [...] mail.yahoo.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: 1.03
Fixed in: (no value)



Subject: The parser just ignores your rules relating to a tags
cpanm HTML::StripScripts::Parser
...
Successfully installed HTML-StripScripts-Parser-1.03

use HTML::StripScripts::Parser;

my $val = '<a title="hey" target="_blank" rel="nofollow" href="http://www.google.com">www.google.com</a>';
my $hss = HTML::StripScripts::Parser->new({
    Content => 'Flow',
    Rules => {
        a => {
            title => 1,
            href => 1,
            rel => 1,
            target => 1,
        },
    },
});

say $hss->filter_html( $val );

Outputs:

<a>www.google.com</a>

Really? Nothing in the rules object is respected.
RT-Send-CC: andy [...] crowdtilt.com
On Fri Aug 09 14:38:12 2013, delvarworld wrote:
> cpanm HTML::StripScripts::Parser
> ...
> Successfully installed HTML-StripScripts-Parser-1.03
>
> use HTML::StripScripts::Parser;
>
> my $val = '<a title="hey" target="_blank" rel="nofollow"
> href="http://www.google.com">www.google.com</a>';
> my $hss = HTML::StripScripts::Parser->new({
>     Content => 'Flow',
>     Rules => {
>         a => {
>             title => 1,
>             href => 1,
>             rel => 1,
>             target => 1,
>         },
>     },
> });
>
> say $hss->filter_html( $val );
>
> Outputs:
>
> <a>www.google.com</a>
>
> Really? Nothing in the rules object is respected.

By default HTML::StripScripts only allows attributes "href" and "style" for "a" tags. If you would like it to allow other attributes, you need to extend HTML::StripScripts::Parser and override the init_attrib_whitelist method as so:

package MyStripper;
use Moo;
extends 'HTML::StripScripts::Parser';

around init_attrib_whitelist => sub {
    my ($orig, $self) = @_;
    my $attr = $self->$orig();
    $attr->{a}{target} = 'word';
    $attr->{a}{title} = 'text';
    $attr->{a}{rel} = 'word';
    return $attr;
};

my $hss = MyStripper->new({
    Context => 'Flow',
    AllowHref => 1,
    Rules => {
        a => {
            href => qr{^http://www.google.com},
        },
    },
});

my $val = '<a title="hey" target="_blank" rel="nofollow" href="http://www.google.com">www.google.com</a>';
print $hss->filter_html($val);

Outputs:

<a href="http://www.google.com" rel="nofollow" target="_blank" title="hey">www.google.com</a>

Best regards,
Naveed Massjouni
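
For readers who would rather not pull in Moo, the same override can probably be written with plain inheritance. The following is only a sketch under assumptions: the package name MyPlainStripper is made up for illustration, and the whitelist is copied before being extended on the unconfirmed assumption that the parent class may hand back a shared structure. The attribute classes ('word', 'text') are the ones used in the reply above.

```perl
package MyPlainStripper;    # hypothetical name, not part of the distribution
use strict;
use warnings;
use parent 'HTML::StripScripts::Parser';

# Extend the attribute whitelist for <a> tags without Moo.
sub init_attrib_whitelist {
    my ($self) = @_;
    my $base = $self->SUPER::init_attrib_whitelist();

    # Work on a copy (assumption: the parent may return shared data).
    my %attr = %$base;
    $attr{a} = {
        %{ $base->{a} || {} },
        target => 'word',
        title  => 'text',
        rel    => 'word',
    };
    return \%attr;
}

package main;

my $hss = MyPlainStripper->new({
    Context   => 'Flow',
    AllowHref => 1,    # as in the reply above, so that href is kept
});

my $val = '<a title="hey" target="_blank" rel="nofollow" href="http://www.google.com">www.google.com</a>';
my $out = $hss->filter_html($val);
print "$out\n";
```

Note that the constructor key is Context here, as in the reply; the original report passed Content.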
RT-Send-CC: andy [...] crowdtilt.com
On Tue Aug 13 03:12:25 2013, IRONCAMEL wrote:
> On Fri Aug 09 14:38:12 2013, delvarworld wrote:
> > cpanm HTML::StripScripts::Parser
> > ...
> > Successfully installed HTML-StripScripts-Parser-1.03
> >
> > use HTML::StripScripts::Parser;
> >
> > my $val = '<a title="hey" target="_blank" rel="nofollow"
> > href="http://www.google.com">www.google.com</a>';
> > my $hss = HTML::StripScripts::Parser->new({
> >     Content => 'Flow',
> >     Rules => {
> >         a => {
> >             title => 1,
> >             href => 1,
> >             rel => 1,
> >             target => 1,
> >         },
> >     },
> > });
> >
> > say $hss->filter_html( $val );
> >
> > Outputs:
> >
> > <a>www.google.com</a>
> >
> > Really? Nothing in the rules object is respected.
>
> By default HTML::StripScripts only allows attributes "href" and
> "style" for "a" tags. If you would like it to allow other attributes,
> you need to extend HTML::StripScripts::Parser and override the
> init_attrib_whitelist method as so:
>
> package MyStripper;
> use Moo;
> extends 'HTML::StripScripts::Parser';
>
> around init_attrib_whitelist => sub {
>     my ($orig, $self) = @_;
>     my $attr = $self->$orig();
>     $attr->{a}{target} = 'word';
>     $attr->{a}{title} = 'text';
>     $attr->{a}{rel} = 'word';
>     return $attr;
> };
>
> my $hss = MyStripper->new({
>     Context => 'Flow',
>     AllowHref => 1,
>     Rules => {
>         a => {
>             href => qr{^http://www.google.com},
>         },
>     },
> });
>
> my $val = '<a title="hey" target="_blank" rel="nofollow"
> href="http://www.google.com">www.google.com</a>';
> print $hss->filter_html($val);
>
> Outputs:
>
> <a href="http://www.google.com" rel="nofollow" target="_blank"
> title="hey">www.google.com</a>
>
> Best regards,
> Naveed Massjouni

Note that the rule

    href => qr{^http://www.google.com},

is not necessary; it restricts the output to hrefs that match the given regex. Also, see this for details about subclassing the whitelist methods: https://metacpan.org/module/HTML::StripScripts#WHITELIST-INITIALIZATION-METHODS

-Naveed
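
If the href rule is meant as an actual restriction, it may be worth tightening the pattern. In the regex as written the dots are unescaped and nothing marks the end of the host name, so it also matches some URLs on other hosts. The sketch below uses only core Perl to compare the pattern from the reply with a stricter variant; the variable names and the stricter pattern are illustrative assumptions, not something taken from the module or the thread.

```perl
use strict;
use warnings;

# Pattern as written in the reply: '.' matches any character,
# and the match may stop in the middle of a longer host name.
my $loose = qr{^http://www.google.com};

# Hypothetical stricter variant: literal dots, and the host must be
# followed by '/', '?', '#' or the end of the string.
my $strict = qr{^http://www\.google\.com(?:[/?#]|\z)};

my @urls = (
    'http://www.google.com',
    'http://www.google.com/search?q=perl',
    'http://www.google.com.example.net/',
    'http://wwwxgoogle.com/',
);

for my $url (@urls) {
    printf "%-38s loose=%-3s strict=%s\n", $url,
        ( $url =~ $loose  ? 'yes' : 'no' ),
        ( $url =~ $strict ? 'yes' : 'no' );
}
```

The first two URLs match both patterns; the last two match only the loose one.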