Subject: | work around bug in HTML::TreeBuilder::XPath |
As reported here: https://rt.cpan.org/Public/Bug/Display.html?id=81722
HTML::TreeBuilder lowercases attribute names in the HTML tree it builds,
but HTML::TreeBuilder::XPath does not lowercase attribute names in the
selector. If the user selects on the original-cased attribute name, and
that name is anything other than all-lowercase, the match fails.
The author of HTML::TreeBuilder::XPath apparently does not consider this
a bug and thinks it suffices to document the behavior in
HTML::TreeBuilder. That does not help somebody using a higher-level
module such as Web::Scraper.
Here is a test case demonstrating the issue in HTML::TreeBuilder::XPath:
https://rt.cpan.org/Ticket/Attachment/1149252/604410/d.pl
This can be worked around in HTML::Selector::XPath with the attached patch.
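To illustrate the idea behind the patch, here is a minimal standalone sketch of lowercasing the attribute name while building the XPath predicate. The helper attr_presence_to_xpath and its simplified regex are hypothetical and not part of HTML::Selector::XPath; only the use of \L mirrors the patch.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration only (not the module's real code):
# turn a bare attribute-presence selector such as "[dataFoo]" into an
# XPath predicate, lowercasing the attribute name with \L as the patch does.
sub attr_presence_to_xpath {
    my ($selector) = @_;

    # Simplified pattern; the real module uses its own $reg->{attr1}.
    if ($selector =~ /^\[\s*([\w:-]+)\s*\]$/) {
        # \L lowercases everything that follows in the interpolated string,
        # so the captured name matches the lowercased tree.
        return "[\@\L$1]";
    }
    return;
}

print attr_presence_to_xpath('[dataFoo]'), "\n";   # prints [@datafoo]
```

With this, a selector written with the original casing produces a predicate that matches the attribute names as HTML::TreeBuilder stores them.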
Subject: | lcattr.diff |
diff -Naur lib/HTML/Selector/XPath.pm /tmp/lib/HTML/Selector/XPath.pm
--- lib/HTML/Selector/XPath.pm 2012-10-01 17:18:02.000000000 +0000
+++ /tmp/lib/HTML/Selector/XPath.pm 2012-12-06 06:36:07.000000000 +0000
@@ -50,6 +50,8 @@
sub convert_attribute_match {
my ($left,$op,$right) = @_;
+ $left = lc $left;
+
# negation (e.g. [input!="text"]) isn't implemented in CSS, but include it anyway:
if ($op eq '!=') {
"\@$left!='$right'";
@@ -166,7 +168,7 @@
push @parts, '*';
$tag_index = $#parts;
};
- push @parts, "[\@$1]";
+ push @parts, "[\@\L$1]";
} elsif ($rule =~ $reg->{badattr}) {
Carp::croak "Invalid attribute-value selector '$rule'";
}
@@ -177,7 +179,7 @@
if ($sub_rule =~ s/$reg->{attr2}//) {
push @parts, "[not(", convert_attribute_match( $1, $2, $^N ), ")]";
} elsif ($sub_rule =~ s/$reg->{attr1}//) {
- push @parts, "[not(\@$1)]";
+ push @parts, "[not(\@\L$1)]";
} elsif ($rule =~ $reg->{badattr}) {
Carp::croak "Invalid attribute-value selector '$rule'";
} else {