Subject: | Queries with multiples of the same element fail |
If you have a query like "div div p" HTML::Query will fail and return
lots of elements that don't match the query. This is because of a bug
(at least I think it's a bug) in HTML::Element. If you call
$element->look_down(_tag => 'div') on an element that is itself a <div>
then it will return that element (and any <div> children it might have)
as part of the results too.
I'm attaching a patch that fixes this by using refaddr() to filter out
any elements returned from look_down() that are the same as the element
calling look_down().
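
For illustration, here is a minimal sketch of the behaviour and of the
refaddr() filter, outside of HTML::Query. It assumes HTML::TreeBuilder is
installed; the markup and the ids "outer" and "inner" are made up for this
example and are not taken from the module or its tests.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;
use Scalar::Util qw(refaddr);

# Hypothetical markup: a <div> nested inside another <div>
my $tree = HTML::TreeBuilder->new_from_content(
    '<html><body><div id="outer"><div id="inner"><p>text</p></div></div></body></html>'
);

# In scalar context look_down() returns the first matching element
my $outer = $tree->look_down(id => 'outer');

# Unfiltered: the search starts at the calling element, so $outer is included
my @raw = $outer->look_down(_tag => 'div');

# Filtered: drop any result that is the calling element itself
my $addr     = refaddr($outer);
my @filtered = grep { refaddr($_) != $addr } @raw;

my $raw_ids      = join(',', map { $_->attr('id') } @raw);
my $filtered_ids = join(',', map { $_->attr('id') } @filtered);

print "unfiltered: $raw_ids\n";       # unfiltered: outer,inner
print "filtered:   $filtered_ids\n";  # filtered:   inner

$tree->delete;
```

Comparing with refaddr() rather than == on the objects themselves keeps the
check working even if the element class overloads comparison operators.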
Subject: | fix_look_down.patch |
--- Query.pm 2009-06-15 05:35:14.000000000 -0400
+++ Query.pm.htmlelement 2010-06-28 22:22:50.154443229 -0400
@@ -28,6 +28,7 @@
bad_spec => 'Invalid specification "%s" in query: %s',
is_empty => 'The query does not contain any elements',
};
+use Scalar::Util qw(refaddr);
our $SOURCES = {
@@ -193,9 +194,15 @@
' into args [', join(', ', @args), ']'
) if DEBUG;
- # call look_down() against each element to get the new elements
- @elements = map { $_->look_down(@args) } @elements;
-
+ my @new_elements;
+ # HTML::Element has a bug where look_down() will return the same element
+ # again if it matches the arguments. This breaks queries like "div div p"
+ foreach my $el (@elements) {
+ my $addr = refaddr($el);
+ push(@new_elements, grep { refaddr($_) != $addr } $el->look_down(@args));
+ }
+ @elements = @new_elements;
+
# so we can check we've done something
$comops++;
}