
This queue is for tickets about the HTML-Query CPAN distribution.

Report information
The Basics
Id: 58918
Status: resolved
Priority: 0/
Queue: HTML-Query

People
Owner: Nobody in particular
Requestors: wonko [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: 0.02
Fixed in: (no value)



Subject: Queries with multiples of the same element fail
If you have a query like "div div p", HTML::Query will fail and return lots of elements that don't match the query. This is because of a bug (at least I think it's a bug) in HTML::Element: if you call $element->look_down(_tag => 'div') on an element that is itself a <div>, it will return that element (and any <div> children it might have) as part of the results too. I'm attaching a patch that fixes this by using refaddr() to filter out any elements returned from look_down() that are the same as the element calling look_down().
Subject: fix_look_down.patch
--- Query.pm 2009-06-15 05:35:14.000000000 -0400
+++ Query.pm.htmlelement 2010-06-28 22:22:50.154443229 -0400
@@ -28,6 +28,7 @@
         bad_spec => 'Invalid specification "%s" in query: %s',
         is_empty => 'The query does not contain any elements',
     };
+use Scalar::Util qw(refaddr);

 our $SOURCES = {
@@ -193,9 +194,15 @@
             ' into args [', join(', ', @args), ']'
         ) if DEBUG;

-        # call look_down() against each element to get the new elements
-        @elements = map { $_->look_down(@args) } @elements;
-
+        my @new_elements;
+        # HTML::Element has a bug where look_down() will return the same element
+        # again if it matches the arguments. This breaks queries like "div div p"
+        foreach my $el (@elements) {
+            my $addr = refaddr($el);
+            push(@new_elements, grep { refaddr($_) != $addr } $el->look_down(@args));
+        }
+        @elements = @new_elements;
+
         # so we can check we've done something
         $comops++;
     }
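The behaviour the patch works around can be reproduced outside HTML::Query. The following is only a minimal sketch, assuming HTML::TreeBuilder (which builds HTML::Element trees) is installed. The sample markup and the variable names are made up for illustration and are not part of the distribution.

    use strict;
    use warnings;
    use Scalar::Util qw(refaddr);
    use HTML::TreeBuilder;    # assumed installed; builds HTML::Element trees

    # Illustrative markup: a <div> nested inside another <div>.
    my $html = '<html><body><div id="outer"><div id="inner">'
             . '<var>x</var></div></div></body></html>';
    my $tree = HTML::TreeBuilder->new_from_content($html);

    # Start from the outer <div>.
    my ($outer) = $tree->look_down(_tag => 'div', id => 'outer');

    # look_down() also tests the element it is called on,
    # so $outer itself appears in the results.
    my @raw = $outer->look_down(_tag => 'div');

    # Keep only true descendants by comparing reference addresses,
    # as the patch does.
    my $addr        = refaddr($outer);
    my @descendants = grep { refaddr($_) != $addr } @raw;

    printf "raw: %d, filtered: %d\n", scalar @raw, scalar @descendants;

    $tree->delete;    # free the tree explicitly

Comparing with refaddr() checks object identity rather than content, so two distinct <div> elements with identical markup are still treated as different elements.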
To add a test for this:

Add this snippet as the last elements right before the closing body tag in html/test1.html:

    <div>
      <div>
        <var>some var deep in some divs</var>
      </div>
    </div>

Here's a test that can be added to t/query.t. It reproduces the problem and fails without the attached patch:

    my $vars = $query->query('div div var');
    ok( $vars, 'got div div var query' );
    is( $vars->size, 1, 'one var in div div var query' );
    is( join(', ', $vars->as_trimmed_text), 'some var deep in some divs', 'got var' );
Hi Michael,

You are right about the duplicates, but we didn't want this dependency on the Scalar::Util library, so we found a slightly different way of deduping. You will get that improvement in 0.03.

Thanks,
Kevin
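The reply does not say how the deduping was done without Scalar::Util. One common way to dedupe references with no extra modules is a %seen hash keyed on the stringified reference. The sketch below illustrates that general idiom only and is not the maintainers' actual fix; unique_refs is a hypothetical helper name, and plain hash references stand in for elements.

    use strict;
    use warnings;

    # Hypothetical helper: drop repeated references, keeping first-seen order.
    # A plain reference stringifies to something like HASH(0x...), which is
    # unique per live reference, so it can serve as a hash key.
    sub unique_refs {
        my %seen;
        return grep { !$seen{$_}++ } @_;
    }

    # Plain hash references standing in for elements.
    my $div  = { tag => 'div' };
    my $para = { tag => 'p' };

    my @unique = unique_refs($div, $para, $div, $para, $div);
    print scalar(@unique), "\n";

This idiom assumes the objects do not overload stringification; for classes that do, refaddr() remains the safer identity check.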