Bug #18705 for XML-XPathEngine: Unexpected sorting of XPathEngine

Thu Apr 13 16:37:53 2006 Guest - Ticket created

Subject:

Unexpected sorting of XPathEngine

In XML::XPathEngine::find, there is a sort->remove_duplicates for NodeSets. I think I understand why this is being done, but it seems to break at least one page I'm looking at (and breaks it very consistently). You appear to be sorting on the memory location of the item in the NodeSet. Consider this code:

---
use LWP::Simple;
use HTML::TreeBuilder::XPath;

my $page = get( "http://rdu.news14.com/content/weather/7day_forecast/" );
my $tree = HTML::TreeBuilder::XPath->new_from_content( $page );
my $nodes = $tree->findnodes( '//b' );
print $nodes;
---

I've attached the specific version of index.html that causes the problem. The issue is that the data in the HTML (and in $tree) is in a different order than the information in $nodes. The data in the first two rows of the forecast table winds up at the end of the NodeList.

Is the remove_duplicates actually necessary? (I've removed it without seeing immediate problems.) If so, could you remove duplicates without sorting? This would probably be faster, though it might take a little more memory to hold a %seen hash.
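The %seen idea suggested above can be sketched roughly as follows. This is only an illustration: uniq_in_order is a hypothetical helper, not a function in XML::XPathEngine, and the string items stand in for node objects. It keeps whatever order the items arrived in and does not by itself put them in document order.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of XML::XPathEngine): drop repeated items
# while keeping the first occurrence of each in its original position.
sub uniq_in_order {
    my @items = @_;
    my %seen;
    # Hash keys are strings: plain strings are compared by value, while
    # references stringify to their address and are compared by identity.
    return grep { !$seen{$_}++ } @items;
}

# Placeholder data standing in for nodes.
my @nodes  = qw(b3 b1 b3 b2 b1);
my @unique = uniq_in_order(@nodes);
print "@unique\n";    # prints "b3 b1 b2"
```

The trade-off is the one mentioned in the report: no sort is needed, at the cost of one hash entry per distinct item.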

Subject:

index.html

Message body is not shown because it is too large.

Thu Apr 20 08:46:49 2006 MIROD [...] cpan.org - Correspondence added

Hi,

The bug is not in XML::XPathEngine, it is in HTML::TreeBuilder::XPath: the comparison method had a cmp instead of a <=>, which caused the problem you had.

I have put an updated version of HTML::TreeBuilder::XPath here: http://www.xmltwig.com/module/html-treebuilder-xpath/
Let me know if it works better for you, in which case I will upload it to CPAN.

Thanks

__ mirod
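For readers unfamiliar with the two operators, here is a small sketch of how cmp (string comparison) and <=> (numeric comparison) can order the same values differently. The numbers are made up for illustration; they are not the values the module's comparison method actually works with.

```perl
use strict;
use warnings;

# Illustrative values only.
my @positions = (9, 10, 100, 2);

# cmp compares as strings, so "10" and "100" sort before "2".
my @by_string = sort { $a cmp $b } @positions;

# <=> compares as numbers.
my @by_number = sort { $a <=> $b } @positions;

print "cmp: @by_string\n";    # cmp: 10 100 2 9
print "<=>: @by_number\n";    # <=>: 2 9 10 100
```

With string comparison the ordering only goes wrong once values differ in digit count, which could make such a bug show up on some inputs and not others.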

Thu Apr 20 08:46:50 2006 The RT System itself - Status changed from 'new' to 'open'

Thu Apr 20 12:23:40 2006 Guest - Correspondence added

The new version of HTML::TreeBuilder::XPath seems to fix the problem. Thanks.

Thu Apr 20 12:31:03 2006 MIROD [...] cpan.org - Correspondence added

On Thu Apr 20 12:23:40 2006, guest wrote:

> The new version of HTML::TreeBuilder::XPath seems to fix the problem.
> Thanks.

Thanks, I just uploaded HTML-TreeBuilder-XPath-0.03 to CPAN.

__ mirod

Fri Apr 21 00:51:46 2006 MIROD [...] cpan.org - Status changed from 'open' to 'resolved'

Fri Apr 21 00:51:47 2006 MIROD [...] cpan.org - Broken in 0.03 deleted