Subject: | Unexpected sorting of XPathEngine |
XML::XPathEngine::find calls sort->remove_duplicates on NodeSets. I think
I understand why this is being done, but it breaks at least one page I'm
looking at, and does so consistently. The sort appears to order the
items in the NodeSet by memory location rather than by their position
in the document. Consider this code:
---
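use strict;
use warnings;
use LWP::Simple;
use HTML::TreeBuilder::XPath;
# Fetch the forecast page and parse it into a tree.
my $page = get( "http://rdu.news14.com/content/weather/7day_forecast/" )
    or die "Could not fetch page\n";
my $tree = HTML::TreeBuilder::XPath->new_from_content( $page );
# In scalar context, findnodes returns a NodeSet.
my $nodes = $tree->findnodes( '//b' );
print $nodes;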
---
I've attached the specific version of index.html that causes the
problem. The issue is that the <b> elements in $nodes come back in a
different order than they appear in the HTML (and in $tree): the data
from the first two rows of the forecast table winds up at the end of
the NodeSet.
Is the remove_duplicates call actually necessary? (I've removed it
without seeing immediate problems.) If it is, could you remove
duplicates without sorting? That would probably be faster, though it
might take a little more memory to hold a %seen hash.
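To illustrate the %seen idea, here is a minimal sketch of de-duplicating
without sorting. It is not XML::XPathEngine's code: unique_in_order is a
hypothetical helper, and plain hash references stand in for real nodes.
It keys on each reference's address (via Scalar::Util's refaddr) only to
detect repeats, so the first occurrence of every node stays where it was
found.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

# Hypothetical helper (not part of XML::XPathEngine): drop repeated
# references while keeping the first occurrence of each in place.
sub unique_in_order {
    my @nodes = @_;
    my %seen;
    # refaddr identifies the referenced object; it is used only as a
    # hash key here, never as a sort key.
    return grep { !$seen{ refaddr($_) }++ } @nodes;
}

# Stand-in "nodes": plain hash refs, with $b1 appearing twice.
my ( $b1, $b2, $b3 ) = map { +{ text => $_ } } qw(Mon Tue Wed);
my @unique = unique_in_order( $b1, $b2, $b1, $b3 );
print join( ' ', map { $_->{text} } @unique ), "\n";
```

Note that this preserves the order in which nodes were collected, which
is not necessarily document order for expressions that combine several
paths, so it may not be a drop-in replacement for the sort in every case.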
Subject: | index.html |