
This queue is for tickets about the HTML-Selector-XPath CPAN distribution.

Report information
The Basics
Id: 81735
Status: rejected
Priority: 0/
Queue: HTML-Selector-XPath

People
Owner: Nobody in particular
Requestors: parlay [...] yopmail.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: workaround bug in HTML::TreeBuilder::XPath
As reported in https://rt.cpan.org/Public/Bug/Display.html?id=81722, HTML::TreeBuilder lowercases attribute names in the HTML tree it builds, but HTML::TreeBuilder::XPath does not lowercase the attribute names in the selector. So if the user selects on an attribute name as it was cased in the original document, and that name was not all-lowercase, the match fails.

The author of HTML::TreeBuilder::XPath apparently doesn't consider this a bug. He thinks documenting it in HTML::TreeBuilder is sufficient, which won't help someone using a higher-level module such as Web::Scraper.

Here is a test case demonstrating the issue in HTML::TreeBuilder::XPath: https://rt.cpan.org/Ticket/Attachment/1149252/604410/d.pl

This can be worked around in HTML::Selector::XPath with the attached patch.
Subject: lcattr.diff
diff -Naur lib/HTML/Selector/XPath.pm /tmp/lib/HTML/Selector/XPath.pm
--- lib/HTML/Selector/XPath.pm	2012-10-01 17:18:02.000000000 +0000
+++ /tmp/lib/HTML/Selector/XPath.pm	2012-12-06 06:36:07.000000000 +0000
@@ -50,6 +50,8 @@
 sub convert_attribute_match {
     my ($left,$op,$right) = @_;
 
+    $left = lc $left;
+
     # negation (e.g. [input!="text"]) isn't implemented in CSS, but include it anyway:
     if ($op eq '!=') {
         "\@$left!='$right'";
@@ -166,7 +168,7 @@
                     push @parts, '*';
                     $tag_index = $#parts;
                 };
-                push @parts, "[\@$1]";
+                push @parts, "[\@\L$1]";
             } elsif ($rule =~ $reg->{badattr}) {
                 Carp::croak "Invalid attribute-value selector '$rule'";
             }
@@ -177,7 +179,7 @@
                 if ($sub_rule =~ s/$reg->{attr2}//) {
                     push @parts, "[not(", convert_attribute_match( $1, $2, $^N ), ")]";
                 } elsif ($sub_rule =~ s/$reg->{attr1}//) {
-                    push @parts, "[not(\@$1)]";
+                    push @parts, "[not(\@\L$1)]";
                 } elsif ($rule =~ $reg->{badattr}) {
                     Carp::croak "Invalid attribute-value selector '$rule'";
                 } else {
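The idea behind the patch (lowercase the attribute name before it is emitted into the XPath predicate, but leave the attribute value alone) can be sketched in isolation. This is a minimal illustration, not code from HTML::Selector::XPath: the helper `attr_to_xpath` is hypothetical, it only handles the bare `[name]` and `[name="value"]` forms, and the `%attrs` hash merely stands in for an element whose attribute names HTML::TreeBuilder has already lowercased (the sketch does not load either module).

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of HTML::Selector::XPath: converts a single
# CSS attribute selector into an XPath predicate, lowercasing only the
# attribute name (the same idea as `lc $left` / "\L$1" in the patch).
# Only [name] and [name="value"] are handled; quotes in values are not escaped.
sub attr_to_xpath {
    my ($css) = @_;

    # [name]  ->  [@name]
    if ($css =~ /^\[\s*([\w-]+)\s*\]$/) {
        return sprintf q{[@%s]}, lc $1;
    }

    # [name="value"]  ->  [@name='value']  (value case is preserved)
    if ($css =~ /^\[\s*([\w-]+)\s*=\s*"([^"]*)"\s*\]$/) {
        return sprintf q{[@%s='%s']}, lc $1, $2;
    }

    die "unsupported attribute selector: $css\n";
}

# Stand-in for a parsed element: HTML::TreeBuilder stores attribute names
# lowercased, so <a onClick="go()"> ends up keyed as "onclick".
my %attrs = (onclick => 'go()');

# Pull the attribute name back out of the generated predicate and look it up.
my ($name) = attr_to_xpath('[onClick]') =~ /^\[\@([\w-]+)\]$/;
print exists $attrs{$name} ? "match\n" : "no match\n";
```

Only the name is lowercased because XPath 1.0 string comparison is case-sensitive, so lowercasing the value would change which elements match.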
Subject: Re: [rt.cpan.org #81735] workaround bug in HTML::TreeBuilder::XPath
Date: Wed, 5 Dec 2012 22:41:23 -0800
To: "bug-HTML-Selector-XPath [...] rt.cpan.org" <bug-HTML-Selector-XPath [...] rt.cpan.org>
From: Tatsuhiko Miyagawa <miyagawa [...] gmail.com>
use github https://github.com/miyagawa/HTML-Selector-XPath and make a pull request there.

On Wed, Dec 5, 2012 at 10:39 PM, parlay via RT <bug-HTML-Selector-XPath@rt.cpan.org> wrote:
> Thu Dec 06 01:39:44 2012: Request 81735 was acted upon.
> Transaction: Ticket created by parlay
> Queue: HTML-Selector-XPath
> Subject: workaround bug in HTML::TreeBuilder::XPath
> Broken in: (no value)
> Severity: (no value)
> Owner: Nobody
> Requestors: parlay@yopmail.com
> Status: new
> Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=81735 >
-- Tatsuhiko Miyagawa
From: parlay [...] yopmail.com
On Thu Dec 06 01:41:55 2012, miyagawa@gmail.com wrote:
> use github https://github.com/miyagawa/HTML-Selector-XPath and make a pull request there.
I'll copy this over there, but you should update the distribution metadata to point to GitHub issues as the bug tracker: metacpan and search.cpan.org link to RT, which is why I posted here.
Per the discussion in https://github.com/miyagawa/HTML-Selector-XPath/issues/12, this module is the wrong place to fix this.