
This queue is for tickets about the HTML-Selector-XPath CPAN distribution.

Report information
The Basics
Id: 117127
Status: open
Priority: 0/
Queue: HTML-Selector-XPath

People
Owner: Nobody in particular
Requestors: kosmichal [...] gmail.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: selector_to_xpath wth css selector using :contains
Date: Mon, 22 Aug 2016 14:19:45 +0000
To: "bug-HTML-Selector-XPath [...] rt.cpan.org" <bug-HTML-Selector-XPath [...] rt.cpan.org>
From: Michal Kos <kosmichal [...] gmail.com>
Hi,

There seems to be a problem when converting css selectors to xpath if they use :contains pseudoclass.

Consider following html:

#####
<label ><span >Title</span>
<span >*</span> </label>
#####

now if I try following css selector (say in Chrome console)

    "label:contains('Title')"

it returns <label>...</label>

if I use selector_to_xpath("label:contains('Title')") it returns following xpath:

    "//label[text()[contains(string(.),'Title')]]"

trying to use this xpath results in empty result []

however if I modify the xpath and remove text() so the xpath is

    "//label[contains(string(.),'Title')]"

it will return same element as css selector would do.

The fix would be to edit HTML/Selector/XPath.pm lines 221 and 223 and change:

    push @parts, qq{[text()[contains(string(.),"$1")]]};

to

    push @parts, qq{[contains(string(.),"$1")]};

Could you please apply it?

* Distribution name: HTML-Selector-XPath 0.20

* Perl version:
    $ perl -v
    This is perl 5, version 20, subversion 2 (v5.20.2) built for x86_64-linux-gnu-thread-multi
    (with 81 registered patches, see perl -V for more detail)

* OS:
    $ uname -a
    Linux deb-vm 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt25-2 (2016-04-08) x86_64 GNU/Linux

### test case ###
use strict;
use warnings;
use HTML::Selector::XPath 'selector_to_xpath';
use Test::More tests => 1;

my $css="label:contains('Title')";
my $xpath = selector_to_xpath($css); #//label[text()[contains(string(.),"Title")]]
my $expected="//label[contains(string(.),'Title')]";
is($xpath,$expected,"selector_to_xpath(label:contains('Title'))");
### test case ###

### html example ###
<!DOCTYPE html>
<html>
<head>
<title> test </title>
<script src="/resources/jquery-3.1.0.min.js"></script>
</head>
<body>
<label ><span >Title</span>
<span >*</span> </label>
</body>
</html>
### html example ###

Thank you.
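
Editor's sketch (not part of the original report): the difference between the two XPath predicates discussed above can be illustrated without an XPath engine. The snippet below uses only Python's standard library. The helper names `own_text_nodes`, `matches_text_node`, and `matches_string_value` are hypothetical and exist only for this illustration. They mimic, rather than evaluate, the two predicates, on the assumption that the markup can be parsed as XML.

```python
import xml.etree.ElementTree as ET

# The markup from the report, parsed as XML for simplicity.
FRAGMENT = "<label><span>Title</span> <span>*</span> </label>"


def own_text_nodes(elem):
    """Return the text nodes that are direct children of elem."""
    nodes = [elem.text or ""]
    nodes.extend(child.tail or "" for child in elem)
    return nodes


def matches_text_node(elem, needle):
    """Mimic [text()[contains(string(.), needle)]]:
    true if some direct text-node child contains needle."""
    return any(needle in node for node in own_text_nodes(elem))


def matches_string_value(elem, needle):
    """Mimic [contains(string(.), needle)]:
    true if the concatenated text of all descendants contains needle."""
    return needle in "".join(elem.itertext())


label = ET.fromstring(FRAGMENT)
print(matches_text_node(label, "Title"))     # False
print(matches_string_value(label, "Title"))  # True
```

The only direct text nodes of `<label>` are whitespace, so the first check fails, while the string value of the element ("Title * ") contains the search text.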
Subject: Re: [rt.cpan.org #117127] selector_to_xpath wth css selector using :contains
Date: Tue, 30 Aug 2016 21:16:02 +0200
To: bug-HTML-Selector-XPath [...] rt.cpan.org
From: Max Maischein <corion [...] corion.net>
Hello Michal,

thank you very much for analyzing the issue and sending a patch.

> There seems to be a problem when converting css selectors to xpath if they
> use :contains pseudoclass.

I think part of the problem is that the :contains() selector was never really specified and is now deprecated. As implemented in HTML::Selector::XPath and its test suite, :contains only applies to the immediate node, not its child nodes.

> Consider following html:
>
> #####
> <label ><span >Title</span>
> <span >*</span> </label>
> #####
>
> now if I try following css selector (say in Chrome console)
> "label:contains('Title')" it returns <label>...</label>

Neither Firefox nor Chrome implements the :contains() selector natively. The jQuery documentation supports your use case of selecting all nodes which themselves or whose children contain a given text [1], but it also claims that the text may even span nodes, which your approach does not support...

> Could you please apply it?

I have to think about a backwards compatible way that allows the also very useful old way of only selecting nodes whose immediate text contains the search text, so users have a simple way to adapt their queries to the changed semantics.

Also, I have to see if/how jQuery supports text spanning across nodes and how it still matches that. My test case is the following HTML:

    <a href="Other"><p>Yes No No</p></a>
    <a href="No">Yes</a><a href="Yes">No</a>
    <a href="No"><p>Yes</p></a><a href="Yes">No</a>

with this selector:

    a:contains("YesNo")

And I expect the two following tags to match:

    <a href="No">Yes</a><a href="Yes">No</a>
    <a href="No"><p>Yes</p></a><a href="Yes">No</a>

Basically this addition to t/02_html.t

===
--- input
<a href="Other"><p>Yes No No</p></a>
<a href="No">Yes</a><a href="Yes">No</a>
<a href="No"><p>Yes</p></a><a href="Yes">No</a>
--- selector
a:contains("YesNo")
--- expected
<a href="No">Yes</a><a href="Yes">No</a>
<a href="No"><p>Yes</p></a><a href="Yes">No</a>

And currently, that fails and I'm not really sure how to fix it.

-max

[1] https://api.jquery.com/contains-selector/
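
Editor's sketch (not part of the original reply): one way to see why the test case above is hard is to look at the string value of each `<a>` element. The snippet below uses only Python's standard library. The `<root>` wrapper is an assumption added so the input parses as XML. It is neither jQuery's matching algorithm nor a proposed fix. It only shows that no single element contains "YesNo", while pairs of directly adjacent `<a>` elements do.

```python
import xml.etree.ElementTree as ET

# Max's test input, wrapped in a hypothetical <root> element so it parses as XML.
DOC = """<root>
<a href="Other"><p>Yes No No</p></a>
<a href="No">Yes</a><a href="Yes">No</a>
<a href="No"><p>Yes</p></a><a href="Yes">No</a>
</root>"""

root = ET.fromstring(DOC)
anchors = root.findall("a")

# String value of each <a>: the text of all its descendants, concatenated.
values = ["".join(a.itertext()) for a in anchors]
print(values)

# Elements whose own string value contains the search text.
single_hits = [v for v in values if "YesNo" in v]
print(single_hits)

# Pairs of <a> elements with no text at all between them whose
# combined string values contain the search text.
adjacent_hits = [
    (left.get("href"), right.get("href"))
    for left, right in zip(anchors, anchors[1:])
    if not left.tail
    and "YesNo" in "".join(left.itertext()) + "".join(right.itertext())
]
print(adjacent_hits)
```

Under these assumptions `single_hits` stays empty, so a predicate evaluated on one element at a time, whether on its direct text nodes or on its full string value, cannot select the expected tags. Only the two adjacent pairs from the expected output produce "YesNo".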