
This queue is for tickets about the HTML-Query CPAN distribution.

Report information
The Basics
Id: 77877
Status: resolved
Priority: 0/
Queue: HTML-Query

People
Owner: Nobody in particular
Requestors: spt [...] jobindex.dk
Cc:
AdminCc:

Bug Information
Severity: Normal
Broken in: 0.07
Fixed in: (no value)



Subject: Support for :first-child and :last-child
Currently HTML::Query doesn't support pseudo-class selectors such as :first-child and :last-child. Attached is a patch that adds support for these two aforementioned pseudo-classes.
Subject: html-query-pseudoclasses.diff
diff -ru --new-file lib/HTML/Query.pm lib/HTML/Query.pm
--- lib/HTML/Query.pm 2010-10-30 02:46:00.000000000 +0200
+++ lib/HTML/Query.pm 2012-06-18 14:29:06.308560192 +0200
@@ -271,6 +271,20 @@
                         push( @args, $attribute => qr/.*/ );
                     }
                 }
+                # and/or one or more pseudo-classes
+                if ($query =~ / \G : ([\w\-]+) /cgx) {
+                    my $pseudoclass = $1;
+                    $specificity += 10;
+
+                    if ($pseudoclass eq 'first-child') {
+                        push( @args, sub { ! $_[0]->left() } );
+                    } elsif ($pseudoclass eq 'last-child') {
+                        push( @args, sub { ! $_[0]->right() } );
+                    } else {
+                        warn "Pseudoclass :$pseudoclass not supported";
+                        next;
+                    }
+                }
 
                 # keep going until this particular expression is fully processed
                 last unless scalar(@args) > $work;
diff -ru --new-file t/html/pseudoclasses.html t/html/pseudoclasses.html
--- t/html/pseudoclasses.html 1970-01-01 01:00:00.000000000 +0100
+++ t/html/pseudoclasses.html 2012-06-18 14:23:51.912735725 +0200
@@ -0,0 +1,23 @@
+<html>
+  <head>
+  </head>
+  <body>
+    <table>
+      <tr>
+        <td>1,1</td>
+        <td>1,2</td>
+        <td>1,3</td>
+      </tr>
+      <tr>
+        <td>2,1</td>
+        <td>2,2</td>
+        <td>2,3</td>
+      </tr>
+      <tr>
+        <td>3,1</td>
+        <td>3,2</td>
+        <td>3,3</td>
+      </tr>
+    </table>
+  </body>
+</html>
diff -ru --new-file t/pseudoclasses.t t/pseudoclasses.t
--- t/pseudoclasses.t 1970-01-01 01:00:00.000000000 +0100
+++ t/pseudoclasses.t 2012-06-18 14:30:09.889736731 +0200
@@ -0,0 +1,42 @@
+use strict;
+use warnings;
+use lib qw( ./lib ../lib );
+use HTML::TreeBuilder;
+use Badger::Filesystem '$Bin Dir';
+use Badger::Test
+    tests => 10,
+    debug => 'HTML::Query',
+    args  => \@ARGV;
+
+use HTML::Query 'Query';
+
+our $Query    = 'HTML::Query';
+our $Builder  = 'HTML::TreeBuilder';
+our $test_dir = Dir($Bin);
+our $html_dir = $test_dir->dir('html')->must_exist;
+our $pseudo   = $html_dir->file('pseudoclasses.html')->must_exist;
+
+my ($query, $tree);
+
+$tree = $Builder->new;
+$tree->parse_file( $pseudo->absolute );
+
+ok( $tree, 'parsed tree for test file: ' . $pseudo->name );
+$query = Query $tree;
+ok( $query, 'created query' );
+
+my $test1 = $query->query('table td:first-child');
+is( $test1->size, 3, 'test1 - size' );
+is( join(" | ", $test1->as_trimmed_text), "1,1 | 2,1 | 3,1", 'test1 - text');
+
+my $test2 = $query->query('table td:last-child');
+is( $test2->size, 3, 'test2 - size' );
+is( join(" | ", $test2->as_trimmed_text), "1,3 | 2,3 | 3,3", 'test2 - text');
+
+my $test3 = $query->query('table tr:first-child td');
+is( $test3->size, 3, 'test3 - size' );
+is( join(" | ", $test3->as_trimmed_text), "1,1 | 1,2 | 1,3", 'test3 - text');
+
+my $test4 = $query->query('table tr:last-child td');
+is( $test4->size, 3, 'test4 - size' );
+is( join(" | ", $test4->as_trimmed_text), "3,1 | 3,2 | 3,3", 'test4 - text');
diff -ru --new-file t/specificity.t t/specificity.t
--- t/specificity.t 2010-10-30 02:43:57.000000000 +0200
+++ t/specificity.t 2012-06-18 14:32:14.212035982 +0200
@@ -14,7 +14,7 @@
 use lib qw( ./lib ../lib );
 use Badger::Filesystem '$Bin Dir';
 use Badger::Test
-    tests => 23,
+    tests => 24,
     debug => 'HTML::Query',
     args  => \@ARGV;
 
@@ -44,7 +44,8 @@
     "*" => 0,
     "div#id-one div p>em" => 104,
     "html#simple body#internal" => 202,
-    "body#internal" => 101
+    "body#internal" => 101,
+    "div:first-child" => 11,
 );
 
 foreach my $rule (keys %rules) {
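The new expectation in t/specificity.t is simple arithmetic: the patch adds 10 per pseudo-class, and the existing expectations are consistent with 100 per id and 1 per element name, so div:first-child scores 1 + 10 = 11. Below is a minimal sketch of that scoring, assuming those weights. The helper simple_specificity is hypothetical, written only for illustration; it is not HTML::Query's parser and ignores edge cases such as attribute values containing spaces or punctuation.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HTML::Query): score a selector with the
# weights implied by the test expectations above:
#   id = 100, class / attribute / pseudo-class = 10, element name = 1, "*" = 0
sub simple_specificity {
    my ($selector) = @_;
    my $score = 0;

    # Split into simple selectors on whitespace and the > and + combinators.
    for my $part (grep { length } split /[\s>+]+/, $selector) {
        my $ids     = () = $part =~ /#[\w-]+/g;
        my $classes = () = $part =~ /\.[\w-]+/g;
        my $attrs   = () = $part =~ /\[[^\]]*\]/g;
        my $pseudos = () = $part =~ /(?<!:):[\w-]+/g;    # skips ::pseudo-elements

        $score += 100 * $ids + 10 * ($classes + $attrs + $pseudos);
        $score += 1 if $part =~ /^[a-zA-Z]/;             # leading element name
    }
    return $score;
}

# The selectors and expected values are the ones listed in the diff.
for my $rule ('*', 'div#id-one div p>em', 'html#simple body#internal',
              'body#internal', 'div:first-child') {
    printf "%-28s => %d\n", $rule, simple_specificity($rule);
}
```

Under these assumptions the five selectors score 0, 104, 202, 101 and 11, matching the values in the test table.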
Hi,

Thanks so much for the patch - I did receive it, I'm just in the middle of a release cycle for my full time job.

I will be incorporating this, testing it and probably releasing it after OSCON - probably in about 3 weeks.

Can I ask where and how you're using the library?

thanks,
Kevin

On Mon Jun 18 09:12:24 2012, spaaske wrote:
> Currently HTML::Query doesn't support pseudo-class selectors such as
> :first-child and :last-child.
>
> Attached is a patch that adds support for these two aforementioned
> pseudo-classes.
Also do you want attribution? If so please specify what you would like included.

thanks,
Kevin

On Mon Jun 18 09:12:24 2012, spaaske wrote:
> Currently HTML::Query doesn't support pseudo-class selectors such as
> :first-child and :last-child.
>
> Attached is a patch that adds support for these two aforementioned
> pseudo-classes.
From: spt [...] jobindex.dk
On Sun Jul 15 13:47:35 2012, KAMELKEV wrote:
> Hi,
>
> Thanks so much for the patch - I did receive it, I'm just in the
> middle of a release cycle for my full time job.
>
> I will be incorporating this, testing it and probably releasing it
> after OSCON - probably in about 3 weeks.

Excellent to hear. I've made a slightly changed patch, as my first patch allowed for text nodes to be first and last children. The changed patch is attached.

> Can I ask where and how you're using the library?

We're indirectly using it here at Jobindex through CSS::Inliner, and found ourselves needing to inline a :first-child and :last-child.

> Also do you want attribution? If so please specify what you would like
> included.

If you wish, you can attribute me as "Sebastian Paaske Tørholm <spt@jobindex.dk>".
Subject: html-query-pseudoclasses.diff
diff -ru --new-file lib/HTML/Query.pm lib/HTML/Query.pm
--- lib/HTML/Query.pm 2010-10-30 02:46:00.000000000 +0200
+++ lib/HTML/Query.pm 2012-06-18 14:29:06.308560192 +0200
@@ -271,6 +271,20 @@
                         push( @args, $attribute => qr/.*/ );
                     }
                 }
+                # and/or one or more pseudo-classes
+                if ($query =~ / \G : ([\w\-]+) /cgx) {
+                    my $pseudoclass = $1;
+                    $specificity += 10;
+
+                    if ($pseudoclass eq 'first-child') {
+                        push( @args, sub { ! { ref $_ } $_[0]->left() } );
+                    } elsif ($pseudoclass eq 'last-child') {
+                        push( @args, sub { ! { ref $_ } $_[0]->right() } );
+                    } else {
+                        warn "Pseudoclass :$pseudoclass not supported";
+                        next;
+                    }
+                }
 
                 # keep going until this particular expression is fully processed
                 last unless scalar(@args) > $work;
diff -ru --new-file t/html/pseudoclasses.html t/html/pseudoclasses.html
--- t/html/pseudoclasses.html 1970-01-01 01:00:00.000000000 +0100
+++ t/html/pseudoclasses.html 2012-06-18 14:23:51.912735725 +0200
@@ -0,0 +1,23 @@
+<html>
+  <head>
+  </head>
+  <body>
+    <table>
+      <tr>
+        <td>1,1</td>
+        <td>1,2</td>
+        <td>1,3</td>
+      </tr>
+      <tr>
+        <td>2,1</td>
+        <td>2,2</td>
+        <td>2,3</td>
+      </tr>
+      <tr>
+        <td>3,1</td>
+        <td>3,2</td>
+        <td>3,3</td>
+      </tr>
+    </table>
+  </body>
+</html>
diff -ru --new-file t/pseudoclasses.t t/pseudoclasses.t
--- t/pseudoclasses.t 1970-01-01 01:00:00.000000000 +0100
+++ t/pseudoclasses.t 2012-06-18 14:30:09.889736731 +0200
@@ -0,0 +1,42 @@
+use strict;
+use warnings;
+use lib qw( ./lib ../lib );
+use HTML::TreeBuilder;
+use Badger::Filesystem '$Bin Dir';
+use Badger::Test
+    tests => 10,
+    debug => 'HTML::Query',
+    args  => \@ARGV;
+
+use HTML::Query 'Query';
+
+our $Query    = 'HTML::Query';
+our $Builder  = 'HTML::TreeBuilder';
+our $test_dir = Dir($Bin);
+our $html_dir = $test_dir->dir('html')->must_exist;
+our $pseudo   = $html_dir->file('pseudoclasses.html')->must_exist;
+
+my ($query, $tree);
+
+$tree = $Builder->new;
+$tree->parse_file( $pseudo->absolute );
+
+ok( $tree, 'parsed tree for test file: ' . $pseudo->name );
+$query = Query $tree;
+ok( $query, 'created query' );
+
+my $test1 = $query->query('table td:first-child');
+is( $test1->size, 3, 'test1 - size' );
+is( join(" | ", $test1->as_trimmed_text), "1,1 | 2,1 | 3,1", 'test1 - text');
+
+my $test2 = $query->query('table td:last-child');
+is( $test2->size, 3, 'test2 - size' );
+is( join(" | ", $test2->as_trimmed_text), "1,3 | 2,3 | 3,3", 'test2 - text');
+
+my $test3 = $query->query('table tr:first-child td');
+is( $test3->size, 3, 'test3 - size' );
+is( join(" | ", $test3->as_trimmed_text), "1,1 | 1,2 | 1,3", 'test3 - text');
+
+my $test4 = $query->query('table tr:last-child td');
+is( $test4->size, 3, 'test4 - size' );
+is( join(" | ", $test4->as_trimmed_text), "3,1 | 3,2 | 3,3", 'test4 - text');
diff -ru --new-file t/specificity.t t/specificity.t
--- t/specificity.t 2010-10-30 02:43:57.000000000 +0200
+++ t/specificity.t 2012-06-18 14:32:14.212035982 +0200
@@ -14,7 +14,7 @@
 use lib qw( ./lib ../lib );
 use Badger::Filesystem '$Bin Dir';
 use Badger::Test
-    tests => 23,
+    tests => 24,
     debug => 'HTML::Query',
     args  => \@ARGV;
 
@@ -44,7 +44,8 @@
     "*" => 0,
     "div#id-one div p>em" => 104,
     "html#simple body#internal" => 202,
-    "body#internal" => 101
+    "body#internal" => 101,
+    "div:first-child" => 11,
 );
 
 foreach my $rule (keys %rules) {
From: spt [...] jobindex.dk
On Mon Jul 16 03:46:09 2012, spaaske wrote:
> I've made a slightly changed patch, as my first patch allowed for text
> nodes to be first and last children. The changed patch is attached.

Apologies, attached the wrong patch. This is the correct one.
Subject: html-query-pseudoclasses.diff
diff -ru --new-file lib/HTML/Query.pm lib/HTML/Query.pm
--- lib/HTML/Query.pm 2010-10-30 02:46:00.000000000 +0200
+++ lib/HTML/Query.pm 2012-06-18 14:29:06.308560192 +0200
@@ -271,6 +271,20 @@
                         push( @args, $attribute => qr/.*/ );
                     }
                 }
+                # and/or one or more pseudo-classes
+                if ($query =~ / \G : ([\w\-]+) /cgx) {
+                    my $pseudoclass = $1;
+                    $specificity += 10;
+
+                    if ($pseudoclass eq 'first-child') {
+                        push( @args, sub { ! grep { ref $_ } $_[0]->left() } );
+                    } elsif ($pseudoclass eq 'last-child') {
+                        push( @args, sub { ! grep { ref $_ } $_[0]->right() } );
+                    } else {
+                        warn "Pseudoclass :$pseudoclass not supported";
+                        next;
+                    }
+                }
 
                 # keep going until this particular expression is fully processed
                 last unless scalar(@args) > $work;
diff -ru --new-file t/html/pseudoclasses.html t/html/pseudoclasses.html
--- t/html/pseudoclasses.html 1970-01-01 01:00:00.000000000 +0100
+++ t/html/pseudoclasses.html 2012-06-18 14:23:51.912735725 +0200
@@ -0,0 +1,23 @@
+<html>
+  <head>
+  </head>
+  <body>
+    <table>
+      <tr>
+        <td>1,1</td>
+        <td>1,2</td>
+        <td>1,3</td>
+      </tr>
+      <tr>
+        <td>2,1</td>
+        <td>2,2</td>
+        <td>2,3</td>
+      </tr>
+      <tr>
+        <td>3,1</td>
+        <td>3,2</td>
+        <td>3,3</td>
+      </tr>
+    </table>
+  </body>
+</html>
diff -ru --new-file t/pseudoclasses.t t/pseudoclasses.t
--- t/pseudoclasses.t 1970-01-01 01:00:00.000000000 +0100
+++ t/pseudoclasses.t 2012-06-18 14:30:09.889736731 +0200
@@ -0,0 +1,42 @@
+use strict;
+use warnings;
+use lib qw( ./lib ../lib );
+use HTML::TreeBuilder;
+use Badger::Filesystem '$Bin Dir';
+use Badger::Test
+    tests => 10,
+    debug => 'HTML::Query',
+    args  => \@ARGV;
+
+use HTML::Query 'Query';
+
+our $Query    = 'HTML::Query';
+our $Builder  = 'HTML::TreeBuilder';
+our $test_dir = Dir($Bin);
+our $html_dir = $test_dir->dir('html')->must_exist;
+our $pseudo   = $html_dir->file('pseudoclasses.html')->must_exist;
+
+my ($query, $tree);
+
+$tree = $Builder->new;
+$tree->parse_file( $pseudo->absolute );
+
+ok( $tree, 'parsed tree for test file: ' . $pseudo->name );
+$query = Query $tree;
+ok( $query, 'created query' );
+
+my $test1 = $query->query('table td:first-child');
+is( $test1->size, 3, 'test1 - size' );
+is( join(" | ", $test1->as_trimmed_text), "1,1 | 2,1 | 3,1", 'test1 - text');
+
+my $test2 = $query->query('table td:last-child');
+is( $test2->size, 3, 'test2 - size' );
+is( join(" | ", $test2->as_trimmed_text), "1,3 | 2,3 | 3,3", 'test2 - text');
+
+my $test3 = $query->query('table tr:first-child td');
+is( $test3->size, 3, 'test3 - size' );
+is( join(" | ", $test3->as_trimmed_text), "1,1 | 1,2 | 1,3", 'test3 - text');
+
+my $test4 = $query->query('table tr:last-child td');
+is( $test4->size, 3, 'test4 - size' );
+is( join(" | ", $test4->as_trimmed_text), "3,1 | 3,2 | 3,3", 'test4 - text');
diff -ru --new-file t/specificity.t t/specificity.t
--- t/specificity.t 2010-10-30 02:43:57.000000000 +0200
+++ t/specificity.t 2012-06-18 14:32:14.212035982 +0200
@@ -14,7 +14,7 @@
 use lib qw( ./lib ../lib );
 use Badger::Filesystem '$Bin Dir';
 use Badger::Test
-    tests => 23,
+    tests => 24,
     debug => 'HTML::Query',
     args  => \@ARGV;
 
@@ -44,7 +44,8 @@
     "*" => 0,
     "div#id-one div p>em" => 104,
     "html#simple body#internal" => 202,
-    "body#internal" => 101
+    "body#internal" => 101,
+    "div:first-child" => 11,
 );
 
 foreach my $rule (keys %rules) {
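The attachments in this ticket differ only in the sibling test pushed onto @args. Below is a rough, self-contained model of why the grep { ref $_ } form matters. It assumes, as HTML::Element does, that text children are plain strings and element children are references. The data layout and helper names are hypothetical, and the sketch does not use HTML::Query itself.

```perl
use strict;
use warnings;

# Children of a hypothetical <p>Hello <b>..</b> and <i>..</i>!</p>:
# plain strings stand in for text nodes, hash refs for element nodes.
my @children = ( 'Hello ', { tag => 'b' }, ' and ', { tag => 'i' }, '!' );

# Model of the first attachment: any left sibling, text included,
# stops the node from counting as a first child.
sub is_first_child_naive {
    my ($kids, $i) = @_;
    my @left = @{$kids}[ 0 .. $i - 1 ];
    return !@left;
}

# Model of the corrected attachment: only element (reference) siblings count.
sub is_first_child {
    my ($kids, $i) = @_;
    return !grep { ref $_ } @{$kids}[ 0 .. $i - 1 ];
}

sub is_last_child {
    my ($kids, $i) = @_;
    return !grep { ref $_ } @{$kids}[ $i + 1 .. $#{$kids} ];
}

# Index 1 is the <b> element, index 3 is the <i> element.
printf "b:first-child (naive)    %s\n", is_first_child_naive(\@children, 1) ? 'yes' : 'no';
printf "b:first-child (elements) %s\n", is_first_child(\@children, 1)       ? 'yes' : 'no';
printf "i:last-child  (elements) %s\n", is_last_child(\@children, 3)        ? 'yes' : 'no';
```

In this model the naive test rejects the b element because of the leading text, while the element-only test accepts it, which matches how CSS defines :first-child and :last-child in terms of element siblings.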
Hi,

I have applied the patch and tested it. Everything looks good. It's already uploaded to CPAN, so we just need to wait for it to be distributed.

CSS2 didn't really have too many other pseudo-classes that aren't implemented (:lang, :link), although it has a few pseudo-elements I need to look into. CSS3, on the other hand, defined quite a few new pseudo-classes and pseudo-elements. Enough that it's well beyond my capacity, so the project will probably be reliant on outside patches to iteratively improve support there.

Thanks again for the patch. I wrote CSS::Inliner, so this is good for giving us more support during the inline process.

thanks again,
Kevin

On Mon Jul 16 03:48:22 2012, spaaske wrote:
> On Mon Jul 16 03:46:09 2012, spaaske wrote:
> > I've made a slightly changed patch, as my first patch allowed for text
> > nodes to be first and last children. The changed patch is attached.
>
> Apologies, attached the wrong patch. This is the correct one.