Subject: _tag_walk does not handle nested tables
Date: Mon, 29 Jan 2007 17:47:50 -0500
To: bug-Test-WWW-Mechanize [...] rt.cpan.org
From: Jeff Boes <jeff [...] endpoint.com>
In Test::WWW::Mechanize, I have identified what may be a bug in the
implementation of "has_tag".
$ perl -MTest::WWW::Mechanize -e 'print $Test::WWW::Mechanize::VERSION, qq(\n)'
1.12
$ perl -v
This is perl, v5.8.5 built for i686-linux
...
$ uname -a
Linux rs7.endpoint.com 2.4.21-47.0.1.ELsmp #1 SMP Fri Oct 13 17:56:20 EDT 2006 i686 i686 i386 GNU/Linux
What I've found: my test script fetches a page that contains this fragment of HTML:
<td class="form_cell"><h3>Show all users and groups</h3> <table id="admin_form"> <tr> <th>User</th> <th>Groups</th> </tr> <tr> <td class="even"> company </td> <td class="even">company,email,employee,website
</td>
</tr>
My apologies for the indentation; the HTML is generated by another program, and I don't think the indentation affects the result.
The possible bug comes in this test case call:
$mech->has_tag('td', 'company', 'User company');
Where it seems to fail: the routine matches the outer "<td>" tag
(above, it's the class="form_cell" one), then extracts text up to the
next "</td>" end tag, which actually closes the first inner
<td class="even"> cell. Thus it never gets to match the inner <td>
against its proper end-tag.
I altered the routine "_tag_walk" to this:
while ( my $token = $p->get_tag( $tag ) ) {
    my $tagtext = $p->get_trimmed_text(); # was $p->get_trimmed_text( "/$tag" );
    return 1 if $match->( $tagtext );
}
which seems to work properly.
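For anyone who wants to reproduce this outside of Test::WWW::Mechanize, here is a minimal sketch that drives HTML::TokeParser directly with the same loop. It is not the module's actual code: collect_texts is a hypothetical helper written for this illustration, and the closing </table> and </td> tags at the end of the fragment are an assumption added only to make the snippet well-formed.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Fragment from the report. The last two closing tags are assumed;
# the original report's fragment stops at </tr>.
my $html = <<'HTML';
<td class="form_cell"><h3>Show all users and groups</h3>
<table id="admin_form">
<tr> <th>User</th> <th>Groups</th> </tr>
<tr> <td class="even"> company </td> <td class="even">company,email,employee,website
</td>
</tr>
</table>
</td>
HTML

# Hypothetical helper: walk every <$tag> start tag and collect the text
# that get_trimmed_text() returns for it. With @end_tags empty, the text
# stops at the next tag of any kind; with '/td', it runs to the next </td>.
sub collect_texts {
    my ( $doc, $tag, @end_tags ) = @_;
    my $p = HTML::TokeParser->new( \$doc ) or die "cannot parse HTML";
    my @texts;
    while ( my $token = $p->get_tag( $tag ) ) {
        push @texts, $p->get_trimmed_text( @end_tags );
    }
    return @texts;
}

my @original = collect_texts( $html, 'td', '/td' );   # behaviour in 1.12
my @patched  = collect_texts( $html, 'td' );          # behaviour with the change

printf "original: %s\n", join ' | ', map { "[$_]" } @original;
printf "patched:  %s\n", join ' | ', map { "[$_]" } @patched;
```

With the "/td" argument, the text collected for the outer cell should swallow the first inner cell, so no collected string is exactly 'company'. Without the argument, each <td> is visited on its own. One caveat on the change, as far as I can tell: with no argument, get_trimmed_text() stops at the next tag of any kind, so a cell such as <td>foo <b>bar</b></td> would yield only "foo". A fix that tracks nesting depth might be more robust.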