Subject: table-in-table bug
Hello,
While parsing text inside a data cell, I found that if a nested table is encountered, the parser's context for the current data cell and its enclosing table is overwritten by the new table, even though the original data cell has not yet ended. When the nested table closes, the context is cleared rather than restored, so any text remaining in the outer cell is lost.
For example:
<table>
<tr>
<td>
<p>This is some text which is extracted correctly</p>
<table>
<tr>
<td>
<p>This text is now part of a new table and is also extracted correctly</p>
</td>
</tr>
</table>
<p>This text, part of the first td, is actually lost</p>
</td>
</tr>
</table>
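I've tried to solve this with a simple stack: the current table is pushed when a nested <table> opens, and popped when it closes, at which point the last row, data cell, and header of the outer table are restored. I hope the approach is correct and that the patch is helpful.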
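The stack idea can be shown in isolation with a minimal sketch. It does not use HTML::TableContentParser or HTML::Parser; the function name collect_cell_text and the event-list input are hypothetical stand-ins chosen for illustration, and the open cell is kept inside each table's context rather than restored from the last row as in the patch.

```perl
use strict;
use warnings;

# Hypothetical, simplified stand-in for the parser state; not the module's code.
# Each event is [ 'start' | 'end' | 'text', value ].
sub collect_cell_text {
    my @events = @_;
    my @stack;      # saved contexts of enclosing tables
    my $current;    # context of the table currently being parsed
    my @tables;     # every table seen, in document order

    for my $ev (@events) {
        my ($type, $value) = @$ev;
        if ($type eq 'start' && $value eq 'table') {
            # Save the outer table (if any) before switching to the new one.
            push @stack, $current if defined $current;
            $current = { cells => [], cell => undef };
            push @tables, $current;
        } elsif ($type eq 'start' && $value eq 'td' && defined $current) {
            $current->{cell} = { data => '' };
            push @{ $current->{cells} }, $current->{cell};
        } elsif ($type eq 'end' && $value eq 'td' && defined $current) {
            $current->{cell} = undef;
        } elsif ($type eq 'end' && $value eq 'table') {
            # Resume the outer table; undef when the stack is empty.
            $current = pop @stack;
        } elsif ($type eq 'text' && defined $current && defined $current->{cell}) {
            $current->{cell}{data} .= $value;
        }
    }
    return \@tables;
}

# Same shape as the HTML example: text, nested table, then more text.
my $tables = collect_cell_text(
    [ start => 'table' ], [ start => 'td' ],
    [ text  => 'before ' ],
    [ start => 'table' ], [ start => 'td' ],
    [ text  => 'inner' ],
    [ end   => 'td' ], [ end => 'table' ],
    [ text  => 'after' ],
    [ end   => 'td' ], [ end => 'table' ],
);
print "$tables->[0]{cells}[0]{data}\n";   # before after
print "$tables->[1]{cells}[0]{data}\n";   # inner
```

Because the stack lives in a lexical variable inside the function, it starts empty on every call. The patch below uses a package-level @tablestack instead, which would be shared by all parser instances.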
Regards,
Zainul.
*** TableContentParser.pm Tue Jun 11 21:34:03 2002
--- /usr/lib/perl5/site_perl/5.6.1/HTML/TableContentParser.pm Fri Jul 5 03:41:33 2002
***************
*** 11,20 ****
--- 11,21 ----
our $VERSION = 0.11;
our $DEBUG = 0;
+ our @tablestack;
# The tags we're interested in.
my @tag_names = qw(table tr td th);
***************
*** 28,37 ****
--- 29,39 ----
# Store the incoming details in the current 'object'.
if ($tag eq 'table') {
my $table = $attr;
push @{$self->{STORE}->{tables}}, $table;
+ if (defined $self->{STORE}->{current_table}) { push @tablestack, $self->{STORE}->{current_table}; }
$self->{STORE}->{current_table} = $table;
} elsif ($tag eq 'th') {
my $th = $attr;
push @{$self->{STORE}->{current_table}->{headers}}, $th;
$self->{STORE}->{current_header} = $th;
***************
*** 74,87 ****
$tag = lc($tag);
return unless grep { $_ eq $tag } @tag_names;
# Turn off the current object
if ($tag eq 'table') {
! $self->{STORE}->{current_table} = undef;
$self->{STORE}->{current_row} = undef;
$self->{STORE}->{current_data_cell} = undef;
$self->{STORE}->{current_header} = undef;
} elsif ($tag eq 'th') {
$self->{STORE}->{current_row} = undef;
$self->{STORE}->{current_data_cell} = undef;
$self->{STORE}->{current_header} = undef;
} elsif ($tag eq 'tr') {
--- 76,94 ----
$tag = lc($tag);
return unless grep { $_ eq $tag } @tag_names;
# Turn off the current object
if ($tag eq 'table') {
! $self->{STORE}->{current_table} = pop @tablestack;
$self->{STORE}->{current_row} = undef;
$self->{STORE}->{current_data_cell} = undef;
$self->{STORE}->{current_header} = undef;
+ if (defined $self->{STORE}->{current_table}) {
+ $self->{STORE}->{current_row} = ${$self->{STORE}->{current_table}->{rows}}[-1];
+ $self->{STORE}->{current_data_cell} = ${$self->{STORE}->{current_row}->{cells}}[-1];
+ $self->{STORE}->{current_header} = ${$self->{STORE}->{current_table}->{headers}}[-1];
+ }
} elsif ($tag eq 'th') {
$self->{STORE}->{current_row} = undef;
$self->{STORE}->{current_data_cell} = undef;
$self->{STORE}->{current_header} = undef;
} elsif ($tag eq 'tr') {