
This queue is for tickets about the HTML-TableContentParser CPAN distribution.

Report information
The Basics
Id: 1237
Status: resolved
Priority: 0/
Queue: HTML-TableContentParser

People
Owner: Nobody in particular
Requestors: zainul [...] ee.iitb.ac.in
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: (no value)
Fixed in: 0.200_01



Subject: table-in-table bug
Hello,

I found that while parsing some text within a data_cell, if a new table is encountered, the context of the data_cell and its corresponding table is overwritten by this new table, regardless of the fact that the original data cell has not ended. An example of this is:

<table>
  <tr>
    <td>
      <p>This is some text which is extracted correctly</p>
      <table>
        <tr>
          <td>
            <p>This text is now part of a new table and is also extracted correctly</p>
          </td>
        </tr>
      </table>
      <p>This text part of the first td is actually lost</p>
    </td>
  </tr>
</table>

I've tried to solve this problem using a simple stack mechanism. Hope the usage is correct and the patch is helpful.

Regards,
Zainul.
*** TableContentParser.pm Tue Jun 11 21:34:03 2002
--- /usr/lib/perl5/site_perl/5.6.1/HTML/TableContentParser.pm Fri Jul 5 03:41:33 2002
***************
*** 11,20 ****
--- 11,21 ----
  our $VERSION = 0.11;
  our $DEBUG = 0;
+ our @tablestack;
  # The tags we're interested in.
  my @tag_names = qw(table tr td th);
***************
*** 28,37 ****
--- 29,39 ----
  # Store the incoming details in the current 'object'.
  if ($tag eq 'table') {
      my $table = $attr;
      push @{$self->{STORE}->{tables}}, $table;
+     if (defined $self->{STORE}->{current_table}) { push @tablestack, $self->{STORE}->{current_table}; }
      $self->{STORE}->{current_table} = $table;
  } elsif ($tag eq 'th') {
      my $th = $attr;
      push @{$self->{STORE}->{current_table}->{headers}}, $th;
      $self->{STORE}->{current_header} = $th;
***************
*** 74,87 ****
  $tag = lc($tag);
  return unless grep { $_ eq $tag } @tag_names;
  # Turn off the current object
  if ($tag eq 'table') {
!     $self->{STORE}->{current_table} = undef;
      $self->{STORE}->{current_row} = undef;
      $self->{STORE}->{current_data_cell} = undef;
      $self->{STORE}->{current_header} = undef;
  } elsif ($tag eq 'th') {
      $self->{STORE}->{current_row} = undef;
      $self->{STORE}->{current_data_cell} = undef;
      $self->{STORE}->{current_header} = undef;
  } elsif ($tag eq 'tr') {
--- 76,94 ----
  $tag = lc($tag);
  return unless grep { $_ eq $tag } @tag_names;
  # Turn off the current object
  if ($tag eq 'table') {
!     $self->{STORE}->{current_table} = pop @tablestack;
      $self->{STORE}->{current_row} = undef;
      $self->{STORE}->{current_data_cell} = undef;
      $self->{STORE}->{current_header} = undef;
+     if (defined $self->{STORE}->{current_table}) {
+         $self->{STORE}->{current_row} = ${$self->{STORE}->{current_table}->{rows}}[-1];
+         $self->{STORE}->{current_data_cell} = ${$self->{STORE}->{current_row}->{cells}}[-1];
+         $self->{STORE}->{current_header} = ${$self->{STORE}->{current_table}->{headers}}[-1];
+     }
  } elsif ($tag eq 'th') {
      $self->{STORE}->{current_row} = undef;
      $self->{STORE}->{current_data_cell} = undef;
      $self->{STORE}->{current_header} = undef;
  } elsif ($tag eq 'tr') {
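
For anyone reading the patch, the stack idea can be shown in isolation. The following is only a minimal sketch, not the module's code: the package name NestedTableSketch, its start/text/end methods and its field names are hypothetical, and the events are fed by hand rather than by HTML::Parser. It differs from the patch in two ways that are assumptions of this sketch: the stack is kept on the object instead of in a package-level @tablestack, and the whole context (table, row, cell) is saved rather than recovered from the last row and cell.

```perl
use strict;
use warnings;

# Hypothetical, simplified stand-in for the parser's handlers.
package NestedTableSketch;

sub new {
    my ($class) = @_;
    # 'stack' holds the saved contexts of enclosing tables. It lives on
    # the object, so two parser instances cannot disturb each other.
    return bless { tables => [], current => undef, stack => [] }, $class;
}

sub start {
    my ($self, $tag) = @_;
    if ($tag eq 'table') {
        # Save the enclosing context (if any) before switching tables.
        push @{ $self->{stack} }, $self->{current}
            if defined $self->{current};
        my $table = { rows => [] };
        push @{ $self->{tables} }, $table;
        $self->{current} = { table => $table, row => undef, cell => undef };
    }
    elsif ($tag eq 'tr') {
        my $row = { cells => [] };
        push @{ $self->{current}{table}{rows} }, $row;
        $self->{current}{row} = $row;
    }
    elsif ($tag eq 'td') {
        my $cell = { data => '' };
        push @{ $self->{current}{row}{cells} }, $cell;
        $self->{current}{cell} = $cell;
    }
}

sub text {
    my ($self, $text) = @_;
    my $cur = $self->{current};
    # Text outside any open cell is ignored.
    return unless defined $cur && defined $cur->{cell};
    $cur->{cell}{data} .= $text;
}

sub end {
    my ($self, $tag) = @_;
    return unless defined $self->{current};
    if ($tag eq 'table') {
        # Restore the enclosing table's context (undef at top level,
        # because pop on an empty array returns undef).
        $self->{current} = pop @{ $self->{stack} };
    }
    elsif ($tag eq 'tr') {
        $self->{current}{row}  = undef;
        $self->{current}{cell} = undef;
    }
    elsif ($tag eq 'td') {
        $self->{current}{cell} = undef;
    }
}

package main;

# Same shape as the example in the report: text, nested table, more text.
my $p = NestedTableSketch->new;
$p->start('table'); $p->start('tr'); $p->start('td');
$p->text('outer before; ');
$p->start('table'); $p->start('tr'); $p->start('td');
$p->text('inner');
$p->end('td'); $p->end('tr'); $p->end('table');
$p->text('outer after');
$p->end('td'); $p->end('tr'); $p->end('table');

print $p->{tables}[0]{rows}[0]{cells}[0]{data}, "\n";  # outer before; outer after
print $p->{tables}[1]{rows}[0]{cells}[0]{data}, "\n";  # inner
```

In this sketch the text following the inner table lands back in the outer cell because the saved context still points at that cell when it is popped. Whether the inner table's text should also appear in the outer cell is a separate question that the sketch does not try to answer.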