xiao-jun,
What's happening is that there's a row which both contains data and
starts a new table, i.e.:
<tr>
<td> data </td>
<td> <table> .... </table> </td>
</tr>
The parser has to finish off the embedded table before it can finish
off the row. So, the data for the embedded table are returned before
the data for the enclosing row. There's no way around this,
unfortunately.
If you modify your row routine to print out the table id, you'll see
that the out-of-order rows are due to embedded tables, as described
above.
You should use the table id passed to row() to sort your data, or
use the class constructor method to create a new object per table.
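As a rough sketch of the first option, the row callback can file each row under the table id it receives instead of printing it straight away. This uses the same row() signature as the code quoted below and simulates the callback order by hand, so it runs without the module. The table ids 'outer' and 'inner', the cell contents, and the assignment of line numbers to tables are all placeholders for illustration, not values taken from the attached file.

```perl
use strict;
use warnings;

# Rows collected per table, keyed by the table id handed to row().
my %rows_by_table;

# Same argument list as the row callback quoted in this thread.
sub row {
    my ( $tbl_id, $line_no, $data, $udata ) = @_;
    push @{ $rows_by_table{$tbl_id} },
      { line => $line_no, cells => [@$data] };
}

# Simulated callback order: the embedded table's rows (849, 862) are
# delivered before the enclosing row (844) is finished.
# Ids and cell values here are hypothetical.
row( 'outer', 828, [ 'IPC Code',        'placeholder' ] );
row( 'inner', 849, [ 'Aug. 1, 1985',    'DE1985003527568' ] );
row( 'inner', 862, [] );
row( 'outer', 844, [ 'Priority Number', 'placeholder' ] );

# Within one table the rows are already in arrival order.
for my $tbl_id ( sort keys %rows_by_table ) {
    print "table $tbl_id:\n";
    for my $r ( @{ $rows_by_table{$tbl_id} } ) {
        print "  $r->{line}: [", join( ":", @{ $r->{cells} } ), "]\n";
    }
}
```

Grouping by table id keeps the rows of the enclosing table together, so there is no need to sort everything by line number afterwards.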
diab
[xma@arcturusag.com - Tue Sep 3 12:20:29 2002]:
> Hi Diab,
>
> Thanks for the clarification. I also noticed that the row handlers are
> fired out of line order, for example with the file I attached before:
>
> 828: [:IPC Code:::::C07K 15/00;....]
> 849: [Aug. 1, 1985:DE1985003527568]
> 862: []
> 844: [:Priority Number::::::]
>
> by this code:
> sub row {
>     my ( $tbl_id, $line_no, $data, $udata ) = @_;
>     print STDERR "$line_no: ", "[", join(":", @$data), "]\n";
> }
>
> This can cause problems for getting the right table cell. Do you have
> any suggestions?
> (I had to save all the rows and then sort them by line number before
> using them.)
>
> Thanks,
>
> xiao-jun
>
>
>
>
> -----Original Message-----
> From: via RT [mailto:comment-HTML-TableParser@rt.cpan.org]
> Sent: Tuesday, September 03, 2002 8:10 AM
> To: Xiao-Jun Ma
> Subject: [cpan #1490] TableParser patch
>
>
> This message about HTML-TableParser was sent to you by DJERIUS via
> rt.cpan.org
>
> Full context and any attached attachments can be found at:
> <URL: http://rt.cpan.org/NoAuth/Bug.html?id=1490 >
>
> Thanks for your bug report. The "real" problem is that the input HTML
> is malformed. There is an extra </table> tag in line 1016 of the input
> which caused the error. It's this line:
>
> 1016:</TABLE></TD></TR></TABLE></TD></TR></TABLE><A NAME="_BOTTOM"></A>
>
> I've modified the code to croak if there's an extra end table tag.
>
> I'll upload it to CPAN shortly.
>
> Diab
>
> [guest - Sat Aug 31 13:32:54 2002]:
>
> > The patch attached fixes the following errors, which show up only when
> > turning on "use diagnostics":
> > Uncaught exception from user code:
> > Uncaught exception from user code:
> > Modification of non-creatable array value attempted, subscript -1 at /usr/lib/perl5/site_perl/5.6.1/HTML/TableParser.pm line 942.
> > HTML::TableParser::end_table('HTML::TableParser=HASH(0xa0210f4)', undef, 1016) called at /usr/lib/perl5/site_perl/5.6.1/HTML/TableParser.pm line 905
> > HTML::TableParser::end('HTML::TableParser=HASH(0xa0210f4)', 'table', undef, 1016) called at /usr/lib/perl5/site_perl/5.6.1/cygwin-multi/HTML/Parser.pm line 104
> > eval {...} called at /usr/lib/perl5/site_perl/5.6.1/cygwin-multi/HTML/Parser.pm line 104
> > HTML::Parser::parse_file('HTML::TableParser=HASH(0xa0210f4)', 'EP0210645_1') called at /downloads/XiaoJunMa/SoftDev/Perl/scripts/parsePatent.pl line 33
> > HTML::Parser::parse_file('HTML::TableParser=HASH(0xa0210f4)', 'EP0210645_1') called at /downloads/XiaoJunMa/SoftDev/Perl/scripts/parsePatent.pl line 33