This queue is for tickets about the HTML-TableExtract CPAN distribution.

Report information
The Basics
Id: 118014
Status: resolved
Priority: 0/
Queue: HTML-TableExtract

People
Owner: Nobody in particular
Requestors: NHORNE [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Normal
Broken in: 2.13
Fixed in: (no value)



Subject: Can't get it to work
Any idea why this fails?

    #!/usr/bin/env perl

    use strict;
    use warnings;
    use HTML::TableExtract;
    use LWP::Simple;

    my $html = get('http://www.janetandrichardsgenealogy.co.uk/ramsgate%20parish%20register%20extract%20-%20st_marys_baptisms.html');

    my $te = HTML::TableExtract->new(headers => [qw(Baptised Born First Sex Last Parents Abode Trade Comments)]);
    $te->parse($html) || die "Can't parse";

    foreach my $ts ($te->tables) {
        print "Table (", join(',', $ts->coords), "):\n";
        foreach my $row ($ts->rows()) {
            print join(',', @{$row}), "\n";
        }
    }
It looks like this fails because the headers "Baptised" and "Born" are on the second row of the table, not the first. You would also want "son" or the like instead of "Sex". The table is problematic in any case, because that leaves two columns with "Date" as the header, and HTML::TableExtract doesn't support extracting columns that share the same header.

Since you're grabbing all columns of the table, though, you don't actually need to specify headers at all. Just create the table extract object like this:

    my $te = HTML::TableExtract->new();

That should work.
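
For illustration, here is a minimal sketch of the headerless approach. It runs against a small inline table rather than the live page. The sample HTML, its two-row header layout, and the choice to drop the first two rows are assumptions made for this example, not the actual structure of the page in the report.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical stand-in for the real page: "Date" appears twice on the
# first row, and "Baptised"/"Born" sit on the second row.
my $html = <<'END_HTML';
<table>
  <tr><td>Date</td><td>Date</td><td>First</td></tr>
  <tr><td>Baptised</td><td>Born</td><td></td></tr>
  <tr><td>1 Jan 1800</td><td>20 Dec 1799</td><td>John</td></tr>
  <tr><td>5 Feb 1800</td><td>30 Jan 1800</td><td>Mary</td></tr>
</table>
END_HTML

# No headers argument: every table and every column is extracted.
my $te = HTML::TableExtract->new();
$te->parse($html);

my @records;
foreach my $ts ($te->tables) {
    print "Table (", join(',', $ts->coords), "):\n";

    my @rows = $ts->rows;
    # Assumption: the first two rows are headers in this sample, so skip them.
    splice(@rows, 0, 2);

    foreach my $row (@rows) {
        # Guard against undefined (empty) cells so join() does not warn.
        my @cells = map { defined $_ ? $_ : '' } @{$row};
        push @records, \@cells;
        print join(',', @cells), "\n";
    }
}
```

Because no headers are matched, columns are addressed by position only, so the duplicate "Date" header is no longer a problem. How many leading rows to skip depends on the real page and would need checking against its markup.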