Bug #118014 for HTML-TableExtract: Can't get it to work

RT for rt.cpan.org

This queue is for tickets about the HTML-TableExtract CPAN distribution.

Report information

The Basics

Id:	118014
Status:	resolved
Priority:	0/
Queue:	HTML-TableExtract

People

Owner:	Nobody in particular
Requestors:	NHORNE [...] cpan.org
Cc:
AdminCc:

Bug Information

Severity:	Normal
Broken in:	2.13
Fixed in:	(no value)

History

Wed Sep 21 08:51:55 2016 NHORNE [...] cpan.org - Ticket created

Subject:

Can't get it to work

Any idea why this fails?

    #!/usr/bin/env perl

    use strict;
    use warnings;
    use HTML::TableExtract;
    use LWP::Simple;

    my $html = get('http://www.janetandrichardsgenealogy.co.uk/ramsgate%20parish%20register%20extract%20-%20st_marys_baptisms.html');

    my $te = HTML::TableExtract->new(headers => [qw(Baptised Born First Sex Last Parents Abode Trade Comments)]);
    $te->parse($html) || die "Can't parse";

    foreach my $ts ($te->tables) {
        print "Table (", join(',', $ts->coords), "):\n";
        foreach my $row ($ts->rows()) {
            print join(',', @{$row}), "\n";
        }
    }

Wed Sep 21 13:13:04 2016 MSISK [...] cpan.org - Correspondence added

Looks like it's because the headers "Baptised" and "Born" are on the second row of the table, not on the first. Additionally, you'd want "son" or the like instead of "Sex".

However, this table is problematic because that leaves two columns with "Date" as the header, and HTML::TableExtract doesn't support extracting columns with the same header.

Since you're grabbing all columns of the table, though, you don't actually need to specify headers at all. Just create the table extract object like this:

    my $te = HTML::TableExtract->new();

That should work.
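For anyone landing on this ticket later, here is a minimal sketch of the no-headers approach. It runs against a small made-up table rather than the real page, so the HTML, the cell values, and the @extracted array are assumptions for illustration only. Undefined cells are mapped to empty strings so that join() does not warn under "use warnings".

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical stand-in for the page in the report (not its real markup):
# the column titles are split across the first two rows.
my $html = <<'HTML';
<table>
  <tr><td>Date</td><td>Date</td><td>Name</td></tr>
  <tr><td>Baptised</td><td>Born</td><td>First</td></tr>
  <tr><td>1 Jan 1800</td><td>20 Dec 1799</td><td>John</td></tr>
</table>
HTML

# No "headers" argument: every table is matched and every column is kept.
my $te = HTML::TableExtract->new();
$te->parse($html);

my @extracted;    # one array ref of cell strings per table row
foreach my $ts ($te->tables) {
    print "Table (", join(',', $ts->coords), "):\n";
    foreach my $row ($ts->rows) {
        # Cells can come back undefined; replace those with empty strings.
        my @cells = map { defined $_ ? $_ : '' } @$row;
        push @extracted, \@cells;
        print join(',', @cells), "\n";
    }
}
```

With no headers given, the title rows come back as ordinary data rows, so the caller has to skip or relabel them.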

Wed Sep 21 13:13:04 2016 The RT System itself - Status changed from 'new' to 'open'

Wed Sep 21 13:13:04 2016 MSISK [...] cpan.org - Status changed from 'open' to 'resolved'