Subject: Problem in HTML::ElementTable
Date: Fri, 07 Aug 2009 19:33:09 +0200
To: bug-HTML-Element-Extended [...] rt.cpan.org
From: "K. Wittrock" <KWittrock [...] web.de>
The attached script demonstrates two problems with the method as_text() of HTML::ElementTable:
In the last column of the second row, a blank is inserted in the middle of a word ("Mobilfu nknetze" instead of "Mobilfunknetze"). Firefox and Internet Explorer display this text intact, so IMHO the HTML code of this cell should be considered unusual, but either correct or at least tolerable.
In the following rows, as_text() apparently ignores the <br> tags and thus loses the newlines of multiline text. This makes extracting information unnecessarily complicated (and sometimes impossible).
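A possible workaround for the <br> problem, sketched here under assumptions and not verified against HTML::ElementTable 1.17: before the table is handed to HTML::ElementTable, every <br> element in the parse tree is replaced by a literal newline, using the HTML::Element methods look_down() and replace_with(). The helper name br_to_newline and the sample cell are hypothetical, made up for illustration only.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: replace every <br> below $node with a "\n" text node,
# so that a later as_text() call keeps the line structure of multiline cells.
sub br_to_newline {
    my ($node) = @_;
    foreach my $br ($node->look_down(_tag => 'br')) {
        # replace_with() returns the detached <br> element; free it.
        $br->replace_with("\n")->delete;
    }
    return $node;
}

# Illustrative cell, modelled on the multiline cells of the demo page.
my $html = '<table><tr><td>Call by Call<br>Minute: 0,51 Ct.<br>Takt: 60/60</td></tr></table>';
my $root = HTML::TreeBuilder->new_from_content($html);
br_to_newline($root);

my $cell = $root->find('td');
my $text = $cell->as_text;
print "$text\n";

# Free the parse tree.
$root->delete;
```

In the demo script, calling br_to_newline($tbl) right before HTML::ElementTable->new_from_tree($tbl) should then let the multiline cells come back from as_text() with embedded newlines, assuming HTML::ElementTable keeps the text nodes of the original tree.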
Please contact me if you would like to look at the original web page; I will then send you a script that fetches this page with WWW::Mechanize.
I work with Windows XP SP3, perl v5.8.8 and HTML::ElementTable 1.17.
Kind regards
Klaus Wittrock
0 - 7 Uhr
Ortsgespräch Ferngespräch Alle Mobilfu nknetze *
01078
Call by Call
Minute: 0,51 Ct.
Takt: 60/60
01078
01078
Call by Call
Minute: 0,51 Ct.
Takt: 60/60
01078
0900531
Call by Call
Minute: 6,80 Ct.
Takt: 60/60
0900531
01013
Call-by-Call
Minute: 0,97 Ct.
Takt: 60/60
01013
01073
01073
Minute: 0,60 Ct.
Takt: 60/60
01073
01073
01073
Minute: 6,90 Ct.
Takt: 60/60
01073
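use strict;
use warnings;
use HTML::TreeBuilder;
use HTML::ElementTable;

# Parse the saved copy of the web page and locate the first table.
my $file_name = 'demopage.html';
my $root = HTML::TreeBuilder->new_from_file($file_name);
my $tbl = $root->find('table')
    or die "No <table> found in $file_name\n";

# Convert the table into an HTML::ElementTable for row/column access.
my $eltbl = HTML::ElementTable->new_from_tree($tbl);

# Collect all rows (each one is an HTML::ElementTable::RowGlob).
my @tbl_rows;
foreach my $r (0 .. $eltbl->maxrow()) {
    push @tbl_rows, $eltbl->row($r);
}

printrow($_) foreach @tbl_rows;

# Print the text of a row, first on one line, then cell by cell.
sub printrow {
    my $zeil_ref = shift;    # Type is HTML::ElementTable::RowGlob
    # as_text() is called on every cell of the row; undefined results
    # become '' (a cell containing "0" is kept, unlike with "|| ''").
    my @zeile = map { defined $_ ? $_ : '' } $zeil_ref->as_text();
    print "\nRow: @zeile\n";
    print "Cells of this row:\n";
    print "  $_\n" foreach @zeile;
}

# Free the parse tree (HTML::Element trees contain circular references).
$root->delete();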