CC: tobyc [...] strategicdata.com.au
Subject: Regression in parsing (X)HTML - test attached
Version 1.67 of XML::LibXML introduced a regression in parsing HTML files, and the issue persists in later versions.
See the attached test, which parses a trivial XHTML document and extracts a value from it.
The test passes on 1.66 and fails on 1.67 through 1.69_2.
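Because the HTML parsing itself is done by libxml2, a result like this can also depend on which libxml2 the module was built against and which one is loaded at runtime. The following is a small sketch for reporting both alongside the module version. It assumes an XML::LibXML release that provides XML::LibXML::LIBXML_DOTTED_VERSION and XML::LibXML::LIBXML_RUNTIME_VERSION (current releases document both; whether 1.66 already had them is unverified). The helper name version_report is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: collect the module version plus the libxml2
# version seen at compile time and the one reported at runtime.
sub version_report {
    return (
        'XML::LibXML'        => $XML::LibXML::VERSION,
        'libxml2 (compiled)' => XML::LibXML::LIBXML_DOTTED_VERSION(),
        'libxml2 (runtime)'  => XML::LibXML::LIBXML_RUNTIME_VERSION(),
    );
}

# Print one "name  version" line per entry, in a stable order.
my %v = version_report();
printf "%-20s %s\n", $_, $v{$_} for sort keys %v;
```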
Subject: parse_xhtml_tjc.t
#!/usr/bin/perl
use strict;
use warnings;
use Test::More tests => 3;
use Test::Exception;
use XML::LibXML;

# Note: this test passes with XML::LibXML 1.66 and fails with 1.69, for me.
diag("XML::LibXML version = $XML::LibXML::VERSION");

# Slurp the whole sample document from the __DATA__ section.
my $html = do { local $/; <DATA> };
ok(length($html) > 100, "Loaded HTML sample");

# recover => 1 keeps parsing past markup errors (e.g. the bare '&'s);
# suppress_errors => 1 silences the resulting error messages.
my $parser = XML::LibXML->new;
my $doc;
lives_ok {
    $doc = $parser->parse_html_string(
        $html, { recover => 1, suppress_errors => 1 }
    );
} "Can parse HTML string without dying";

# Pull the value attribute of <input id="foo"> out of the parsed tree.
my $root = $doc->documentElement;
my $val  = $root->findvalue('//input[@id="foo"]/@value');
is($val, 'working', 'Successfully retrieved value from document.');
__DATA__
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Test & Test some more</title>
</head>
<body>
<p>Meet you at the café?</p>
<p>How about <a href="http://example.com?mode=cafe&id=1&ref=foo">this one</a>?
</p>
<input class="wibble" id="foo" value="working" />
</body>
</html>
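A hypothetical follow-up probe, not part of the original attachment, that may help narrow down whether the parser options or the unescaped ampersands in the sample are involved. It parses a reduced copy of the document (same doctype, namespace and <input>, but no bare '&') once per option set and reports either the value found by the test's XPath or the error the parser raised. The helper name probe_options and the reduced sample are illustrative assumptions. No expected output is given, since the result is exactly what may differ between versions.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Reduced sample (assumption): keeps the XHTML doctype, the namespace and
# the <input>, but drops the unescaped '&' characters of the original.
my $sample = <<'HTML';
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Test</title></head>
<body>
<input class="wibble" id="foo" value="working" />
</body>
</html>
HTML

# Hypothetical helper: parse $html once per option set and record either
# the value found by the original test's XPath or the parse error.
sub probe_options {
    my ($html) = @_;
    my $parser = XML::LibXML->new;
    my @option_sets = ({}, { recover => 1 }, { recover => 1, suppress_errors => 1 });
    my @results;
    for my $opts (@option_sets) {
        my $label = join(',', map { "$_=$opts->{$_}" } sort keys %$opts) || 'none';
        my $doc = eval { $parser->parse_html_string($html, $opts) };
        if (defined $doc) {
            my $val = $doc->documentElement->findvalue('//input[@id="foo"]/@value');
            push @results, { options => $label, value => $val };
        }
        else {
            push @results, { options => $label, error => "$@" || 'no document returned' };
        }
    }
    return @results;
}

for my $r (probe_options($sample)) {
    if (exists $r->{value}) {
        print "options [$r->{options}]: value = '$r->{value}'\n";
    }
    else {
        print "options [$r->{options}]: parse failed: $r->{error}\n";
    }
}
```

Running this on 1.66 and on 1.67 or later, then again with the bare '&' characters put back into the sample, should show whether the options, the ampersands, or neither change the outcome.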