Subject: safe_parse_html fails to parse valid HTML with entities (in some cases)
Date: Tue, 02 Jul 2013 14:46:23 +0200
To: bug-XML-Twig [...] rt.cpan.org
From: Marco Pessotto <melmothx [...] gmail.com>
Hello there!
It looks like entities (at least the very common '&amp;') are
mangled when followed by a letter. The test script below, which
contains perfectly valid HTML snippets, illustrates the problem.
While testing, I found that adding this option to the "_html2xml" method:
$tree->no_expand_entities(1);
seems to fix the problem, but I'm not at all sure it won't trigger
other problems or undesired behaviour.
It's also possible the bug resides in HTML::TreeBuilder, but this I
leave to you to decide.
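To help narrow down which module is responsible, the option can be tried against HTML::TreeBuilder alone. The following is only a minimal sketch: `html_to_xml` is a hypothetical helper name, not part of XML::Twig, and the sketch bypasses XML::Twig entirely, so it shows only what HTML::TreeBuilder produces with and without `no_expand_entities`. The input uses the `&amp;` form of the snippet.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper (an assumption for illustration, not XML::Twig code):
# parse an HTML snippet with HTML::TreeBuilder and return it serialized as XML.
sub html_to_xml {
    my ($html, $no_expand) = @_;
    my $tree = HTML::TreeBuilder->new;
    # When set, entities are left as written instead of being decoded at parse time.
    $tree->no_expand_entities(1) if $no_expand;
    $tree->parse_content($html);    # parse the whole string and signal end of input
    my $xml = $tree->as_XML;
    $tree->delete;                  # release the tree explicitly
    return $xml;
}

# Print both serializations so they can be compared side by side.
for my $no_expand (0, 1) {
    print "no_expand_entities=$no_expand: ",
        html_to_xml('<h1>Here&amp;there</h1>', $no_expand);
}
```

If the two serializations differ only in how the ampersand is written, that would suggest the problem lies in how the expanded text is re-escaped before it reaches the XML parser, but that is a guess to be checked, not a confirmed diagnosis.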
Best wishes
Versions used:
XML::Twig is up to date. (3.44)
HTML::TreeBuilder is up to date. (5.03)
#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;
use Test::More;
plan tests => 2;
my $parser = XML::Twig->new();
my $value =<< 'EOF';
<h1>Here&amp;there</h1>
EOF
my $html = $parser->safe_parse_html($value);
diag($@) if $@;
ok($html, 'ampersand directly followed by a letter');
$value =<< 'EOF';
<h1>Here &amp; there</h1>
EOF
$html = $parser->safe_parse_html($value);
diag($@) if $@;
ok($html, 'ampersand surrounded by spaces');
__END__
--
Marco