Subject: End of CDATA always escaped?
Date: Mon, 08 Jul 2013 14:11:12 +0200
To: bug-XML-Twig [...] rt.cpan.org
From: Marco Pessotto <melmothx [...] gmail.com>
It looks like the end of a CDATA section is unconditionally escaped. See the
following test script:
#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;
use Test::More;
plan tests => 3;
my $html = <<'EOF';
<div id="body">body</div>
<script>
//<![CDATA[
if ( this.value && ( !request.term || matcher.test(text) ) && 1 > 0 && 0 < 1 )
//]]>
</script>
EOF
my $parser = XML::Twig->new();
my $xml = $parser->safe_parse_html($html);
print $@ if $@;
my @cdata = $xml->get_xpath('#CDATA');
ok(@cdata > 0);
my @elts = $xml->get_xpath('//script');
foreach my $el (@elts) {
    $el->set_asis;
    diag $el->text;
    ok(((index $el->text, "//]]>") >= 0), "end of cdata ok");
}
ok(((index $xml->sprint, "//]]>") >= 0), "end of cdata ok");
diag $xml->sprint;
__END__
Leaving aside the fact that the CDATA section is not parsed as such
(probably because of the HTML->XML conversion, which I can live with), it
seems that text marked as "as is" is still escaped on output.
The culprit seems to be line 8543 in the latest version:
if( ! $elt->{extra_data_in_pcdata})
  {
    $string=~ s/([$replaced_ents])/$XML::Twig::base_ent{$1}/g unless( !$replaced_ents || $keep_encoding || $elt->{asis});
    $string=~ s{\Q]]>}{]]&gt;}g; ### why is this always replaced?
  }
but I could be wrong.
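The suspected behaviour can be sketched in isolation, without XML::Twig. The sketch rests on assumptions: `escape_text`, `%base_ent` and `$replaced_ents` below are stand-ins invented for illustration, not the module's internals, and the second substitution is assumed to rewrite `]]>` to `]]&gt;`. It only shows that an "as is" flag guarding the first substitution has no effect on the second:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical stand-ins for illustration; not XML::Twig's own data.
my %base_ent      = ( '<' => '&lt;', '>' => '&gt;', '&' => '&amp;' );
my $replaced_ents = '<&';

sub escape_text {
    my ($string, $asis) = @_;
    # First substitution: skipped when the text is flagged "as is".
    $string =~ s/([$replaced_ents])/$base_ent{$1}/g unless $asis;
    # Second substitution: not guarded by the flag, so it always runs.
    $string =~ s{\Q]]>\E}{]]&gt;}g;
    return $string;
}

print escape_text('a < b //]]>', 0), "\n";   # a &lt; b //]]&gt;
print escape_text('a < b //]]>', 1), "\n";   # a < b //]]&gt;
```

In the second call the `<` survives because of the flag, but the `]]>` is still rewritten, which matches what the test script above observes.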
Thanks in advance.
Best wishes
--
Marco