Date: Sat, 2 Jul 2005 09:52:53 +0200 (MEST)
From: ddascalescu [...] gmail.com
To: bug-XML-Twig [...] rt.cpan.org
Subject: Forced conversion of the &gt; base entity to '>'
I was using XML::Twig 2005/06/28 09:53:02 and I saw that
when parsing and printing an XML document, the &gt; base
entity is always replaced in the output with '>'. Other base
entities such as &lt; are not affected, and this lack of
symmetry makes me wonder why this happens.
I tried correcting the problem by using keep_encoding => 1,
but that has the unfortunate effect of not parsing UTF-8.
As far as I know, there is no way to still parse UTF-8 and
*not* convert &gt; to '>'.
Please see the patch. It's a dirty hack: I did not have the
time to dig into the module and understand its logic, so it
might be wrong.
Here is a test case:
#! perl -w
use strict;
use XML::Twig;
my $t= XML::Twig->new();
$t->parse( '<start>3 &lt; pi &gt; 3</start>' );
$t->print;
And here is the patch:
--- D:\Perl\site\lib\XML\Twig318.pm 7/2/2005 00:27:14
+++ D:\Perl\site\lib\XML\Twig318.orig 7/2/2005 00:21:16
@@ -6810,6816 +6810,6816 @@
}
}
else
- { $string=~ s/([&<>])/$XML::Twig::base_ent{$1}/g unless( $keep_encoding || $elt->{asis});
+ { $string=~ s/([&<])/$XML::Twig::base_ent{$1}/g unless( $keep_encoding || $elt->{asis});
$string=~ s{\Q]]>}{]]&gt;}g;
}
return $output_text_filter ? $output_text_filter->( $string) : $string;
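
To show the effect of the two character classes from the patch in
isolation, here is a minimal standalone sketch. It does not load
XML::Twig: the %base_ent hash is a local stand-in that I am assuming
mirrors %XML::Twig::base_ent, and escape_text is a hypothetical helper
written only for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Local stand-in for %XML::Twig::base_ent (assumed mapping,
# not copied from the module).
my %base_ent = ( '&' => '&amp;', '<' => '&lt;', '>' => '&gt;' );

# Hypothetical helper: replace every character matched by the
# given character class with its entity from %base_ent.
sub escape_text {
    my ( $string, $class ) = @_;
    $string =~ s/($class)/$base_ent{$1}/g;
    return $string;
}

# Text content as it looks once the parser has expanded the entities.
my $text = '3 < pi > 3';

print escape_text( $text, qr/[&<]/ ),  "\n";    # prints: 3 &lt; pi > 3
print escape_text( $text, qr/[&<>]/ ), "\n";    # prints: 3 &lt; pi &gt; 3
```

With [&<] the '>' is left as a literal character on output; with
[&<>] it is written back as &gt;, which is the symmetric behaviour
I was expecting.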
Hope this helps,
Dan Dascalescu