
This queue is for tickets about the XML-Twig CPAN distribution.

Report information
The Basics
Id: 38163
Status: rejected
Priority: 0/
Queue: XML-Twig

People
Owner: Nobody in particular
Requestors: EDAVIS [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: (no value)
Fixed in: (no value)



Subject: $twig->sprint produces badly formed XML because of entity bug (test case included)
XML::Twig does not handle the document <a>&gt;0</a> correctly. If you parse it and then write it out, it gets turned to <a>>0</a>. Attached is a patch adding a test case for this bug.
Subject: diff
application/octet-stream 857b


Subject: Re: [rt.cpan.org #38163] $twig->sprint produces badly formed XML because of entity bug (test case included)
Date: Mon, 4 Aug 2008 18:03:29 +0200
To: bug-XML-Twig [...] rt.cpan.org
From: "Michel Rodriguez" <xmltwig [...] gmail.com>
On Mon, Aug 4, 2008 at 2:02 PM, EDAVIS via RT <bug-XML-Twig@rt.cpan.org> wrote:
> Mon Aug 04 08:02:41 2008: Request 38163 was acted upon.
> Transaction: Ticket created by EDAVIS
> Queue: XML-Twig
> Subject: $twig->sprint produces badly formed XML because of entity bug
> (test case included)
> Broken in: (no value)
> Severity: Important
> Owner: Nobody
> Requestors: EDAVIS@cpan.org
> Status: new
> Ticket <URL: http://rt.cpan.org/Ticket/Display.html?id=38163 >
>
>
> XML::Twig does not handle the document <a>&gt;0</a> correctly. If you
> parse it and then write it out, it gets turned to <a>>0</a>. Attached
> is a patch adding a test case for this bug.
Actually <a>>0</a> is a valid XML document. Did you try xmlwf or xmllint on it? < always needs to be escaped, but > doesn't.

-- mirod
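The claim is easy to check with any expat-based parser. Below is a minimal sketch in Python, whose `xml.parsers.expat` module wraps the same expat library that xmlwf is built on. The helper name `is_well_formed` and the third sample string are illustrative assumptions, not part of the ticket.

```python
import xml.parsers.expat


def is_well_formed(document: str) -> bool:
    """Return True if expat accepts the document, False on a parse error."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        # The second argument marks this chunk as the final one.
        parser.Parse(document, True)
    except xml.parsers.expat.ExpatError:
        return False
    return True


# The two forms from the ticket, plus an unescaped "<" for contrast.
samples = ["<a>&gt;0</a>", "<a>>0</a>", "<a><0</a>"]
results = {doc: is_well_formed(doc) for doc in samples}

for doc, ok in results.items():
    print(f"{doc!r}: {'well-formed' if ok else 'not well-formed'}")
```

Both forms of the document from the report are accepted; only the unescaped `<` is rejected.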
>Actually <a>>0</a> is a valid XML document.
Oh, I didn't realize that.

Can this bug report become a feature request, then: in character data, > should always be escaped as &gt;.

I would make that the default, but if you feel that getting the shortest possible output is important even if it looks a bit weird, then at least the various pretty-printing options should escape > all the time.
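For comparison, here is what "always escape >" looks like in a serializer that already behaves that way. This is a sketch using Python's standard library, not XML::Twig, and it only illustrates the requested output; it says nothing about how XML::Twig would implement it. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Parse the unescaped form of the document, then serialize it again.
element = ET.fromstring("<a>>0</a>")
parsed_text = element.text  # character data as the parser delivers it
round_tripped = ET.tostring(element, encoding="unicode")

# Escaping character data by hand: escape() covers &, < and >.
escaped = escape("1 > 0 && 0 < 1")

print(parsed_text)
print(round_tripped)
print(escaped)
```

ElementTree reads either form as the same text and always writes `>` back out as `&gt;`, so the serialized document stays stable across repeated round trips.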
From: trochee
On Thu Aug 07 05:58:01 2008, EDAVIS wrote:
> >Actually <a>>0</a> is a valid XML document.
>
> Oh, I didn't realize that.
>
> Can this bug report become a feature request, then: in character data,
> > should always be escaped as &gt;.
>
> I would make that the default, but if you feel that getting the shortest
> possible output is important even if it looks a bit weird, then at least
> the various pretty-printing options should escape > all the time.
More to the point -- this valid XML document can't be parsed by XML::Twig, which means that the round-trip to XML and back is broken. It might be an XML::Parser bug, I suppose.
From: trochee
> More to the point -- this valid XML document can't be parsed by
> XML::Twig, which means that the round-trip to XML and back is broken.
> It might be an XML::Parser bug, I suppose.
Scratch that: that XML *can* be parsed by XML::Twig (and XML::Parser). Nothing wrong here. It might be nice to have some way to tell ->sprint() to additionally escape some characters (e.g. > as &gt;), though. But my sense is that the priority may be much lower.
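Specifying a twig_handler may be an option.

    use strict;
    use warnings;
    use XML::Twig;

    my $xml = '<?xml version="1.0"?><document><a>>0</a></document>';

    my $xTwig = XML::Twig->new(
        twig_handlers => {
            # Rewrite the text of every <a> element, replacing > with &gt;.
            'a' => sub {
                my $a_text = $_->text;
                $a_text =~ s/>/&gt;/g;
                $_->set_text($a_text);
            },
        },
    );

    $xTwig->safe_parse($xml) or die "Failure to parse XML : $@";
    print $xTwig->sprint();

Output:

    <?xml version="1.0"?>
    <document><a>&amp;gt;0</a></document>

Note that the output contains &amp;gt; rather than &gt;: set_text() stores the literal string, and sprint() then escapes its & again, so this handler alone does not produce the escaped form. Marking the element with set_asis after set_text might avoid that, at the cost of having to escape & and < by hand.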