
This queue is for tickets about the XML-Writer CPAN distribution.

Report information
The Basics
Id: 37032
Status: rejected
Priority: 0/
Queue: XML-Writer

People
Owner: Nobody in particular
Requestors: canipe_chris [...] yahoo.com
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: 0.600
Fixed in: (no value)



Subject: Numbered character references are botched
The hex and decimal formats of "&#x...;" and "&#...;" become "&amp;#x...;" and "&amp;#...;" when written. I realize the raw method exists, but these are valid in XML and should not require the UNSAFE setting.

The following code yields "<test>r&amp;#xe9;sum&amp;#233;</test>":

    use XML::Writer;
    my $string = 'r&#xe9;sum&#233;';
    my $XML = XML::Writer->new();
    $XML->dataElement('test', $string);
    $XML->end();

I'm running Perl 5.8.8 on SunOS 5.8.

Thanks.
This is as intended - see examples/double-escaping-example.pl for a quick sample of why it has to behave in this way.

Try:

    my $string = "r\x{e9}sum\x{e9}";
    my $XML = XML::Writer->new(ENCODING => 'us-ascii');

for the output you want.
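For reference, the suggestion above can be expanded into a complete script. This is a minimal sketch, assuming an XML::Writer version that accepts a scalar reference for OUTPUT; the variable names are illustrative. The string holds the real character U+00E9 rather than a pre-escaped reference, and the 'us-ascii' encoding asks the writer to emit non-ASCII characters as numeric character references.

```perl
use strict;
use warnings;
use XML::Writer;

# Capture the generated XML in a string instead of printing to STDOUT
# (assumes this XML::Writer version accepts a scalar reference for OUTPUT).
my $output = '';

# The data contains the actual character U+00E9, not the text "&#xe9;".
my $string = "r\x{e9}sum\x{e9}";

my $writer = XML::Writer->new(OUTPUT => \$output, ENCODING => 'us-ascii');
$writer->dataElement('test', $string);
$writer->end();

print $output;
```

The point of the design is that the caller passes unescaped data and the writer owns all escaping, so the ampersand is never ambiguous.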
Subject: Re: [rt.cpan.org #37032] Numbered character references are botched
Date: Tue, 24 Jun 2008 13:47:40 -0700 (PDT)
To: bug-XML-Writer [...] rt.cpan.org
From: chris canipe <canipe_chris [...] yahoo.com>
Joseph,

There's no telling where data may be coming from these days. As a result, should the module not use a lookahead to determine which ampersands to escape? For example,

    s/&(?!#x?\d+;)/&amp;/

would escape "Father & Son," but not "Father &#x26; Son" or "Father &#38; Son." This could be expanded to check for amp, lt, gt, apos, and quot as well.

Thanks,
Chris

--- On Mon, 6/23/08, Joseph Walton via RT <bug-XML-Writer@rt.cpan.org> wrote:

From: Joseph Walton via RT <bug-XML-Writer@rt.cpan.org>
Subject: [rt.cpan.org #37032] Numbered character references are botched
To: canipe_chris@yahoo.com
Date: Monday, June 23, 2008, 5:05 PM

<URL: http://rt.cpan.org/Ticket/Display.html?id=37032 >

This is as intended - see examples/double-escaping-example.pl for a quick sample of why it has to behave in this way.

Try:

    my $string = "r\x{e9}sum\x{e9}";
    my $XML = XML::Writer->new(ENCODING => 'us-ascii');

for the output you want.
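The proposed substitution can be tried on its own, outside the module. The sketch below is not XML::Writer code: escape_lookahead is a hypothetical helper that applies the regex from the message, with /g added so that every ampersand in the string is considered. Because \d matches only decimal digits, a hex reference containing letters, such as "&#xe9;", does not satisfy the lookahead and is still escaped.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of XML::Writer: the substitution proposed
# in the message, with /g added so it applies to every ampersand.
sub escape_lookahead {
    my ($text) = @_;
    $text =~ s/&(?!#x?\d+;)/&amp;/g;
    return $text;
}

my $plain   = escape_lookahead('Father & Son');       # bare ampersand is escaped
my $hex     = escape_lookahead('Father &#x26; Son');  # left alone: "26" is all digits
my $decimal = escape_lookahead('Father &#38; Son');   # left alone
my $mixed   = escape_lookahead('r&#xe9;sum&#233;');   # "e9" is not \d+, so the first & is escaped

print "$_\n" for $plain, $hex, $decimal, $mixed;
```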
Please see: http://rt.cpan.org/Ticket/Display.html?id=17862#txn-236936

I'm sure you can think of other cases where silently modifying data may cause problems - round-tripping '& &amp; &' to '& & &', for example.
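The round-tripping problem can be reproduced with plain substitutions. This is a minimal sketch under stated assumptions: escape_selective, escape_all, and unescape are hypothetical helpers that handle only the ampersand, standing in for a writer and a parser. The selective escaper skips anything that already looks like "&amp;", so the literal text "&amp;" in the input is lost on the way back.

```perl
use strict;
use warnings;

# Hypothetical selective escaper: skips ampersands that already start "&amp;".
sub escape_selective {
    my ($text) = @_;
    $text =~ s/&(?!amp;)/&amp;/g;
    return $text;
}

# Unconditional escaper: every ampersand is escaped, with no guessing.
sub escape_all {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    return $text;
}

# Minimal stand-in for a parser: decodes "&amp;" only.
sub unescape {
    my ($text) = @_;
    $text =~ s/&amp;/&/g;
    return $text;
}

my $original  = '& &amp; &';
my $selective = unescape(escape_selective($original));  # data is changed
my $uniform   = unescape(escape_all($original));        # data survives intact

print "selective: $selective\n";
print "uniform:   $uniform\n";
```

Only the unconditional escaper returns the original string after decoding, which is the behaviour the reply defends.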