Subject: Newlines in attribute values
Date: Tue, 30 Oct 2012 22:37:29 +0100
To: bug-XML-Twig [...] rt.cpan.org
From: Zsbán Ambrus <ambrus [...] math.bme.hu>
Hello,
According to the specs, a newline character in an attribute value must
be escaped as a character reference; otherwise an XML reader will
normalize it to a space. XML::Twig's writer does not seem to know
about this.
Here are the details.
I was trying to edit some XML files, actually project configuration
files of MS Visual Studio, with Twig. One of them had an escaped CRLF
inside an attribute value, something like
"foo&#xD;&#xA;bar". This attribute was in an element I didn't change
in my editing. When I tried to use the modified XML, I got an error.
It turns out that XML::Twig wrote out the attribute with the CRLF
unescaped, and the XML reader in MS Visual Studio read it as a single
space.
After some inquiry, perlmonks told me that the behavior of the XML
reader is correct. The XML 1.0 standard requires that a reader which
finds an unescaped CR, LF, CRLF, or HT in an attribute value normalize
it to a space. You can find a reference for this
behavior at "http://stackoverflow.com/questions/260436/preserving-attribute-whitespace-in-xslt".
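To make the rule concrete, here is a rough sketch of what a reader does
with an attribute value, based on my reading of XML 1.0 sections 2.11
(end-of-line handling) and 3.3.3 (attribute-value normalization). The
helper name normalize_att_value is made up for this illustration and is
not part of XML::Twig or XML::Parser. It is simplified: it handles only
numeric character references and ignores named entities and non-CDATA
attribute types.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration only (not an XML::Twig API).
# Takes the raw text between the quotes of an attribute and returns
# the value a conforming reader would report, under the
# simplifications named above.
sub normalize_att_value {
    my ($raw) = @_;

    # End-of-line handling: literal CRLF and lone CR become LF first.
    $raw =~ s/\r\n?/\n/g;

    my $out = '';
    # Walk the value one character reference or one literal character
    # at a time.
    while ($raw =~ /\G(?:&#x([0-9A-Fa-f]+);|&#([0-9]+);|(.))/gs) {
        if (defined $1) {
            $out .= chr hex $1;    # hex reference: kept as is
        }
        elsif (defined $2) {
            $out .= chr $2;        # decimal reference: kept as is
        }
        else {
            my $c = $3;
            # Literal whitespace (tab, LF, CR, space) becomes a space.
            $out .= ($c =~ /[\t\n\r ]/) ? ' ' : $c;
        }
    }
    return $out;
}

# A literal newline is lost, an escaped one survives.
print normalize_att_value("q\nr") eq "q r"      ? "ok\n" : "not ok\n";
print normalize_att_value("q&#10;r") eq "q\nr"  ? "ok\n" : "not ok\n";
```

Under this sketch a literal CRLF first collapses to one LF and then
becomes one space, which matches the single space MS Visual Studio
reported.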
It turns out that the reader part of XML::Twig behaves correctly: it
too reads an unescaped newline in an attribute as a space, but the
writer part fails to escape newlines. This means that when you read
an escaped newline from an attribute then write it out, the value
changes, so I believe this is a bug in XML::Twig.
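For illustration, this is roughly the escaping I would expect a writer
to apply to attribute values so that they survive a round trip. It is
only a sketch: escape_att_value is a hypothetical name, not an existing
XML::Twig function, and I am assuming the value is written inside
double quotes.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration only (not an XML::Twig API).
# Escapes an attribute value for output inside double quotes.
sub escape_att_value {
    my ($value) = @_;

    # Markup characters first, so the '&' of later references is not
    # escaped again.
    $value =~ s/&/&amp;/g;
    $value =~ s/</&lt;/g;
    $value =~ s/"/&quot;/g;

    # Tab, LF and CR must be written as character references,
    # otherwise a reader normalizes them to spaces.
    $value =~ s/([\t\n\r])/sprintf '&#x%X;', ord $1/ge;

    return $value;
}

print escape_att_value("q\nr"), "\n";          # prints q&#xA;r
print escape_att_value("foo\r\nbar"), "\n";    # prints foo&#xD;&#xA;bar
```

Ordinary spaces are left alone, since a reader reports them unchanged
in a CDATA attribute anyway.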
Here's a simple example showing the bug.
$ perl -we 'use XML::Twig; my $ct= qq(<m><n p="q&#10;r"/><s
t="u\nv"/></m>); my $tw = XML::Twig->new; $tw->parse($ct); $tw->flush;
print $/;'
<m><n p="q
r"/><s t="u v"/></m>
$
For this simple example, I'm using perl v5.16.1 on amd64-linux,
XML::Twig v3.41, XML::Parser v2.41, Encode v2.44, all vanilla; with
libexpat 2.0.1-7+squeeze1 from the Debian package.
Ambrus
----
Configuration:
perl: 5.016001
OS: linux - x86_64-linux
required
XML::Parser : 2.41
Can't exec "xmlwf": No such file or directory at t/zz_dump_config.t line 34.
Use of uninitialized value $xmlwf_v in pattern match (m//) at t/zz_dump_config.t line 35.
Missing argument in sprintf at t/zz_dump_config.t line 114.
expat : <no version information found>
Strongly Recommended
Scalar::Util : 1.25 (for improved memory management)
Encode : 2.44 (for encoding conversions)
Modules providing additional features
XML::XPathEngine : 0.13 (to use XML::Twig::XPath)
XML::XPath : <not available> (to use XML::Twig::XPath if Tree::XPathEngine not available)
LWP : 6.04 (for the parseurl method)
HTML::TreeBuilder : 5.02 (to use parse_html and parsefile_html)
HTML::Entities::Numbered : <not available> (to allow parsing of HTML containing named entities)
HTML::Tidy : 1.54 (to use parse_html and parsefile_html with the use_tidy option)
HTML::Entities : 3.69 (for the html_encode filter)
Tie::IxHash : <not available> (for the keep_atts_order option)
Text::Wrap : 2009.0305 (to use the "wrapped" option for pretty_print)
Modules used only by the auto tests
t/zz_dump_config.t .................. 1/1 Test :
1.25_02
Test::Pod : <not available>
XML::Simple : <not available>
XML::Handler::YAWriter : <not available>
XML::SAX::Writer : <not available>
XML::Filter::BufferText : <not available>
IO::Scalar : <not available>
IO::CaptureOutput : <not available>