This queue is for tickets about the XML-Simple CPAN distribution.

Report information
The Basics
Id: 108956
Status: rejected
Priority: 0/
Queue: XML-Simple

People
Owner: grantm [...] cpan.org
Requestors: PJNEWMAN [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: 2.18
Fixed in: (no value)



Subject: XML::Simple doesn't encode unprintable characters
XML::Simple doesn't encode unprintable characters and therefore generates invalid XML:

XML::Simple isn't escaping the characters, hence the invalid XML being generated. See the example code below:

    #!/usr/bin/perl -w
    use strict;
    use XML::Simple;

    my $conf;
    $conf->{baz}[0] = "foo\x07bar";
    print XMLout($conf, keyattr => ['']);

    ./testxml | xmllint --noout -
    -:2: parser error : PCDATA invalid Char value 7
    <baz>foobar</baz>
    ^

My workaround is currently this, but it's not particularly ideal:

    #!/usr/bin/perl -w
    use strict;
    use XML::Simple;

    my $conf;
    my $string = "foo\x07bar";
    $string =~ s/([\0-\x08\x0b\x0c\x0e-\x1f\x7f])/sprintf("\\x%02x",ord($1));/xeg;
    $conf->{baz}[0] = $string;
    print XMLout($conf, keyattr => ['']);

On Sun Nov 15 10:38:43 2015, PJNEWMAN wrote:
> XML::Simple doesn't encode unprintable characters and therefore
> generates invalid XML:
>
> XML::Simple isn't escaping the characters, hence the invalid XML being
> generated. See the example code below:
>
> #!/usr/bin/perl -w
> use strict;
> use XML::Simple;
>
> my $conf;
> $conf->{baz}[0] = "foo\x07bar";
> print XMLout($conf, keyattr => ['']);
>
>
> ./testxml | xmllint --noout -
> -:2: parser error : PCDATA invalid Char value 7
> <baz>foobar</baz>
> ^

Of course I actually meant the following to correctly escape to XML:

    #!/usr/bin/perl -w
    use strict;
    use XML::Simple;

    my $conf;
    my $string = "foo\x07bar";
    $string =~ s/([\0-\x08\x0b\x0c\x0e-\x1f\x7f])/sprintf("&#x%02x;",ord($1));/eg;
    $conf->{baz}[0] = $string;
    print XMLout($conf, keyattr => ['']);

On Sun Nov 15 11:06:14 2015, PJNEWMAN wrote:
> My workaround is currently this, but it's not particularly ideal:
> #!/usr/bin/perl -w
> use strict;
> use XML::Simple;
>
> my $conf;
> my $string = "foo\x07bar";
> $string =~ s/([\0-\x08\x0b\x0c\x0e-\x1f\x7f])/sprintf("\\x%02x",ord($1));/xeg;
> $conf->{baz}[0] = $string;
> print XMLout($conf, keyattr => ['']);
>
> On Sun Nov 15 10:38:43 2015, PJNEWMAN wrote:
> > XML::Simple doesn't encode unprintable characters and therefore
> > generates invalid XML:
> >
> > XML::Simple isn't escaping the characters, hence the invalid XML
> > being generated. See the example code below:
> >
> > #!/usr/bin/perl -w
> > use strict;
> > use XML::Simple;
> >
> > my $conf;
> > $conf->{baz}[0] = "foo\x07bar";
> > print XMLout($conf, keyattr => ['']);
> >
> >
> > ./testxml | xmllint --noout -
> > -:2: parser error : PCDATA invalid Char value 7
> > <baz>foobar</baz>
> > ^

Hi Peter

The problem with the characters you're encountering is that they are not valid characters in an XML document regardless of whether they are represented as simple bytes or as numeric character entities.

The relevant section of the spec is:

    http://www.w3.org/TR/REC-xml/#charsets

and it defines a character as:

    Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

You'll notice that this excludes characters in the ranges x00-x08 and x0E-x1F. I think that XML 1.1 does relax this restriction but that won't help unless you're passing the XML to a 1.1 compliant parser (you'd probably also need to add an XML declaration with version='1.1').

If you are feeding the resulting XML to a parser that accepts &#x07; as a valid character, then you can subclass XML::Simple to implement your escaping:

==========
package XML::SimpleCustomEscapes;

use parent 'XML::Simple';

sub escape_value {
    my $self = shift;

    my $data = $self->SUPER::escape_value(shift);
    $data =~ s/([\0-\x08\x0b\x0c\x0e-\x1f\x7f])/sprintf("&#x%02x;",ord($1));/eg;
    return $data;
}

1;
==========

Regards
Grant
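
For a consumer that only accepts strict XML 1.0, an alternative to escaping is to detect or drop the forbidden characters before the data reaches XMLout. The following is a minimal sketch of that idea, not part of XML::Simple: the helper names `has_invalid_xml_chars` and `strip_invalid_xml_chars` are hypothetical, and the pattern is simply the negation of the Char production quoted above. It assumes silently discarding such characters is acceptable for the data in question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Matches any character NOT allowed by the XML 1.0 Char production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
my $INVALID_XML10 =
    qr/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/;

# Hypothetical helper: returns 1 if the string contains a character
# that XML 1.0 forbids, 0 otherwise.
sub has_invalid_xml_chars {
    my ($string) = @_;
    return $string =~ $INVALID_XML10 ? 1 : 0;
}

# Hypothetical helper: returns a copy of the string with every
# forbidden character removed.
sub strip_invalid_xml_chars {
    my ($string) = @_;
    (my $clean = $string) =~ s/$INVALID_XML10//g;
    return $clean;
}

my $raw = "foo\x07bar";
print has_invalid_xml_chars($raw) ? "invalid\n" : "ok\n";   # prints "invalid"
print strip_invalid_xml_chars($raw), "\n";                  # prints "foobar"
```

The cleaned value could then be assigned into the data structure (for example `$conf->{baz}[0]`) before calling XMLout, so the output stays well-formed for any XML 1.0 parser.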