
This queue is for tickets about the XML-LibXML CPAN distribution.

Report information
The Basics
Id: 6135
Status: resolved
Priority: 0/
Queue: XML-LibXML

People
Owner: Nobody in particular
Requestors: william.geldhof [...] nss.be
Cc:
AdminCc:

Bug Information
Severity: Normal
Broken in:
  • 1.54
  • 1.56
  • 1.58
Fixed in: (no value)

Subject: parse_string
System: Solaris 9 (SPARC)
Perl: 5.8.0

Problem: I start with an XML file that contains no special characters. I parse the file in my program, add a special character (ASCII > 255) inside a CDATA section, and save the data back to the file. When I parse the same file again, I get:

[error] :2: parser error : Input is not proper UTF-8, indicate encoding !
TNAAM lang_id="11" lang11="Produit" scroll="Y" type="text" itm="6"><![CDATA[d

The CDATA section contains only an é after the "dddd", and if I dump the file to the screen, an encoded character appears in place of the é.
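A minimal reproduction sketch of the failure, under stated assumptions: it uses Python's standard-library parser (expat) as a stand-in for XML::LibXML/libxml2, so the error text differs, and it assumes the é was written as a single ISO-8859-1 byte into a document with no encoding declaration. The document content and the helper `is_well_formed` are hypothetical, shortened from the report.

```python
import xml.etree.ElementTree as ET


def is_well_formed(data: bytes) -> bool:
    """Return True if the parser accepts the raw bytes as an XML document."""
    try:
        ET.fromstring(data)
    except ET.ParseError:
        return False
    return True


# Hypothetical document modelled on the report. There is no XML
# declaration, so the parser has to assume UTF-8.
doc = "<TNAAM><![CDATA[dddd\u00e9]]></TNAAM>"

latin1_bytes = doc.encode("iso-8859-1")  # é stored as the single byte 0xE9
utf8_bytes = doc.encode("utf-8")         # é stored as the two bytes 0xC3 0xA9

print(is_well_formed(latin1_bytes))  # False: a lone 0xE9 is not valid UTF-8
print(is_well_formed(utf8_bytes))    # True
```

The same characters parse or fail depending only on which bytes reach the file, which matches the symptom in the report.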
Subject: Re: [cpan #6135] parse_string
From: Christian Glahn <christian.glahn [...] uibk.ac.at>
To: bug-XML-LibXML [...] rt.cpan.org
Date: Mon, 26 Apr 2004 22:50:43 +0200
First of all, there is no ASCII > 255 (ASCII is only 7-bit). Usually XML::LibXML sets the correct encoding on the string; this is well tested and proven to work correctly. Perl 5.8.x is aware of encodings and sets the internal UTF8 flag correctly.

However, XML::LibXML has a feature called "Magic Encoding", which appears to be causing you headaches. If Perl's UTF8 flag is not set on the characters passed to an XML::LibXML function, the module assumes the string is in the same encoding as the document (the community has confirmed this as the correct behaviour since 2001). Your document appears to have no encoding set, so XML::LibXML assumes (correctly) that the encoding is UTF-8. In your case the string is not UTF-8: in UTF-8 there is no single-byte character above 127, so a lone byte such as 0xE9 (é in ISO-8859-1) is invalid.

To find out what went wrong, read the encodings section of the perlxml FAQ at http://perl-xml.sourceforge.net/faq/.
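The rule described in the reply can be modelled in a few lines. This is a hypothetical Python sketch, not XML::LibXML's implementation: a `str` stands in for a Perl string with the UTF8 flag set, `bytes` for one without it, and `to_character_data` is an invented name. It also shows one way to keep ISO-8859-1 data parseable, by declaring the encoding in the XML declaration (expat, used here as a stand-in parser, supports ISO-8859-1 natively).

```python
import xml.etree.ElementTree as ET


def to_character_data(value, doc_encoding=None):
    """Hypothetical model of the "Magic Encoding" rule described above.

    str   ~ a Perl string with the UTF8 flag set: already characters, used as-is.
    bytes ~ a string without the flag: interpreted in the document's encoding,
            defaulting to UTF-8 when the document declares none.
    """
    if isinstance(value, str):
        return value
    return value.decode(doc_encoding or "utf-8")


# Undeclared document: the unflagged byte 0xE9 is read as UTF-8 and rejected.
try:
    to_character_data(b"dddd\xe9")
except UnicodeDecodeError:
    print("not valid UTF-8")

# Same bytes, but the document declares ISO-8859-1: they decode to "dddd" + é.
print(to_character_data(b"dddd\xe9", "iso-8859-1") == "dddd\u00e9")  # True

# A parser honours the declaration, so the Latin-1 file parses cleanly.
declared = ('<?xml version="1.0" encoding="ISO-8859-1"?>'
            '<TNAAM><![CDATA[dddd\u00e9]]></TNAAM>').encode("iso-8859-1")
print(ET.fromstring(declared).text == "dddd\u00e9")  # True
```

Declaring the encoding is one option; writing the file back as UTF-8, or decoding the input first so that the UTF8 flag is set, avoids the mismatch as well.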