
This queue is for tickets about the XML-DOM CPAN distribution.

Report information
The Basics
Id: 27793
Status: open
Priority: 0/
Queue: XML-DOM

People
Owner: Nobody in particular
Requestors: n [...] shaplov.ru
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in:
  • 1.43
  • 1.44
Fixed in: (no value)



Subject: Error with symbols from iso-8859-1 in utf-8 xml file
I found the following problem. If we load the following XML file and then write it back to disk:

    <root>
    <p>«English Text»</p>
    </root>

with the following script:

    use strict;
    use XML::DOM;
    my $parser = new XML::DOM::Parser;
    my $doc = $parser->parsefile ('test.xml');
    $doc->printToFile ('result.xml');

we get an XML file in the wrong encoding. The symbols « and » are encoded as if they were in iso-8859-1, although non-iso-8859-1 symbols in other tags on other lines (Russian letters, for example) are encoded correctly. Moreover, if we add at least one Russian letter to the English text ( <p>«English Text» А - Я </p> ), the symbols « and » are encoded correctly...

There is a way to work around this problem in XML::DOM 1.34:

    use strict;
    use XML::DOM;
    use Encode;
    my $parser = new XML::DOM::Parser;
    my $doc = $parser->parsefile ('test.xml');
    my $encoding=$doc->getXMLDecl()->getEncoding();
    my $data = decode("utf8",$doc->toString);
    open DST,">:encoding($encoding)",'result2.xml';
    print DST $data;
    close DST;

But it does not work in 1.44; there it also gives the wrong result.

It would be good if this worked correctly without workarounds and in all versions... :-/

---------------------
Information about my system:

    $ perl -v
    This is perl, v5.8.8 built for i486-linux-gnu-thread-multi
    $ perl -e'use XML::DOM; print $XML::DOM::VERSION,"\n";'
    1.43
    $ echo $LANG
    ru_RU.KOI8-R
    $ cat /etc/issue
    Debian GNU/Linux lenny/sid \n \l
> <root>
> <p>«English Text»</p>
> </root>
Sorry, I missed one line while copy-pasting. The file is:

    <?xml version="1.0" encoding="UTF-8"?>
    <root>
    <p>«English Text»</p>
    </root>
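
For anyone trying to reproduce this without XML::DOM: the symptom looks like what plain Perl does when a character string containing only code points below U+0100 is printed to a handle that has no encoding layer. Whether that is what actually happens inside printToFile is an assumption, not something confirmed in this ticket. The sketch below only shows the general Perl behaviour; bytes_written is a hypothetical helper, not part of XML::DOM.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of XML::DOM): print $text to an in-memory
# handle opened with the given mode and return the bytes written, as hex.
sub bytes_written {
    my ($mode, $text) = @_;
    my $buf = '';
    open my $fh, $mode, \$buf or die "open failed: $!";
    print {$fh} $text;
    close $fh or die "close failed: $!";
    return join ' ', map { sprintf '%02X', ord } split //, $buf;
}

# «A» as a character string; both quote marks are below U+0100.
my $text = "\x{AB}A\x{BB}";

# Assumption: mimic the internal (UTF-8 flagged) form a parser would return.
utf8::upgrade($text);

my $no_layer   = bytes_written('>', $text);                  # no encoding layer
my $utf8_layer = bytes_written('>:encoding(UTF-8)', $text);  # explicit layer

print "no layer:   $no_layer\n";    # AB 41 BB       (iso-8859-1 bytes)
print "utf8 layer: $utf8_layer\n";  # C2 AB 41 C2 BB (UTF-8 bytes)
```

With no layer, « and » come out as the single bytes AB and BB, which matches the broken output described above. With an explicit :encoding(UTF-8) layer on the output handle they come out as proper two-byte UTF-8 sequences.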