
This queue is for tickets about the XML-LibXML CPAN distribution.

Report information
The Basics
Id: 42618
Status: rejected
Priority: 0/
Queue: XML-LibXML

People
Owner: Nobody in particular
Requestors: ulli [...] prager.at
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: charset problem
Date: Wed, 21 Jan 2009 21:45:19 +0100
To: <bug-XML-LibXML [...] rt.cpan.org>
From: "Ursula Prager-Ramsa" <ulli [...] prager.at>
Hello,

XML::LibXML: Version 1.69
Perl: v5.8.8 built for i386-linux
Linux imdcl1 2.6.15-1.2054_FC5smp #1 SMP Tue Mar 14 16:05:46 EST 2006 i686 i686 i386 GNU/Linux
Libxml: 2.7.2

I have a parser problem. Using the following statements:

    $parser = XML::LibXML->new();
    my $out ='<?xml version="1.0" encoding="utf-8"?>'."\n".'<root>';
    $out .="<xxx>\xc3\x96</xxx>\n";
    $out .="<art>$art</art>\n";
    $out .=XMLout(\@daten_cc,AttrIndent=>1,NoAttr=>1,RootName=>'cc');
    ...
    $out .='</root>';

    my $tree = $parser->parse_string($out);
    print $tree->toString;

At this point I sometimes get the correct XML string:

    <?xml version="1.0" encoding="utf-8"?>
    <root><xxx>Ö</xxx>
    <art>liste</art>
    ...

And sometimes:

    <?xml version="1.0" encoding="utf-8"?>
    <root><xxx>Ã-</xxx>
    <art>aendern</art>
    ...

It seems that the encoding information is ignored, and I have no idea what to do. This behavior is not restricted to this version of XML::LibXML; I have had this problem for several months (with older versions).

Is there anything I can do?

Kind regards
Ursula Prager-Ramsa
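The "sometimes correct, sometimes not" symptom can be reproduced with core Perl alone. The following is a minimal sketch, not the reporter's actual program: the two values standing in for `$art` are hypothetical, and it assumes the difference between the good and bad runs is whether one of the concatenated pieces happens to be a character string (UTF8 flag on).

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# "Ö" (U+00D6) written as its two raw UTF-8 bytes, as in the report.
my $bytes = "<xxx>\xc3\x96</xxx>";

# Hypothetical stand-ins for $art: a plain byte string, and a character
# string such as a module that decodes its input might return.
my $art_bytes = "liste";
my $art_chars = decode('UTF-8', "\xc3\xa4ndern");   # "ändern", UTF8 flag on

my $ok  = $bytes . $art_bytes;   # byte string . byte string
my $bad = $bytes . $art_chars;   # byte string . character string

# In $bad, Perl has upgraded the bytes C3 96 as if they were ISO-8859-1,
# so they are now the two characters U+00C3 and U+0096, not "Ö".
printf "ok  has UTF8 flag: %s\n", utf8::is_utf8($ok)  ? "yes" : "no";
printf "bad has UTF8 flag: %s\n", utf8::is_utf8($bad) ? "yes" : "no";

# Hex dump of what each string looks like once written out as UTF-8.
printf "ok  bytes: %s\n", unpack("H*", $ok);
printf "bad bytes: %s\n", unpack("H*", encode('UTF-8', $bad));
```

In the second dump the original `c3 96` shows up as `c3 83 c2 96`, which is the double-encoded form that is typically displayed as "Ã" followed by a stray character.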
My guess is that you are mixing byte strings and character strings. When a byte string is concatenated with a character string, Perl upgrades the byte string as if it were ISO-8859-1, which can cause a real mess.

Make sure that the string you pass to XML::LibXML is a byte string (UTF8 flag off) and that all the strings you concatenate are byte strings too. You can test this at every point, e.g. using Devel::Peek or utf8::is_utf8.

Otherwise, please provide a self-contained test case.

-- Petr
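The advice above (check each piece with utf8::is_utf8 and hand XML::LibXML a byte string) could be applied roughly as follows. This is only a sketch under assumptions: `as_utf8_bytes` is a hypothetical helper, not part of XML::LibXML, and it assumes that any piece without the UTF8 flag already holds UTF-8 encoded bytes. The sample pieces are invented for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: return a UTF-8 byte string for either kind of input.
# Assumption: a string without the UTF8 flag is already UTF-8 encoded bytes.
sub as_utf8_bytes {
    my ($s) = @_;
    return utf8::is_utf8($s) ? encode('UTF-8', $s) : $s;
}

# Invented sample pieces: some are byte strings, one is a character string.
my @pieces = (
    qq{<?xml version="1.0" encoding="utf-8"?>\n<root>},
    "<xxx>\xc3\x96</xxx>\n",                                   # raw UTF-8 bytes
    "<art>" . decode('UTF-8', "\xc3\xa4ndern") . "</art>\n",   # character string
    "</root>",
);

# Report which pieces carry the UTF8 flag before normalizing them.
for my $i (0 .. $#pieces) {
    printf "piece %d has UTF8 flag: %s\n",
        $i, utf8::is_utf8($pieces[$i]) ? "yes" : "no";
}

# Normalize every piece to bytes first, then concatenate.
my $out = join '', map { as_utf8_bytes($_) } @pieces;

printf "result has UTF8 flag: %s\n", utf8::is_utf8($out) ? "yes" : "no";

# $out is now a byte string and could be passed on as in the report:
#   my $tree = XML::LibXML->new->parse_string($out);
```

Normalizing before concatenation matters here: once a byte string has been joined to a character string, the upgrade has already happened and the original bytes are no longer distinguishable from Latin-1 characters.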
Subject: AW: [rt.cpan.org #42618] charset problem
Date: Sat, 24 Jan 2009 10:24:39 +0100
To: <bug-XML-LibXML [...] rt.cpan.org>
From: "Ursula Prager-Ramsa" <ulli [...] prager.at>
Thank you very much. This was the solution. Until now I was not aware of the difference between byte strings and character strings.

ulli
-----Original Message-----
From: Petr Pajas via RT [mailto:bug-XML-LibXML@rt.cpan.org]
Sent: Friday, 23 January 2009 21:30
To: Ursula Prager-Ramsa
Subject: [rt.cpan.org #42618] charset problem

<URL: https://rt.cpan.org/Ticket/Display.html?id=42618 >

My guess is that you are mixing byte strings and character strings. When a byte string is concatenated with a character string, Perl upgrades the byte string as if it were ISO-8859-1, which can cause a real mess. Make sure that the string you pass to XML::LibXML is a byte string (UTF8 flag off) and that all the strings you concatenate are byte strings too. You can test this at every point, e.g. using Devel::Peek or utf8::is_utf8. Otherwise, please provide a self-contained test case.

-- Petr
On Sat 24 Jan 2009 04:46:03, ulli@prager.at wrote:
> Thank you very much. This was the solution. Until now I was not aware
> of the difference between byte strings and character strings.
> ulli

OK, I'm closing this bug.

Thanks,
-- Petr