
This queue is for tickets about the XML-LibXML CPAN distribution.

Report information
The Basics
Id: 33810
Status: resolved
Priority: 0
Queue: XML-LibXML

People
Owner: Nobody in particular
Requestors: MARKOV [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: character-set used by parser
The character-set used when the file is opened by the parser is not documented in the man-page, and cannot be specified. UTF-8 would be a nice default, but probably Latin-1 is used. Failures are reported for UTF-16 encoded (Windows) files. Could you add some statement to XML::LibXML::Parser?
On Tue 04 Mar 2008 16:47:08, MARKOV wrote:
> The character-set used when the file is opened by the parser is not
> documented in the man-page, and cannot be specified. UTF-8 would be
> a nice default, but probably Latin-1 is used. Failures are reported for
> UTF-16 encoded (Windows) files.
>
> Could you add some statement to XML::LibXML::Parser?
The parser is XML 1.0 conformant. If your file does not contain a <?xml version="1.0" encoding="...." ?> declaration, then UTF-8 or UTF-16 is assumed (and a BOM, if found, is taken into account).

Note that your filehandle should not have any Perl I/O layers on it; that is, you should do binmode $fh; if not sure.

If you still see errors, then please attach a sample input file and copy-paste the output you got. Also indicate the versions of XML::LibXML and libxml2 you have installed.

-- Petr
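A minimal sketch of the advice above, assuming a reasonably recent XML::LibXML is installed. The sample document, the temporary file and the variable names are made up for illustration; they are not the reporter's files. It writes a UTF-16 document with a BOM as raw bytes, reopens it with binmode so that no Perl I/O layer touches the data, and lets the parser detect the encoding itself.

```perl
use strict;
use warnings;
use Encode ();
use File::Temp ();
use XML::LibXML;

# Hypothetical sample: Encode's 'UTF-16' writes a BOM followed by big-endian data.
my $bytes = Encode::encode('UTF-16',
    qq{<?xml version="1.0" encoding="UTF-16"?>\n<element name="CMAutoQuote"/>\n});

# Write the raw bytes to a temporary file.
my $tmp = File::Temp->new(SUFFIX => '.xml');
binmode $tmp;
print {$tmp} $bytes;
close $tmp;

# Reopen without any I/O layers; the parser reads the BOM and the declaration.
open my $fh, '<', $tmp->filename or die "Cannot open sample file: $!";
binmode $fh;
my $doc = XML::LibXML->new->parse_fh($fh);
close $fh;

my $name = $doc->documentElement->getAttribute('name');
print "name attribute: $name\n";
```

If the handle had been opened with an encoding layer such as :encoding(UTF-16), the parser would receive already-decoded text that no longer matches the declaration, which is the situation the binmode advice is meant to avoid.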
Subject: Re: [rt.cpan.org #33810] character-set used by parser
Date: Tue, 4 Mar 2008 23:44:47 +0100
To: Petr Pajas via RT <bug-XML-LibXML [...] rt.cpan.org>
From: Mark Overmeer <mark [...] overmeer.net>
* Petr Pajas via RT (bug-XML-LibXML@rt.cpan.org) [080304 22:14]:
> On Tue 04 Mar 2008 16:47:08, MARKOV wrote:
> > UTF-16 encoded (Windows) files.
> >
> > Could you add some statement to XML::LibXML::Parser?
>
> The parser is XML 1.0 conformant. If your file does not contain a <?xml
> version="1.0" encoding="...." ?> declaration, then UTF-8 or UTF-16 is
> supposed (and BOM, if found, is taken into account). Note that your
> filehandle should not have any Perl I/O layers on it, that is, you
> should do
Ok, this may simply be a bug of some kind. The files I got from my user are coming your way by private mail. The parser does not seem to be able to get the name attribute from the node <xs:element name="CMAutoQuote"> via getAttribute.

--
Regards,
MarkOv
------------------------------------------------------------------------
Mark Overmeer MSc                                MARKOV Solutions
Mark@Overmeer.net                          solutions@overmeer.net
http://Mark.Overmeer.net                   http://solutions.overmeer.net
Subject: Re: [rt.cpan.org #33810] character-set used by parser
Date: Wed, 5 Mar 2008 00:05:20 +0100
To: Petr Pajas via RT <bug-XML-LibXML [...] rt.cpan.org>
From: Mark Overmeer <solutions [...] overmeer.net>
* Petr Pajas via RT (bug-XML-LibXML@rt.cpan.org) [080304 22:14]:
> If you still see errors, then please attach a sample input file and
> copy-paste the output you got. Also indicate the versions of
> XML::LibXML and libxml2 you have installed.

XML::LibXML 1.65
libxml2 2.6.30

I try to solve all user support questions for my modules myself (about 1 per day), but every once in a while I do not see the light and ask you. Sorry if they are not all justified: XML::LibXML is a complex library. Certainly regarding IO.

--
MarkOv
------------------------------------------------------------------------
Mark Overmeer MSc                                MARKOV Solutions
Mark@Overmeer.net                          solutions@overmeer.net
http://Mark.Overmeer.net                   http://solutions.overmeer.net
Ok, I found the bug and fixed it in the SVN. The fix will be available in 1.67 when it's ready. -- Petr
On Wed Mar 05 09:17:42 2008, PAJAS wrote:
> Ok, I found the bug and fixed it in the SVN. The fix will be available
> in 1.67 when it's ready.

The tests added for this ticket in t/03doc.t fail with libxml2 >= 2.7.4 because they use UTF-16LE encoded text but declare different encodings in the XML. libxml2 >= 2.7.4 detects this mismatch, causing the tests to fail:

    PERL_DL_NONLAZY=1 /usr/bin/perl "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
    Compiled against libxml2 version: 20705
    Running libxml2 version: 20705
    t/01basic.t ................... ok
    t/02parse.t ................... ok
    :2: parser error : Extra content at the end of the document
    t/03doc.t .....................
    Dubious, test returned 2 (wstat 512, 0x200)
    Failed 8/166 subtests

This has been reported in Debian bug 546240 (http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=546240). The attached patch encodes the text in the same encoding that the XML declaration names, resolving the problem.
This patch addresses the following build failure with libxml2 2.7.4 onwards.
(http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=546240)

PERL_DL_NONLAZY=1 /usr/bin/perl "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
t/01basic.....................+ /usr/lib/rpm/brp-python-bytecompile
+ umask 022
+ cd /builddir/build/BUILD
+ cd XML-LibXML-1.69
+ /usr/bin/make test
Compiled against libxml2 version: 20705
Running libxml2 version: 20705
ok
t/02parse.....................ok
t/03doc.......................:2: parser error : Extra content at the end of the document
dubious
Test returned status 2 (wstat 512, 0x200)
DIED. FAILED tests 159-166
Failed 8/166 tests, 95.18% okay

The test is supplying UTF-16LE data marked as UTF-16BE, and the new libxml2
rightly complains. The patch makes the text encoding line up with the XML.

--- XML-LibXML-1.69/t/03doc.t 2008-11-04 12:36:31.000000000 +0000
+++ XML-LibXML-1.69/t/03doc.t 2009-09-29 11:09:30.000000000 +0100
@@ -473,7 +473,7 @@
 for my $enc (qw(UTF-16 UTF-16LE UTF-16BE)) {
 print "------------------\n";
 print $enc,"\n";
- my $xml = Encode::encode('UTF-16LE',qq{<?xml version="1.0" encoding="$enc"?>
+ my $xml = Encode::encode($enc,qq{<?xml version="1.0" encoding="$enc"?>
 <test foo="bar"/>
 });
 my $dom = XML::LibXML->new->parse_string($xml);
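To make the mismatch concrete, here is a small sketch that uses only the core Encode module and runs the same loop as the test. The %first_bytes hash and the hex dump are introduced here purely for illustration and are not part of t/03doc.t. It prints the first four bytes that each encoding name produces for the same document text.

```perl
use strict;
use warnings;
use Encode ();

# Illustration only: record the leading bytes produced for each encoding name.
my %first_bytes;
for my $enc (qw(UTF-16 UTF-16LE UTF-16BE)) {
    # Encode the text in the same encoding that its XML declaration names.
    my $xml = Encode::encode($enc,
        qq{<?xml version="1.0" encoding="$enc"?>\n<test foo="bar"/>\n});

    # Hex dump of the first four bytes.
    $first_bytes{$enc} = join ' ',
        map { sprintf('%02X', ord($_)) } split //, substr($xml, 0, 4);
    print "$enc: $first_bytes{$enc}\n";
}
# UTF-16:   FE FF 00 3C   (BOM, then big-endian data)
# UTF-16LE: 3C 00 3F 00   (no BOM, low byte first)
# UTF-16BE: 00 3C 00 3F   (no BOM, high byte first)
```

Little-endian bytes labelled as UTF-16BE therefore contradict their own declaration, which is the inconsistency that libxml2 2.7.4 and later reject.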
Hi,

I had already found the problem and applied the same patch to my local copy :-) Now it is also committed to the SVN.

Thanks for the report!

-- Petr

On Tue 29 Sep 2009 07:10:11, paul@city-fan.org wrote:
> On Wed Mar 05 09:17:42 2008, PAJAS wrote:
> > Ok, I found the bug and fixed it in the SVN. The fix will be available
> > in 1.67 when it's ready.
>
> The tests added for this ticket in t/03_doc.t fail with libxml >= 2.7.4
> as they use UTF16LE encoded text but specify different encodings in the
> XML. This error is detected in libxml >= 2.7.4, causing the tests to fail:
>
> PERL_DL_NONLAZY=1 /usr/bin/perl "-MExtUtils::Command::MM" "-e"
> "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
> Compiled against libxml2 version: 20705
> Running libxml2 version: 20705
> t/01basic.t ................... ok
> t/02parse.t ................... ok
> :2: parser error : Extra content at the end of the document
> t/03doc.t .....................
> Dubious, test returned 2 (wstat 512, 0x200)
> Failed 8/166 subtests
>
> This has been reported in Debian bug 546240
> (http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=546240). The attached
> patch encodes the text in the same encoding as that that the XML is
> labelled, resolving the problem.