
This queue is for tickets about the XML-LibXML CPAN distribution.

Report information
The Basics
Id: 78448
Status: open
Priority: 0/
Queue: XML-LibXML

People
Owner: Nobody in particular
Requestors: wosch [...] freebsd.org
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: chinese character force a SIGABRT in XS_XML__LibXML__parse_fh and kill perl
Date: Wed, 18 Jul 2012 15:05:46 +0200
To: bug-XML-LibXML [...] rt.cpan.org
From: Wolfram Schneider <wosch [...] FreeBSD.org>
Hi,

one of my CGI scripts crashed when the user submitted Chinese characters. It turns out that a single Chinese character crashes the parse_fh() function.

The error happens on all Perl versions up to 5.16, any OS, and any XML::LibXML version up to 2.0002.

gdb perl
GNU gdb 6.3.50-20050815 (Apple version gdb-1708) (Thu Nov 3 21:59:02 UTC 2011)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "x86_64-apple-darwin"...Reading symbols for
shared libraries .... done

(gdb) run test.pl test-macos.xml
Starting program: /opt/local/bin/perl test.pl test-macos.xml
Reading symbols for shared libraries +++.......................... done
Reading symbols for shared libraries . done
Reading symbols for shared libraries . done
Reading symbols for shared libraries . done
Reading symbols for shared libraries . done
Reading symbols for shared libraries .... done
Running with XML::LibXML version: 1.84

Program received signal SIGABRT, Aborted.
0x00007fff8857682a in __kill ()
(gdb) bt
#0 0x00007fff8857682a in __kill ()
#1 0x00007fff90b3fb6c in __abort ()
#2 0x00007fff90b3c070 in __stack_chk_fail ()
#3 0x00000001004078bb in XS_XML__LibXML__parse_fh ()
#4 0x0000000100082abf in Perl_pp_entersub ()
#5 0x000000010007af06 in Perl_runops_standard ()
#6 0x000000010001c5b4 in perl_run ()
#7 0x0000000100000d7d in main ()
(gdb)

Attached are a simple Perl script and two XML files for testing. On MacOS, it crashes with a single Chinese character; on Linux it needs some more padding characters to get the same effect.

-Wolfram

--
Wolfram Schneider <wosch@FreeBSD.org> http://wolfram.schneider.org

Message body is not shown because sender requested not to inline it.


Subject: Re: [rt.cpan.org #78448] chinese character force a SIGABRT in XS_XML__LibXML__parse_fh and kill perl
Date: Wed, 18 Jul 2012 15:21:49 +0200
To: bug-XML-LibXML [...] rt.cpan.org
From: Christian Glahn <christian.glahn [...] lo-f.at>
For clarification: Do parse_file() and parse_string() work?

Christian

On 18 Jul 2012, at 15:06, wosch@freebsd.org via RT wrote:
> Wed Jul 18 09:06:13 2012: Request 78448 was acted upon.
> Transaction: Ticket created by wosch@freebsd.org
> Queue: XML-LibXML
> Subject: chinese character force a SIGABRT in XS_XML__LibXML__parse_fh and kill perl
> Broken in: (no value)
> Severity: (no value)
> Owner: Nobody
> Requestors: wosch@freebsd.org
> Status: new
> Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=78448 >
>
>
> Hi,
>
> one of my CGI scripts crashed when the user submitted chinese
> characters. It turns out that a single chinese character crashes
> the parse_fh() function.
>
> The error happens on all perl version up to 5.16, any OS and any
> XML::LibXML version up to 2.0002.
>
> gdb perl
> GNU gdb 6.3.50-20050815 (Apple version gdb-1708) (Thu Nov 3 21:59:02 UTC 2011)
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB. Type "show warranty" for details.
> This GDB was configured as "x86_64-apple-darwin"...Reading symbols for
> shared libraries .... done
>
> (gdb) run test.pl test-macos.xml
> Starting program: /opt/local/bin/perl test.pl test-macos.xml
> Reading symbols for shared libraries +++.......................... done
> Reading symbols for shared libraries . done
> Reading symbols for shared libraries . done
> Reading symbols for shared libraries . done
> Reading symbols for shared libraries . done
> Reading symbols for shared libraries .... done
> Running with XML::LibXML version: 1.84
>
> Program received signal SIGABRT, Aborted.
> 0x00007fff8857682a in __kill ()
> (gdb) bt
> #0 0x00007fff8857682a in __kill ()
> #1 0x00007fff90b3fb6c in __abort ()
> #2 0x00007fff90b3c070 in __stack_chk_fail ()
> #3 0x00000001004078bb in XS_XML__LibXML__parse_fh ()
> #4 0x0000000100082abf in Perl_pp_entersub ()
> #5 0x000000010007af06 in Perl_runops_standard ()
> #6 0x000000010001c5b4 in perl_run ()
> #7 0x0000000100000d7d in main ()
> (gdb)
>
>
> attached is a simple perl script and 2 xml files for testing. On
> MacOS, it crashes with a single chinese character, on Linux it needs
> some more padding characters to get the same effect.
>
> -Wolfram
>
> --
> Wolfram Schneider <wosch@FreeBSD.org> http://wolfram.schneider.org
>
> <test.pl><chinese>
> <blob>foooooooooooooooooooooooooooooooooooooooooooooooooooo</blob>
> <blob>foooooooooooooooooooooooooooooooooooooooooooooooooooo</blob>
> <blob>foooooooooooooooooooooooooooooooooooooooooooooooooooo</blob>
> <blob>foooooooooooooooooooooooooooooooooooooooooooooooooooo</blob>
> <blob>foooooooooooooooooooooooooooooooooooooooooooooooooooo</blob>
> <blob>foooooooooooooooooooooooooooooooooooooooooooooooooooo</blob>
>
> <characters>主題</characters>
>
> <blob>foooooooooooooooooooooooooooooooooooooooooooooooooooo</blob>
> <blob>foooooooooooooooooooooooooooooooooooooooooooooooooooo</blob>
> <blob>foooooooooooooooooooooooooooooooooooooooooooooooooooo</blob>
> <blob>foooooooooooooooooooooooooooooooooooooooooooooooooooo</blob>
>
> <characters>關鍵詞</characters>
>
> <blob>foooooooooooooooooooooooooooooooooooooooooooooooooooo</blob>
> <blob>foooooooooooooooooooooooooooooooooooooooooooooooooooo</blob>
> <blob>foooooooooooooooooooooooooooooooooooooooooooooooooooo</blob>
> <blob>foooooooooooooooooooooooooooooooooooooooooooooooooooo</blob>
> <blob>foooooooooooooooooooooooooooooooooooooooooooooooooooo</blob>
> <blob>foooooooooooooooooooooooooooooooooooooooooooooooooooo</blob>
> </chinese>
> <chinese>
> <characters>題</characters>
> <blob>blob blob blob blob blob blob blob blob blob blob
blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob blob</blob> > </chinese>
Subject: Re: [rt.cpan.org #78448] chinese character force a SIGABRT in XS_XML__LibXML__parse_fh and kill perl
Date: Wed, 18 Jul 2012 16:01:11 +0200
To: bug-XML-LibXML [...] rt.cpan.org
From: Wolfram Schneider <wosch [...] FreeBSD.org>
On 18 July 2012 15:22, Christian Glahn via RT <bug-XML-LibXML@rt.cpan.org> wrote:
> <URL: https://rt.cpan.org/Ticket/Display.html?id=78448 >
>
> For clarification: Do parse_file() and parse_string() work?
Yep, both work fine.

I guess the bug is due to a wrong I/O handle character set check. Please note that the file handle in the test.pl script has the layer ':utf8'. If you set it to "raw", it will not crash.

-Wolfram
> > Christian
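The difference between the ':utf8' and "raw" layers described above can be seen without XML::LibXML at all. The following is a minimal sketch using only core Perl; the helper name read_units and the temporary file are made up for illustration. It shows that read() on a ':utf8' handle counts characters, so the data returned can occupy more bytes than the length requested. Whether this is what actually overruns a buffer inside parse_fh() is an assumption; nothing in this ticket confirms it.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Encode qw(encode_utf8);

# Hypothetical helper: read $want units from $path through $layer and
# report how many units read() returned and how many bytes they occupy.
sub read_units {
    my ($path, $layer, $want) = @_;
    open my $fh, "<$layer", $path or die "open $path: $!";
    my $got = read($fh, my $buf, $want);
    close $fh;
    # A character string must be re-encoded to count its bytes;
    # a raw byte string is already counted in bytes.
    my $bytes = utf8::is_utf8($buf) ? length(encode_utf8($buf)) : length($buf);
    return ($got, $bytes);
}

# Four copies of U+984C, written as raw UTF-8 bytes (3 bytes each).
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;
print {$out} "\xE9\xA1\x8C" x 4;
close $out or die "close: $!";

my ($raw_got, $raw_bytes) = read_units($path, ':raw',  2);
my ($utf_got, $utf_bytes) = read_units($path, ':utf8', 2);

printf "raw:  read() returned %d, data occupies %d bytes\n", $raw_got, $raw_bytes;
printf "utf8: read() returned %d, data occupies %d bytes\n", $utf_got, $utf_bytes;
```

With the raw layer the two units are two bytes; with ':utf8' the two units are two characters that take six bytes once encoded.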
Hi Wolfram,

On Wed Jul 18 10:01:37 2012, wosch@freebsd.org wrote:
> On 18 July 2012 15:22, Christian Glahn via RT
> <bug-XML-LibXML@rt.cpan.org> wrote:
> > <URL: https://rt.cpan.org/Ticket/Display.html?id=78448 >
> >
> > For clarification: Do parse_file() and parse_string() work?
>
> yepp, both works fine.
>
> I guess the bug is due a wrong IO handle character set check. Please
> note that the file descriptor in the test.pl script has the layer
> ':utf8'. If you set it to "raw" it will not crash.
>
In that case, please see:

https://metacpan.org/module/XML::LibXML#ENCODINGS-SUPPORT-IN-XML::LIBXML

Namely item #1, which reads:

[QUOTE]
Do NOT apply any encoding-related PerlIO layers (:utf8 or :encoding(...)) to file handles that are an input for the parses or an output for a serializer of (full) XML documents. This is because the conversion of the data to/from the internal character representation is provided by libxml2 itself which must be able to enforce the encoding specified by the <?xml version="1.0" encoding="..."?> declaration. Here is an example to follow:
[/QUOTE]

Why are you using the ":utf8" layer in the first place? Even so, perhaps the crash itself should be fixed somehow.

Regards,

-- Shlomi Fish
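For readers following the advice quoted above, here is a minimal sketch of reading an XML document from a file handle with no encoding layer, so that libxml2 does the decoding itself. It assumes XML::LibXML is installed; the temporary file and its contents are made up for illustration and are not the test files attached to this ticket.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use XML::LibXML;

# Write a small UTF-8 encoded document as raw bytes (illustrative only).
my ($out, $path) = tempfile(SUFFIX => '.xml', UNLINK => 1);
binmode $out;
print {$out} qq{<?xml version="1.0" encoding="UTF-8"?>\n},
             "<chinese><characters>\xE9\xA1\x8C</characters></chinese>\n";
close $out or die "close: $!";

# Open without any encoding layer; binmode drops layers that a
# "use open" pragma might otherwise have added.
open my $fh, '<', $path or die "open $path: $!";
binmode $fh;

# libxml2 decodes the bytes according to the XML declaration.
my $doc = XML::LibXML->load_xml(IO => $fh);
close $fh;

# The text comes back as a Perl character string.
my $text = $doc->documentElement->textContent;
printf "characters in text: %d\n", length $text;
```

The same handle setup applies when calling parse_fh() on a parser object; only the open/binmode step matters for the layer question raised here.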