Skip Menu |

This queue is for tickets about the XML-LibXML CPAN distribution.

Report information
The Basics
Id: 4263
Status: resolved
Priority: 0/
Queue: XML-LibXML

People
Owner: Nobody in particular
Requestors: bzm [...] 2bz.de
Cc:
AdminCc:

Bug Information
Severity: Critical
Broken in: 1.56
Fixed in: (no value)



Subject: using XML parser inside a callback did not work
Hi, whenever I try to use or create a parser inside a callback function like match_callback, open_callback, close_callback or read_callback I get everything from segfault's to 'Attempt to free unreferenced scalar'. Attached is a testcase the example/cb_example.pl with only one more line inside open_uri. Additional line: my $p = XML::LibXML->new->parse_string(q{<?xml version="1.0"?><xml></xml>}); I have tried it with XML-LibXML 1.56 XML-LibXML from CVS libxml2-2.5.[6,8,11] perl 5.8.1 -- Boris
use XML::LibXML; use IO::File; # first instanciate the parser my $parser = XML::LibXML->new(); # initialize the callbacks $parser->match_callback( \&match_uri ); $parser->read_callback( \&read_uri ); $parser->open_callback( \&open_uri ); $parser->close_callback( \&close_uri ); # include XIncludes on the fly $parser->expand_xinclude( 1 ); # parse the file "text.xml" in the current directory $dom = $parser->parse_file("test.xml"); print $dom->toString() , "\n"; # the callbacks follow # these callbacks are used for both the original parse AND the XInclude sub match_uri { my $uri = shift; return $uri !~ /:\/\// ? 1 : 0; # we handle only files } sub open_uri { my $uri = shift; my $handler = new IO::File; if ( not $handler->open( "<$uri" ) ){ $handler = 0; } my $p = XML::LibXML->new->parse_string(q{<?xml version="1.0"?><xml></xml>}); return $handler; } sub read_uri { my $handler = shift; my $length = shift; my $buffer = undef; if ( $handler ) { $handler->read( $buffer, $length ); } return $buffer; } sub close_uri { my $handler = shift; if ( $handler ) { $handler->close(); } return 1; }
RT-Send-CC: xml-libxml [...] axkit.org
[BORISZ - Sat Nov 1 11:04:24 2003]: Show quoted text
> Hi, > > whenever I try to use or create a parser inside a callback function > like match_callback, open_callback, close_callback or read_callback I > get everything from segfault's to 'Attempt to free unreferenced > scalar'.
the segfault disapears with 1.57. Show quoted text
> Attached is a testcase the example/cb_example.pl with only one more > line inside open_uri.
However, the "bug" itself still remains. The sollution is: don't call a parser from within a callback! The reason for this is: XML::LibXML will override the existing parser callbacks (which are globally stored in libxml2). The effect is, that the main parser will not get any data at all while the nested parser succeeds. I am not shure if adding multithreading code to XML::LibXML can solve this particular problem. christian
Subject: [patch] using XML parser inside a callback did not work
On Sun Mar 07 06:31:18 2004, PHISH wrote:
Show quoted text
> [BORISZ - Sat Nov 1 11:04:24 2003]:
>
> > Hi,
> >
> > whenever I try to use or create a parser inside a callback function
> > like match_callback, open_callback, close_callback or read_callback I
> > get everything from segfault's to 'Attempt to free unreferenced
> > scalar'.
>
> the segfault disapears with 1.57.
>
> > Attached is a testcase the example/cb_example.pl with only one more
> > line inside open_uri.
>
> However, the "bug" itself still remains.
>
> The sollution is: don't call a parser from within a callback!
>
> The reason for this is: XML::LibXML will override the existing parser
> callbacks (which are globally stored in libxml2). The effect is, that
> the main parser will not get any data at all while the nested parser
> succeeds.
>
> I am not shure if adding multithreading code to XML::LibXML can solve
> this particular problem.
>
> christian
>

I was able to get a nested parser instance working correctly by changing how init_callbacks and cleanup_callbacks worked. instead of always calling lib_init_callbacks and lib_cleanup_callbacks when starting and finishing a parsing process, a $_NB_NESTED_DEPTH global is used to track if parsing is being started within another parsing instance and the lib_init_callbacks/lib_cleanup_callbacks is only called for the outermost parser.

here is a git branch I setup that addresses this issue:
http://github.com/frett/perl-libxml/commits/NestedParsing

attached are 2 separate patches.
The first patch (libxml-01.patch) allows xml to be parsed correctly from within an input callback, but does not completely support sharing an InputCallback object between the outer and inner parsers .
The second patch (libxml-02.patch) includes patch 1 and additionally allows the InputCallback object to be shared between the outer and inner parsers, but changes how legacy local parser callbacks are handled and could potentially break user's code if the user happens to be using legacy local callbacks and subclasses XML::LibXML::InputCallback and overrides the init_callbacks method (I highly doubt anyone has done this).

the second patch also fixes the underlying issue behind RT#51086

-Daniel Frett
Subject: libxml-01.patch
diff --git a/LibXML.pm b/LibXML.pm index 864647f..b71f05d 100644 --- a/LibXML.pm +++ b/LibXML.pm @@ -2075,12 +2075,14 @@ sub CLONE_SKIP { 1 } #-------------------------------------------------------------------------# package XML::LibXML::InputCallback; -use vars qw($_CUR_CB @_GLOBAL_CALLBACKS @_CB_STACK); +use vars qw($_CUR_CB @_GLOBAL_CALLBACKS @_CB_STACK $_CB_NESTED_DEPTH @_CB_NESTED_STACK); BEGIN { $_CUR_CB = undef; @_GLOBAL_CALLBACKS = (); @_CB_STACK = (); + $_CB_NESTED_DEPTH = 0; + @_CB_NESTED_STACK = (); } sub CLONE_SKIP { @@ -2203,9 +2205,20 @@ sub unregister_callbacks { sub init_callbacks { my $self = shift; + #initialize the libxml2 callbacks unless this is a nested callback + $self->lib_init_callbacks() unless($_CB_NESTED_DEPTH); + + #store the callbacks for any outer executing parser instance + $_CB_NESTED_DEPTH++; + push @_CB_NESTED_STACK, [ + $_CUR_CB, + [@_CB_STACK], + [@_GLOBAL_CALLBACKS], + ]; + + #initialize the callback variables for the current parser $_CUR_CB = undef; @_CB_STACK = (); - @_GLOBAL_CALLBACKS = @{ $self->{_CALLBACKS} }; if ( defined $XML::LibXML::match_cb and @@ -2217,19 +2230,21 @@ sub init_callbacks { $XML::LibXML::read_cb, $XML::LibXML::close_cb]; } - - $self->lib_init_callbacks(); } # reset libxml2's callbacks sub cleanup_callbacks { my $self = shift; - - $_CUR_CB = undef; - @_GLOBAL_CALLBACKS = (); - @_CB_STACK = (); - $self->lib_cleanup_callbacks(); + #restore the callbacks for the outer parser instance + $_CB_NESTED_DEPTH--; + my $saved = pop @_CB_NESTED_STACK; + $_CUR_CB = $saved->[0]; + @_CB_STACK = (@{$saved->[1]}); + @_GLOBAL_CALLBACKS = (@{$saved->[2]}); + + #clean up the libxml2 callbacks unless there are still outer parsing instances + $self->lib_cleanup_callbacks() unless($_CB_NESTED_DEPTH); } $XML::LibXML::__loaded=1; diff --git a/t/28new_callbacks_multiple.t b/t/28new_callbacks_multiple.t index 02c24fc..a24b7e5 100644 --- a/t/28new_callbacks_multiple.t +++ b/t/28new_callbacks_multiple.t @@ -1,6 +1,6 @@ # $Id$ use Test; -BEGIN { plan tests => 50 } +BEGIN { plan tests => 55 } END { ok(0) unless $loaded } use XML::LibXML; use IO::File; @@ -15,7 +15,8 @@ ok(1); <x xmlns:xinclude="http://www.w3.org/2001/XInclude"> <xml>test <xinclude:include href="/example/test2.xml"/> -<xinclude:include href="/libxml/test2.xml"/></xml> +<xinclude:include href="/libxml/test2.xml"/> +<xinclude:include href="/xmldom/test2.xml"/></xml> </x> EOF @@ -28,6 +29,9 @@ EOF $icb->register_callbacks( [ \&match_hash, \&open_hash, \&read_hash, \&close_hash ] ); + $icb->register_callbacks( [ \&match_xml, \&open_xml, + \&read_xml, \&close_xml ] ); + my $parser = XML::LibXML->new(); $parser->expand_xinclude(1); @@ -35,7 +39,7 @@ EOF my $doc = $parser->parse_string($string); ok($doc); - ok($doc->string_value(), "\ntest\n..\nbar..\n"); + ok($doc->string_value(), "\ntest\n..\nbar..\nbarbar\n"); } { @@ -173,3 +177,40 @@ sub match_hash2 { return 1; } } + +# --------------------------------------------------------------------- # +# callback set 4 (perl xml reader) +# --------------------------------------------------------------------- # +sub match_xml { + my $uri = shift; + if ( $uri =~ /^\/xmldom\// ){ + ok(1); + return 1; + } +} + +sub open_xml { + my $uri = shift; + my $dom = XML::LibXML->new->parse_string(q{<?xml version="1.0"?><foo><tmp/>barbar</foo>}); + ok($dom); + + return $dom; +} + +sub read_xml { + my $dom = shift; + my $buflen = shift; + + my $tmp = $dom->documentElement->findnodes('tmp')->shift; + my $rv = $tmp ? $dom->toString : ""; + $tmp->unbindNode if($tmp); + + ok(1); + return $rv; +} + +sub close_xml { + my $dom = shift; + undef $dom; + ok(1); +}
Subject: libxml-02.patch
diff --git a/LibXML.pm b/LibXML.pm index 864647f..c552c24 100644 --- a/LibXML.pm +++ b/LibXML.pm @@ -724,22 +724,12 @@ sub _init_callbacks { $icb = $self->{XML_LIBXML_CALLBACK_STACK}; } - my $mcb = $self->match_callback(); - my $ocb = $self->open_callback(); - my $rcb = $self->read_callback(); - my $ccb = $self->close_callback(); - - if ( defined $mcb and defined $ocb and defined $rcb and defined $ccb ) { - $icb->register_callbacks( [$mcb, $ocb, $rcb, $ccb] ); - } - $icb->init_callbacks(); + $icb->init_callbacks($self); } sub _cleanup_callbacks { my $self = shift; $self->{XML_LIBXML_CALLBACK_STACK}->cleanup_callbacks(); - my $mcb = $self->match_callback(); - $self->{XML_LIBXML_CALLBACK_STACK}->unregister_callbacks( [$mcb] ); } sub __read { @@ -2075,12 +2065,14 @@ sub CLONE_SKIP { 1 } #-------------------------------------------------------------------------# package XML::LibXML::InputCallback; -use vars qw($_CUR_CB @_GLOBAL_CALLBACKS @_CB_STACK); +use vars qw($_CUR_CB @_GLOBAL_CALLBACKS @_CB_STACK $_CB_NESTED_DEPTH @_CB_NESTED_STACK); BEGIN { $_CUR_CB = undef; @_GLOBAL_CALLBACKS = (); @_CB_STACK = (); + $_CB_NESTED_DEPTH = 0; + @_CB_NESTED_STACK = (); } sub CLONE_SKIP { @@ -2202,12 +2194,36 @@ sub unregister_callbacks { # make libxml2 use the callbacks sub init_callbacks { my $self = shift; + my $parser = shift; + + #initialize the libxml2 callbacks unless this is a nested callback + $self->lib_init_callbacks() unless($_CB_NESTED_DEPTH); + + #store the callbacks for any outer executing parser instance + $_CB_NESTED_DEPTH++; + push @_CB_NESTED_STACK, [ + $_CUR_CB, + [@_CB_STACK], + [@_GLOBAL_CALLBACKS], + ]; + #initialize the callback variables for the current parser $_CUR_CB = undef; @_CB_STACK = (); - @_GLOBAL_CALLBACKS = @{ $self->{_CALLBACKS} }; + #attach parser specific callbacks + if($parser) { + my $mcb = $parser->match_callback(); + my $ocb = $parser->open_callback(); + my $rcb = $parser->read_callback(); + my $ccb = $parser->close_callback(); + if ( defined $mcb and defined $ocb and defined $rcb and defined $ccb ) { + unshift @_GLOBAL_CALLBACKS, [$mcb, $ocb, $rcb, $ccb]; + } + } + + #attach global callbacks if ( defined $XML::LibXML::match_cb and defined $XML::LibXML::open_cb and defined $XML::LibXML::read_cb and @@ -2217,19 +2233,21 @@ sub init_callbacks { $XML::LibXML::read_cb, $XML::LibXML::close_cb]; } - - $self->lib_init_callbacks(); } # reset libxml2's callbacks sub cleanup_callbacks { my $self = shift; - - $_CUR_CB = undef; - @_GLOBAL_CALLBACKS = (); - @_CB_STACK = (); - $self->lib_cleanup_callbacks(); + #restore the callbacks for the outer parser instance + $_CB_NESTED_DEPTH--; + my $saved = pop @_CB_NESTED_STACK; + $_CUR_CB = $saved->[0]; + @_CB_STACK = (@{$saved->[1]}); + @_GLOBAL_CALLBACKS = (@{$saved->[2]}); + + #clean up the libxml2 callbacks unless there are still outer parsing instances + $self->lib_cleanup_callbacks() unless($_CB_NESTED_DEPTH); } $XML::LibXML::__loaded=1; diff --git a/docs/libxml.dbk b/docs/libxml.dbk index ec6df23..f52f06e 100644 --- a/docs/libxml.dbk +++ b/docs/libxml.dbk @@ -5547,10 +5547,10 @@ $parser-&gt;parse_file( $some_xml_file );</programlisting> </varlistentry> <varlistentry> - <term>init_callbacks()</term> + <term>init_callbacks( $parser )</term> <listitem> - <para>Initializes the callback system before a parsing process.</para> + <para>Initializes the callback system for the provided parser before starting a parsing process.</para> </listitem> </varlistentry> diff --git a/t/28new_callbacks_multiple.t b/t/28new_callbacks_multiple.t index 02c24fc..dd119c5 100644 --- a/t/28new_callbacks_multiple.t +++ b/t/28new_callbacks_multiple.t @@ -1,6 +1,6 @@ # $Id$ use Test; -BEGIN { plan tests => 50 } +BEGIN { plan tests => 77 } END { ok(0) unless $loaded } use XML::LibXML; use IO::File; @@ -15,7 +15,8 @@ ok(1); <x xmlns:xinclude="http://www.w3.org/2001/XInclude"> <xml>test <xinclude:include href="/example/test2.xml"/> -<xinclude:include href="/libxml/test2.xml"/></xml> +<xinclude:include href="/libxml/test2.xml"/> +<xinclude:include href="/xmldom/test2.xml"/></xml> </x> EOF @@ -28,6 +29,9 @@ EOF $icb->register_callbacks( [ \&match_hash, \&open_hash, \&read_hash, \&close_hash ] ); + $icb->register_callbacks( [ \&match_xml, \&open_xml, + \&read_xml, \&close_xml ] ); + my $parser = XML::LibXML->new(); $parser->expand_xinclude(1); @@ -35,7 +39,7 @@ EOF my $doc = $parser->parse_string($string); ok($doc); - ok($doc->string_value(), "\ntest\n..\nbar..\n"); + ok($doc->string_value(), "\ntest\n..\nbar..\nbarbar\n"); } { @@ -73,6 +77,58 @@ EOF ok($doc->string_value(), "\ntest\n..\n\n \n \n"); } +{ + my $string = <<EOF; +<x xmlns:xinclude="http://www.w3.org/2001/XInclude"> +<xml>test +<xinclude:include href="/example/test2.xml"/> +<xinclude:include href="/xmldom/test2.xml"/></xml> +</x> +EOF + my $string2 = <<EOF; +<x xmlns:xinclude="http://www.w3.org/2001/XInclude"> +<tmp/><xml>foo..<xinclude:include href="/example/test2.xml"/>bar</xml> +</x> +EOF + + + my $icb = XML::LibXML::InputCallback->new(); + ok($icb); + + my $open_xml2 = sub { + my $uri = shift; + my $parser = XML::LibXML->new; + $parser->expand_xinclude(1); + $parser->input_callbacks($icb); + + my $dom = $parser->parse_string($string2); + ok($dom); + + return $dom; + }; + + $icb->register_callbacks( [ \&match_xml, $open_xml2, + \&read_xml, \&close_xml ] ); + + $icb->register_callbacks( [ \&match_hash2, \&open_hash, + \&read_hash, \&close_hash ] ); + + my $parser = XML::LibXML->new(); + $parser->expand_xinclude(1); + + $parser->match_callback( \&match_file ); + $parser->open_callback( \&open_file ); + $parser->read_callback( \&read_file ); + $parser->close_callback( \&close_file ); + + $parser->input_callbacks($icb); + + my $doc = $parser->parse_string($string); + + ok($doc); + ok($doc->string_value(), "\ntest\n..\n\nfoo..bar..bar\n\n"); +} + # --------------------------------------------------------------------- # # CALLBACKS @@ -173,3 +229,40 @@ sub match_hash2 { return 1; } } + +# --------------------------------------------------------------------- # +# callback set 4 (perl xml reader) +# --------------------------------------------------------------------- # +sub match_xml { + my $uri = shift; + if ( $uri =~ /^\/xmldom\// ){ + ok(1); + return 1; + } +} + +sub open_xml { + my $uri = shift; + my $dom = XML::LibXML->new->parse_string(q{<?xml version="1.0"?><foo><tmp/>barbar</foo>}); + ok($dom); + + return $dom; +} + +sub read_xml { + my $dom = shift; + my $buflen = shift; + + my $tmp = $dom->documentElement->findnodes('tmp')->shift; + my $rv = $tmp ? $dom->toString : ""; + $tmp->unbindNode if($tmp); + + ok(1); + return $rv; +} + +sub close_xml { + my $dom = shift; + undef $dom; + ok(1); +}
On Tue Dec 29 15:41:07 2009, dfrett wrote: Show quoted text
> On Sun Mar 07 06:31:18 2004, PHISH wrote:
> > [BORISZ - Sat Nov 1 11:04:24 2003]: > >
> > > Hi, > > > > > > whenever I try to use or create a parser inside a callback
> function
> > > like match_callback, open_callback, close_callback or
> read_callback I
> > > get everything from segfault's to 'Attempt to free unreferenced > > > scalar'.
> > > > the segfault disapears with 1.57. > >
> > > Attached is a testcase the example/cb_example.pl with only one
> more
> > > line inside open_uri.
> > > > However, the "bug" itself still remains. > > > > The sollution is: don't call a parser from within a callback! > > > > The reason for this is: XML::LibXML will override the existing
> parser
> > callbacks (which are globally stored in libxml2). The effect is,
> that
> > the main parser will not get any data at all while the nested parser > > succeeds. > > > > I am not shure if adding multithreading code to XML::LibXML can
> solve
> > this particular problem. > > > > christian > >
> > I was able to get a nested parser instance working correctly by > changing how > init_callbacks and cleanup_callbacks worked. instead of always calling > lib_init_callbacks and lib_cleanup_callbacks when starting and > finishing a > parsing process, a $_NB_NESTED_DEPTH global is used to track if > parsing is > being started within another parsing instance and the > lib_init_callbacks/lib_cleanup_callbacks is only called for the > outermost > parser. > > here is a git branch I setup that addresses this issue: > http://github.com/frett/perl-libxml/commits/NestedParsing > > attached are 2 separate patches. > The first patch (libxml-01.patch) allows xml to be parsed correctly > from within > an input callback, but does not completely support sharing an > InputCallback > object between the outer and inner parsers . > The second patch (libxml-02.patch) includes patch 1 and additionally > allows the > InputCallback object to be shared between the outer and inner parsers, > but > changes how legacy local parser callbacks are handled and could > potentially > break user's code if the user happens to be using legacy local > callbacks and > subclasses XML::LibXML::InputCallback and overrides the init_callbacks > method > (I highly doubt anyone has done this). > > the second patch also fixes the underlying issue behind RT#51086 >
Can you give new patches to the repository at: https://bitbucket.org/shlomif/perl-xml-libxml Regards, -- Shlomi Fish Show quoted text
> -Daniel Frett
From: daniel.frett [...] ccci.org
On Sun Jun 19 05:29:57 2011, SHLOMIF wrote: Show quoted text
> Can you give new patches to the repository at: > > https://bitbucket.org/shlomif/perl-xml-libxml > > Regards, > > -- Shlomi Fish
yeah, thanks for taking over maintenance of this module. -Daniel Frett
Hi all, I'm resolving this bug because the patches were applied. If there is more to do, please open a new report or reply to this bug to reopen it (with an explanation and a patch.).