
This queue is for tickets about the XML-Simple CPAN distribution.

Report information
The Basics
Id: 84519
Status: resolved
Priority: 0/
Queue: XML-Simple

People
Owner: grantm [...] cpan.org
Requestors: issh88 [...] gmail.com
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: 2.15
Fixed in: (no value)



Subject: Slow in parsing huge 5MB XML
Hi,

We are using XML::Simple to convert an XML string into a Perl data structure. The size of the XML string varies from a few lines to several MB.

While testing, we noticed that to convert an XML string of size 5MB into a Perl data structure,

XML::Simple - version 2.15 took a few seconds whereas
XML::Simple - version 2.20 took 13-18 minutes, causing the execution time of our test to shoot up.

Another interesting observation is that XML::Simple (2.20) doesn't take too much time if we specify the file name (which has the XML content) instead of passing the XML content as a string.

To reproduce this issue, the following simple script can be used:

[isshwar@qalnb06 profile]$ cat xmlcheck.pl
use XML::Simple;
use DateTime;
my $xml= new XML::Simple;
open (FH, '<xmlfile');
my @arr = <FH>;
print DateTime->now . "\n";
my $ref = $xml->XMLin("@arr");
print DateTime->now . "\n";
[isshwar@cyclrtp31 ~]$

I've attached the XML file 'xmlfile' used.

Results obtained:

Perl 5.14 takes 18+ minutes (approx.)
[isshwar@qalnb06 profile]$ /usr/software/bin/perl5.14.0 xmlcheck.pl
2013-04-09T05:09:56
2013-04-09T05:28:21
[isshwar@qalnb06 profile]$

With Perl 5.8.8, it takes just a few seconds.
/usr/software/perl-implementation/perl-5.14.0/bin/perl
[isshwar@qalnb06 profile]$ /usr/software/bin/perl5.8.8 xmlcheck.pl
2013-04-08T13:40:59
2013-04-08T13:41:05
[isshwar@qalnb06 profile]$

The FAQ says:

If you find that XML::Simple is very slow reading XML, the most likely reason is that you have XML::SAX installed but no additional SAX parser module. The XML::SAX distribution includes an XML parser written entirely in Perl. This is very portable but not very fast. For better performance install either XML::SAX::Expat or XML::LibXML.

But we have both modules installed.

[isshwar@qalnb06 ~]$ /usr/software/bin/perl5.14.0 -e 'use XML::SAX::Expat'
[isshwar@qalnb06 ~]$ /usr/software/bin/perl5.14.0 -e 'use XML::LibXML'
[isshwar@qalnb06 ~]$

I also did profiling and found that:

XML::Simple (version 2.15) uses XML::Parser::Expat::ParseString (xsub), which took an inclusive time of 13 sec

XML::Simple (version 2.20) uses XML::LibXML::_parse_sax_string (xsub), which took an inclusive time of 1110 sec

The version of XML::LibXML we are using is "2.0004". This is a bit old, so I tried the latest version of XML::LibXML in my local workspace using 'cpanm', and the issue is seen even with that.

When I switched the preferred parser, it was able to parse the XML very fast.

setenv XML_SIMPLE_PREFERRED_PARSER XML::Parser

[isshwar@qalnb06 profile]$ /usr/software/bin/perl5.14.0 xmlcheck.pl
2013-04-09T05:40:29
2013-04-09T05:40:31
[isshwar@qalnb06 profile]$

Since the documentation recommends using a SAX parser and we are currently using a SAX parser with XML::Simple-2.15, the slowness is preventing us from moving to the latest version, XML::Simple-2.20, while still using a SAX parser.

Please help us resolve this issue. Kindly let me know if the usage has gone wrong anywhere or if you need any further information.

Thank you for providing such a useful module; it has let us handle XML->Perl data structure conversion so easily all these years.

-------------------------------------------------------------
Other config details:

[isshwar@qalnb06 ~]$ uname -a
Linux qalnb06.eng.btc.netapp.in 2.6.18-164.10.1.el5ntap1 #1 SMP Mon May 3 17:50:00 PDT 2010 x86_64 x86_64 x86_64 GNU/Linux
[isshwar@qalnb06 ~]$

[isshwar@qalnb06 ~]$ /usr/software/bin/perl5.14.0 -v

This is perl 5, version 14, subversion 0 (v5.14.0) built for x86_64-linux-thread-multi

Copyright 1987-2011, Larry Wall

Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.

Complete documentation for Perl, including FAQ lists, should be found on
this system using "man perl" or "perldoc perl". If you have access to the
Internet, point your browser at http://www.perl.org/, the Perl Home Page.

[isshwar@qalnb06 ~]$
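
The parser switch described in the report can also be made from inside a script, since XML::Simple documents a `$XML::Simple::PREFERRED_PARSER` package variable alongside the `XML_SIMPLE_PREFERRED_PARSER` environment variable. The following is only a sketch for comparing timings under stated assumptions: XML::Simple and XML::Parser are installed, the `time_parse` helper is a hypothetical name, and the small generated document is a stand-in for the 5 MB attachment, so it will not reproduce the slowdown by itself.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);
use XML::Simple;

# Small stand-in document; the 5 MB attachment is not reproduced here.
my $xml = '<opt>'
        . join('', map { qq{<item id="$_">value $_</item>} } 1 .. 1000)
        . '</opt>';

# Hypothetical helper: time a single XMLin() call under one parser preference.
# undef means "no preference", so XML::Simple makes its usual choice.
sub time_parse {
    my ($preferred, $string) = @_;
    local $XML::Simple::PREFERRED_PARSER = $preferred;
    my $start = time();
    my $data  = XMLin($string);
    return (time() - $start, $data);
}

for my $parser ('XML::Parser', undef) {
    my ($elapsed, $data) = time_parse($parser, $xml);
    printf "%-12s %.3f s, %d items\n",
        $parser // '(default)', $elapsed, scalar keys %{ $data->{item} };
}
```

Substituting the real file contents for `$xml` should show whether the difference comes from the parser selection rather than from XML::Simple itself.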
Subject: xmlfile
Download xmlfile
application/octet-stream 5m


From: issh88 [...] gmail.com
On Tue Apr 09 01:48:51 2013, Isshwarya wrote:
> [...]
Sorry, I wrongly selected 2.15 in the 'Broken in' field. It should have been 2.20. I tried to update the field but I am not able to.
This really does sound like your default parser is XML::SAX::PurePerl. Possibly the ParserDetails.ini file was not updated when you installed the other parser modules.

Running this one-liner from a command shell will tell you what the default SAX parser is:

perl -MXML::SAX -le "print ref(XML::SAX::ParserFactory->parser())"

On my system, it prints out:

XML::SAX::ExpatXS

If the result is XML::SAX::PurePerl then you'll need to find the file called XML/SAX/ParserDetails.ini somewhere in your Perl lib search path. You can edit the file and add a section like this to the end of the file:

[XML::SAX::ExpatXS]
http://xml.org/sax/features/external-general-entities = 1
http://xml.org/sax/features/external-parameter-entities = 1
http://xmlns.perl.org/sax/recstring = 1
http://xmlns.perl.org/sax/locator = 1
http://xml.org/sax/features/xmlns-uris = 1
http://xmlns.perl.org/sax/ns-attributes = 1
http://xml.org/sax/features/namespaces = 1
http://xmlns.perl.org/sax/version-2.1 = 1
http://xmlns.perl.org/sax/xmlns-uris = 1
http://xmlns.perl.org/sax/join-character-data = 1

The last parser module listed will be the default.

Regards
Grant
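
The same check can be done from a script, which also shows the order in which parsers are registered. This is a sketch rather than part of the original reply: it assumes XML::SAX is installed and, for the override at the end, that XML::SAX::Expat is installed too. `XML::SAX->parsers` and `$XML::SAX::ParserPackage` come from the XML::SAX documentation; the variable names are illustrative.

```perl
use strict;
use warnings;
use XML::SAX;

# Parsers registered in ParserDetails.ini, in file order.
my @registered = map { $_->{Name} } @{ XML::SAX->parsers() };
print "Registered: @registered\n";

# What the factory hands out when nothing is requested explicitly.
my $default = ref(XML::SAX::ParserFactory->parser());
print "Default:    $default\n";

# Per-process override that leaves ParserDetails.ini untouched
# (assumes XML::SAX::Expat is installed).
my $overridden;
{
    local $XML::SAX::ParserPackage = 'XML::SAX::Expat';
    $overridden = ref(XML::SAX::ParserFactory->parser());
}
print "Overridden: $overridden\n";
```

The override is useful when the shared ParserDetails.ini cannot be edited, for example on a centrally managed Perl installation.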
Subject: Re: [rt.cpan.org #84519] Slow in parsing huge 5MB XML
Date: Wed, 10 Apr 2013 14:53:50 +0530
To: bug-XML-Simple [...] rt.cpan.org
From: Isshwarya M <issh88 [...] gmail.com>
Hi Grant,

Thanks for the quick reply.

I tried your commandline to obtain the parser.

[isshwar@qalnb06 ~]$ /usr/software/bin/perl5.14.0 -MXML::SAX -le "print ref(XML::SAX::ParserFactory->parser())"
XML::LibXML::SAX
[isshwar@qalnb06 ~]$

[isshwar@qalnb06 ~]$ cat /usr/software/perl-implementation/perl-5.14.0/lib/site_perl/5.14.0/XML/SAX/ParserDetails.ini
[XML::SAX::Expat]
http://xml.org/sax/features/namespaces = 1
http://xml.org/sax/features/external-general-entities = 1
http://xml.org/sax/features/external-parameter-entities = 1

[XML::LibXML::SAX::Parser]
http://xml.org/sax/features/namespaces = 1

[XML::LibXML::SAX]
http://xml.org/sax/features/namespaces = 1
[isshwar@qalnb06 ~]$

Do you see any issues in using XML::LibXML::SAX? We don't have XML::SAX::ExpatXS installed on our system, though I can get it installed.

On 10 April 2013 13:13, Grant McLean via RT <bug-XML-Simple@rt.cpan.org> wrote:
> <URL: https://rt.cpan.org/Ticket/Display.html?id=84519 >
> [...]
On Wed Apr 10 05:24:05 2013, Isshwarya wrote:
> I tried your commandline to obtain the parser.
>
> [isshwar@qalnb06 ~]$ /usr/software/bin/perl5.14.0 -MXML::SAX -le "print ref(XML::SAX::ParserFactory->parser())"
> XML::LibXML::SAX
> [isshwar@qalnb06 ~]$
OK - well that's definitely not XML::SAX::PurePerl :-)

I would not expect XML::LibXML::SAX to be slow, but I have not used it extensively.

Try editing the ParserDetails.ini to move the [XML::SAX::Expat] section to the bottom of the file.

Grant
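
Because the last section in ParserDetails.ini becomes the default, the suggested edit amounts to reordering sections. Here is a minimal sketch of that reordering, using the file contents posted earlier in this ticket. The helper names `move_section_last` and `section_names` are hypothetical, and the script only prints the result rather than touching the real file.

```perl
use strict;
use warnings;

# ParserDetails.ini as posted earlier in this ticket.
my $ini = <<'INI';
[XML::SAX::Expat]
http://xml.org/sax/features/namespaces = 1
http://xml.org/sax/features/external-general-entities = 1
http://xml.org/sax/features/external-parameter-entities = 1

[XML::LibXML::SAX::Parser]
http://xml.org/sax/features/namespaces = 1

[XML::LibXML::SAX]
http://xml.org/sax/features/namespaces = 1
INI

# Hypothetical helper: split the text into [section] blocks and move the
# named block to the end, keeping the others in their original order.
sub move_section_last {
    my ($text, $name) = @_;
    my @blocks = grep { /\S/ } split /^(?=\[)/m, $text;
    s/\s+\z/\n/ for @blocks;    # normalise trailing blank lines
    my @moved = grep {  /^\[\Q$name\E\]/ } @blocks;
    my @rest  = grep { !/^\[\Q$name\E\]/ } @blocks;
    return join "\n", @rest, @moved;
}

# Hypothetical helper: section names in file order.
sub section_names {
    my ($text) = @_;
    return $text =~ /^\[([^\]]+)\]/mg;
}

my $reordered = move_section_last($ini, 'XML::SAX::Expat');
my @names     = section_names($reordered);

print $reordered;
print "\nDefault parser would now be: $names[-1]\n";
```

With the posted file, the section order becomes XML::LibXML::SAX::Parser, XML::LibXML::SAX, XML::SAX::Expat, so XML::SAX::Expat ends up last.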
Subject: Re: [rt.cpan.org #84519] Slow in parsing huge 5MB XML
Date: Thu, 11 Apr 2013 21:26:42 +0530
To: bug-XML-Simple [...] rt.cpan.org
From: Isshwarya M <issh88 [...] gmail.com>
Thanks Grant.

I moved XML::SAX::Expat to the bottom and that did the magic. It's a lot better now.

[isshwar@qalnb06 profile]$ /usr/software/bin/perl5.14.0 xmlcheck.pl
2013-04-11T15:50:01
2013-04-11T15:50:07

I also checked the ParserDetails.ini picked up by our Perl 5.8.8 installation and it is also using XML::SAX::Expat.

Thanks very much for your quick response. The issue is resolved now. Feel free to move the bug status to closed/fixed.

Thanks again!

On 10 April 2013 15:09, Grant McLean via RT <bug-XML-Simple@rt.cpan.org> wrote:
> <URL: https://rt.cpan.org/Ticket/Display.html?id=84519 >
> [...]