Subject: | Slow in parsing huge 5MB XML |
Hi,
We are using XML::Simple to convert an XML string into a Perl data structure.
The size of the XML string varies from a few lines to several megabytes.
While testing, we noticed that converting an XML string of
size 5MB into a Perl data structure took
a few seconds with XML::Simple version 2.15, whereas
XML::Simple version 2.20 took 13-18 minutes, causing the
execution time of our tests to shoot up.
Another interesting observation is that XML::Simple 2.20 does not
take long if we pass the name of the file containing the XML
instead of passing the XML content as a string.
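A minimal sketch of how the two call styles could be timed side by side. The helper name time_xmlin is hypothetical, the file name is taken from the command line, and Time::HiRes is used only to get sub-second timings; XMLin() itself accepts either a file name or a string of XML.

```perl
use strict;
use warnings;
use XML::Simple;
use Time::HiRes qw(time);

# Hypothetical helper: time a single XMLin() call on the given source,
# which may be a file name or a string of XML.
sub time_xmlin {
    my ($source) = @_;
    my $start = time;
    my $ref   = XML::Simple->new->XMLin($source);
    return (time - $start, $ref);
}

my $file = shift @ARGV;
if (defined $file) {
    # Slurp the file so the same content can also be passed as a string.
    open(my $fh, '<', $file) or die "Cannot open $file: $!";
    my $content = do { local $/; <$fh> };
    close($fh);

    my ($from_file)   = time_xmlin($file);
    my ($from_string) = time_xmlin($content);
    printf "file name:  %.2fs\nXML string: %.2fs\n", $from_file, $from_string;
}
```

Run as `perl timing.pl xmlfile` (the script name is an assumption) to compare both paths in the same process.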
To reproduce this issue, the following simple script can be used:
[isshwar@qalnb06 profile]$ cat xmlcheck.pl
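use strict;
use warnings;
use XML::Simple;
use DateTime;
my $xml = XML::Simple->new;
# Slurp the whole file into one string so that XMLin() receives XML content, not a file name.
open(my $fh, '<', 'xmlfile') or die "Cannot open xmlfile: $!";
my $content = do { local $/; <$fh> };
close($fh);
print DateTime->now . "\n";
my $ref = $xml->XMLin($content);
print DateTime->now . "\n";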
[isshwar@cyclrtp31 ~]$
I've attached the XML file 'xmlfile' used.
Results obtained:
With Perl 5.14, it takes approximately 18 minutes.
[isshwar@qalnb06 profile]$ /usr/software/bin/perl5.14.0 xmlcheck.pl
2013-04-09T05:09:56
2013-04-09T05:28:21
[isshwar@qalnb06 profile]$
With Perl 5.8.8, it takes just a few seconds.
/usr/software/perl-implementation/perl-5.14.0/bin/perl
[isshwar@qalnb06 profile]$ /usr/software/bin/perl5.8.8 xmlcheck.pl
2013-04-08T13:40:59
2013-04-08T13:41:05
[isshwar@qalnb06 profile]$
The FAQ says:
If you find that XML::Simple is very slow reading XML, the most likely reason is that you have XML::SAX installed but no additional SAX parser module. The XML::SAX distribution includes an XML parser written entirely in Perl. This is very portable but not very fast. For better performance install either XML::SAX::Expat or XML::LibXML.
But we have both of these modules installed:
[isshwar@qalnb06 ~]$ /usr/software/bin/perl5.14.0 -e 'use XML::SAX::Expat'
[isshwar@qalnb06 ~]$ /usr/software/bin/perl5.14.0 -e 'use XML::LibXML'
[isshwar@qalnb06 ~]$
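Having both modules installed does not by itself show which SAX parser is picked at run time. A minimal sketch of one way to check, assuming XML::SAX is installed; the helper names registered_sax_parsers and default_sax_parser are hypothetical, while XML::SAX->parsers and XML::SAX::ParserFactory->parser are the calls documented by XML::SAX.

```perl
use strict;
use warnings;
use XML::SAX;
use XML::SAX::ParserFactory;

# Names of all SAX parsers that XML::SAX knows about (from ParserDetails.ini).
sub registered_sax_parsers {
    return map { $_->{Name} } @{ XML::SAX->parsers() };
}

# Class of the parser that XML::SAX hands out when no features are requested.
sub default_sax_parser {
    return ref(XML::SAX::ParserFactory->parser());
}

print "Registered: ", join(', ', registered_sax_parsers()), "\n";
print "Default:    ", default_sax_parser(), "\n";
```

If the default turns out not to be the parser that was expected, that would point at the XML::SAX registration rather than at XML::Simple itself.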
I also profiled both versions and found that:
XML::Simple (version 2.15) uses
XML::Parser::Expat::ParseString (xsub), which took an inclusive time
of 13 sec.
XML::Simple (version 2.20) uses
XML::LibXML::_parse_sax_string (xsub), which took an inclusive time
of 1110 sec.
The version of XML::LibXML we are using is "2.0004".
Since this is a bit old, I installed the latest version of XML::LibXML
in my local workspace using 'cpanm', but the issue is still
seen with it.
When I switched the preferred parser to XML::Parser, the XML was parsed very quickly:
setenv XML_SIMPLE_PREFERRED_PARSER XML::Parser
[isshwar@qalnb06 profile]$ /usr/software/bin/perl5.14.0 xmlcheck.pl
2013-04-09T05:40:29
2013-04-09T05:40:31
[isshwar@qalnb06 profile]$
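The same parser selection can also be made from inside a script through the $XML::Simple::PREFERRED_PARSER package variable, which XML::Simple documents alongside the environment variable. A minimal sketch, assuming the named parser module is installed; the helper name parse_with is hypothetical.

```perl
use strict;
use warnings;
use XML::Simple;

# Hypothetical helper: parse $xml with XML::Simple while forcing a specific
# parser, e.g. 'XML::Parser' or the name of a SAX parser module.
sub parse_with {
    my ($parser, $xml) = @_;
    # local restores the previous setting when the sub returns.
    local $XML::Simple::PREFERRED_PARSER = $parser;
    return XML::Simple->new->XMLin($xml);
}

my $ref = parse_with('XML::Parser', '<opt><item>value</item></opt>');
print "$ref->{item}\n";
```

Using local keeps the override scoped to a single call, so the same process can try several parsers in turn on the same input.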
Since the documentation recommends using a SAX parser, and we currently use a SAX parser with XML::Simple-2.15, this slowness is preventing us from moving to the latest version, XML::Simple-2.20, while still using a SAX parser.
Please help us resolve this issue. Kindly let me know if we are using the module incorrectly anywhere, or if you need any further information.
Thank you for providing such a useful module; it has
made XML->Perl data structure conversion easy for us
for many years.
-------------------------------------------------------------
Other config details:
[isshwar@qalnb06 ~]$ uname -a
Linux qalnb06.eng.btc.netapp.in 2.6.18-164.10.1.el5ntap1 #1 SMP Mon May 3 17:50:00 PDT 2010 x86_64 x86_64 x86_64 GNU/Linux
[isshwar@qalnb06 ~]$
[isshwar@qalnb06 ~]$ /usr/software/bin/perl5.14.0 -v
This is perl 5, version 14, subversion 0 (v5.14.0) built for x86_64-linux-thread-multi
Copyright 1987-2011, Larry Wall
Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.
Complete documentation for Perl, including FAQ lists, should be found on
this system using "man perl" or "perldoc perl". If you have access to the
Internet, point your browser at http://www.perl.org/, the Perl Home Page.
[isshwar@qalnb06 ~]$
Subject: | xmlfile |
Message body not shown because it is not plain text.