Subject: Slow performance from XML::SAX (always uses XML::SAX::PurePerl)
Hiya,
I found that TVDB parsing was incredibly slow on some operations, and
today (for some reason) the fetch of updates_week.zip hung for a few
hours. Debugging and poking around, I found that XML::SAX was using the
PurePerl parser, despite my having installed the other alternatives.
Manually creating a parser on its own in Perl with
XML::SAX::ParserFactory gave me a non-PurePerl parser, so I began
investigating further down.
The root cause is that the .ini file parser in XML::SAX cannot read its
configuration file: it slurps the entire file in as a single line,
because it doesn't expect $/ to be anything but the default. Generally
it's not a good idea to modify $/ except for your own use, so following
the call tree back up led me to TVDB itself. As part of its reading of
files (_downloadZip) it uses:
    my $obj = new IO::Uncompress::Unzip \$zip, MultiStream => 1,
                                        Transparent => 1
        or die "IO::Uncompress::Unzip failed: $url\n";
    local $/ = undef;
    my $xml = <$obj>;
    ...
    return $self->{xml}->XMLin($xml);
where the call to XMLin() is in XML::Simple, which in turn calls down
to XML::SAX to obtain the correct type of parser.
The fix is to wrap the local $/ in braces, so that the change ends with
the block instead of lasting until the end of the function:
    my $xml;
    {
        local $/ = undef;
        $xml = <$obj>;
    }
This confines the slurping of the file to that block, so $/ is back to
its default by the time XMLin() is called. That allows the XML::SAX
parser to work as it was intended, and significantly faster.
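
For anyone wanting to see the effect in isolation, here is a minimal
sketch that needs neither TVDB nor XML::SAX. The sub names and the
sample text are made up for illustration; count_records() is only a
stand-in for a line-oriented reader such as the .ini parser, not the
actual XML::SAX code.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a line-oriented config reader: it counts
# how many records the readline operator returns for the given text.
sub count_records {
    my ($text) = @_;
    open my $fh, '<', \$text or die "cannot open in-memory file: $!";
    my $count = 0;
    while (defined(my $line = <$fh>)) {
        $count++;
    }
    close $fh;
    return $count;
}

# Made-up sample data: three newline-terminated lines.
my $ini = "[first]\n[second]\n[third]\n";

# Pattern from the report: local $/ stays in effect until the sub
# returns, and local is dynamically scoped, so callees see it too.
sub slurp_then_call {
    my ($text) = @_;
    open my $fh, '<', \$text or die "cannot open in-memory file: $!";
    local $/ = undef;
    my $data = <$fh>;
    return count_records($text);    # runs with $/ still undef
}

# Proposed fix: the bare block restores $/ before the callee runs.
sub scoped_slurp_then_call {
    my ($text) = @_;
    open my $fh, '<', \$text or die "cannot open in-memory file: $!";
    my $data;
    {
        local $/ = undef;
        $data = <$fh>;
    }
    return count_records($text);    # runs with the default $/
}

printf "unscoped: %d record(s)\n", slurp_then_call($ini);
printf "scoped:   %d record(s)\n", scoped_slurp_then_call($ini);
```

With the unscoped version the callee sees the whole text as one record;
with the braces it sees the three separate lines again.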