Subject: | Processing speed of ::Pages has been drastically reduced |
Since the new XML processing engine was put in place, the processing speed of
Parse::MediaWikiDump::Pages has dropped drastically, to somewhere on the order of 10
times slower (sorry about that). There are two workarounds: use
Parse::MediaWikiDump version 0.92 (available from
http://backpan.perl.org/authors/id/T/TR/TRIDDLE/Parse-MediaWikiDump-0.92.tar.gz), or
apply the patch attached to this ticket to Parse::MediaWikiDump version 0.97 to regain the lost
speed.
Using version 0.92 has the unfortunate side effect of reintroducing a memory leak bug.
If that is a problem for you, patch version 0.97 by running the following inside the
Parse-MediaWikiDump-0.97 directory:
Parse-MediaWikiDump-0.97> patch -p1 < Parse-MediaWikiDump-0.97_speedfix.patch
Version 0.98, when released, will have full processing speed again.
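For anyone curious what the attached patch changes: it stops appending each chunk of character data to a single string and instead pushes the chunks onto an array, joining them once when the buffer is flushed. The following is only a minimal standalone sketch of that pattern. The CharBuffer package and its add/flush methods are hypothetical names made up for illustration and are not part of Parse::MediaWikiDump; only the char_buf key mirrors the patch.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the character buffering done in XML.pm.
# The package and method names are illustrative only.
package CharBuffer;

sub new {
    my ($class) = @_;
    # Store chunks in an array reference rather than in one growing string.
    return bless { char_buf => [] }, $class;
}

# Called once per character event: just record the chunk.
sub add {
    my ($self, $chars) = @_;
    push @{ $self->{char_buf} }, $chars;
    return;
}

# Called when the text is needed: join the chunks once, then reset the buffer.
sub flush {
    my ($self) = @_;
    my $text = join '', @{ $self->{char_buf} };
    $self->{char_buf} = [];
    return $text;
}

package main;

my $buf = CharBuffer->new;
$buf->add($_) for ('Hello', ', ', 'world');
my $text = $buf->flush;
print "$text\n";
```

The handler still receives one plain string per flush, so code that consumes the character data should not need to change.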
Subject: | Parse-MediaWikiDump-0.97_speedfix.patch |
diff -u -r Parse-MediaWikiDump-0.97/lib/Parse/MediaWikiDump/XML.pm Parse-MediaWikiDump-0.97_speedfix/lib/Parse/MediaWikiDump/XML.pm
--- Parse-MediaWikiDump-0.97/lib/Parse/MediaWikiDump/XML.pm 2009-10-23 09:11:25.000000000 -0700
+++ Parse-MediaWikiDump-0.97_speedfix/lib/Parse/MediaWikiDump/XML.pm 2009-10-27 09:20:18.000000000 -0700
@@ -71,7 +71,7 @@
$self->{root} = $root;
$self->{element_stack} = [];
$self->{accum} = $accum;
- $self->{char_buf} = '';
+ $self->{char_buf} = [];
$self->{node_stack} = [ $root ];
return $self;
@@ -156,7 +156,7 @@
sub handle_char_event {
my ($self, $expat, $chars) = @_;
- $self->{char_buf} .= $chars;
+ push(@{$self->{char_buf}}, $chars);
}
sub flush_chars {
@@ -170,9 +170,9 @@
$cur_element = [];
}
- defined $handler && &$handler($self, $self->{accum}, $self->{char_buf}, @$cur_element);
+ defined $handler && &$handler($self, $self->{accum}, join('', @{$self->{char_buf}}), @$cur_element);
- $self->{char_buf} = '';
+ $self->{char_buf} = [];
return undef;
}
@@ -395,4 +395,4 @@
$a->{$store_as} = $chars;
}
-1;
\ No newline at end of file
+1;