
This queue is for tickets about the Parse-MediaWikiDump CPAN distribution.

Report information
The Basics
Id: 50885
Status: resolved
Priority: 0/
Queue: Parse-MediaWikiDump

People
Owner: Nobody in particular
Requestors: triddle [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: 0.93
Fixed in: (no value)



Subject: Processing speed of ::Pages has been drastically reduced
Since the new XML processing engine was put in place, the processing speed of Parse::MediaWikiDump::Pages has been drastically reduced, to somewhere on the order of 10 times slower (sorry about that). There are two available workarounds: use Parse::MediaWikiDump version 0.92 (available from http://backpan.perl.org/authors/id/T/TR/TRIDDLE/Parse-MediaWikiDump-0.92.tar.gz), or apply the patch attached to this ticket to Parse::MediaWikiDump version 0.97 to regain the lost speed.

Using version 0.92 has the unfortunate side effect of reintroducing a memory leak bug. If that is a problem for you, patch version 0.97 by running the following inside the Parse-MediaWikiDump-0.97 directory:

    Parse-MediaWikiDump-0.97> patch -p1 < Parse-MediaWikiDump-0.97_speedfix.patch

Version 0.98, when released, will have full processing speed again.
Subject: Parse-MediaWikiDump-0.97_speedfix.patch
diff -u -r Parse-MediaWikiDump-0.97/lib/Parse/MediaWikiDump/XML.pm Parse-MediaWikiDump-0.97_speedfix/lib/Parse/MediaWikiDump/XML.pm
--- Parse-MediaWikiDump-0.97/lib/Parse/MediaWikiDump/XML.pm 2009-10-23 09:11:25.000000000 -0700
+++ Parse-MediaWikiDump-0.97_speedfix/lib/Parse/MediaWikiDump/XML.pm 2009-10-27 09:20:18.000000000 -0700
@@ -71,7 +71,7 @@
     $self->{root} = $root;
     $self->{element_stack} = [];
     $self->{accum} = $accum;
-    $self->{char_buf} = '';
+    $self->{char_buf} = [];
     $self->{node_stack} = [ $root ];
 
     return $self;
@@ -156,7 +156,7 @@
 sub handle_char_event {
     my ($self, $expat, $chars) = @_;
 
-    $self->{char_buf} .= $chars;
+    push(@{$self->{char_buf}}, $chars);
 }
 
 sub flush_chars {
@@ -170,9 +170,9 @@
         $cur_element = [];
     }
 
-    defined $handler && &$handler($self, $self->{accum}, $self->{char_buf}, @$cur_element);
+    defined $handler && &$handler($self, $self->{accum}, join('', @{$self->{char_buf}}), @$cur_element);
 
-    $self->{char_buf} = '';
+    $self->{char_buf} = [];
 
     return undef;
 }
@@ -395,4 +395,4 @@
     $a->{$store_as} = $chars;
 }
 
-1;
\ No newline at end of file
+1;
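For readers who want to see the patched pattern outside the module, here is a minimal standalone sketch of what the patch changes: character chunks are pushed onto an array reference and joined only once, at flush time, instead of being appended to a string on every event. The package name CharBufferSketch, its constructor, and the simplified method signatures (no $expat argument, no handler dispatch) are assumptions made for illustration; they are not part of Parse::MediaWikiDump.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the character-buffer part of the XML handler.
# Only the char_buf handling mirrors the patch; everything else is simplified.
package CharBufferSketch;

sub new {
    my ($class) = @_;
    # As in the patch, char_buf starts out as an empty array reference.
    return bless { char_buf => [] }, $class;
}

# Called once per chunk of character data: remember the chunk, do no joining yet.
sub handle_char_event {
    my ($self, $chars) = @_;
    push @{ $self->{char_buf} }, $chars;
    return;
}

# Join the pending chunks into one string and reset the buffer.
sub flush_chars {
    my ($self) = @_;
    my $text = join '', @{ $self->{char_buf} };
    $self->{char_buf} = [];
    return $text;
}

package main;

my $buf = CharBufferSketch->new;
$buf->handle_char_event($_) for ('Parse', '::', 'MediaWikiDump');
my $text = $buf->flush_chars;
print "$text\n";    # prints "Parse::MediaWikiDump"
```

After flush_chars returns, the buffer is empty again, so a second flush with no intervening events yields an empty string.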
Fixed in 0.98