
This queue is for tickets about the Parse-MediaWikiDump CPAN distribution.

Report information
The Basics
Id: 50491
Status: resolved
Priority: 0/
Queue: Parse-MediaWikiDump

People
Owner: Nobody in particular
Requestors: tomaz.solc [...] tablix.org
david.m.carter [...] gmail.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: 0.94
Fixed in: (no value)



Subject: Does not support version 0.4 dump files
The latest dumps from Wikimedia (for example 20091009 enwiki) have incremented the dump format version to 0.4.

Parse::MediaWikiDump fails to open those files with the following error message:

    Only version 0.3 dump files are supported at /usr/local/share/perl/5.10.0/Parse/MediaWikiDump/Pages.pm line 73.

export-0.4.xsd in MediaWiki SVN describes the changes in 0.4 as:

    Version 0.4 adds per-revision delete flags, log exports, discussion threading data, a per-page redirect flag, and per-namespace capitalization.
Subject: Any chance of upgrading MediaWikiDump to version 0.4?
Date: Wed, 14 Oct 2009 15:58:44 +0100
To: bug-Parse-MediaWikiDump [...] rt.cpan.org
From: David Carter <david.m.carter [...] gmail.com>
Hello,

I'm trying to run MediaWikiDump on the latest Japanese Wikipedia dump file,

http://download.wikimedia.org/jawiki/latest/jawiki-latest-pages-articles.xml.bz2

which contains this in its header:

    <mediawiki xmlns="http://www.mediawiki.org/xml/export-0.4/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.mediawiki.org/xml/export-0.4/ http://www.mediawiki.org/xml/export-0.4.xsd" version="0.4" xml:lang="ja">

which makes version 0.94 of MediaWikiDump fall over here:

    % grep 0.3 /usr/lib/perl5/site_perl/5.8.8/Parse/MediaWikiDump/Pages.pm
    die "Only version 0.3 dump files are supported" unless $attrs->{version} eq '0.3';

Do you have any plans to extend your code to handle 0.4? If so I would much appreciate it...

Thanks

David

    % perl -v
    ...
    perl, v5.8.8 built for i586-linux-thread-multi
    ...
    % uname -a
    Linux abbot 2.6.22.19-0.1-bigsmp #1 SMP 2008-10-14 22:17:43 +0200 i686 i686 i386 GNU/Linux
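The failing check reads the version attribute of the <mediawiki> root element shown in the header above and dies on anything other than 0.3. The same attribute can be inspected before handing a file to the parser, which helps tell a format-version problem apart from other failures. A minimal sketch in core Perl, under stated assumptions: dump_format_version is a hypothetical helper, not part of Parse::MediaWikiDump, and it uses a regular expression on the first lines of the file rather than a real XML parser.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Parse::MediaWikiDump): read the start of a
# MediaWiki XML dump from $fh and return the value of the version attribute on
# the <mediawiki> root element, or undef if it cannot be found.
sub dump_format_version {
    my ($fh) = @_;
    my $header = '';

    # The root element is near the top of the file, so stop reading as soon
    # as its opening tag is complete (or after 64 KiB of unexpected input).
    while (defined(my $line = <$fh>)) {
        $header .= $line;
        last if $header =~ /<mediawiki\b[^>]*>/;
        last if length($header) > 64 * 1024;
    }

    # A regular expression, not an XML parser: adequate for the attribute
    # layout in the header quoted above, including a tag split across lines.
    return $1 if $header =~ /<mediawiki\b[^>]*?\sversion="([^"]*)"/;
    return;
}

# Usage: an in-memory filehandle holding the header quoted in this ticket.
my $xml = '<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.4/" '
        . 'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
        . 'xsi:schemaLocation="http://www.mediawiki.org/xml/export-0.4/ '
        . 'http://www.mediawiki.org/xml/export-0.4.xsd" '
        . 'version="0.4" xml:lang="ja">' . "\n";
open my $fh, '<', \$xml or die "cannot open in-memory handle: $!";
my $version = dump_format_version($fh);
print defined $version ? "dump format version $version\n" : "no version found\n";
# prints: dump format version 0.4
```

For a compressed dump such as the jawiki file, the handle could instead come from a pipe, for example open my $fh, '-|', 'bzcat', $path; only the first few lines are read either way.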
Thanks for the note about the change in the MediaWiki dump format version number. I do plan to augment the code to support version 0.4 dump files, but I don't have an estimate of when it'll be released. I'll update this ticket as I make progress so you'll stay informed.

Tyler Riddle

On Wed Oct 14 07:29:55 2009, AVIAN wrote:
> The latest dumps from Wikimedia (for example 20091009 enwiki) have
> incremented dump format version to 0.4.
>
> Parse::MediaWikiDump fails to open those files with the following error
> message:
>
> Only version 0.3 dump files are supported at
> /usr/local/share/perl/5.10.0/Parse/MediaWikiDump/Pages.pm line 73.
>
> export-0.4.xsd in MediaWiki SVN describes changes in 0.4 as:
>
> Version 0.4 adds per-revision delete flags, log exports,
> discussion threading data, a per-page redirect flag, and
> per-namespace capitalization.
The XML schema for version 0.4 dump files has not been published at the URLs specified in the dump files, so I opened a bug with MediaWiki to have it published. Until I get my hands on the schema and can augment the code to support the new data, I've created a simple patch that lets version 0.4 dump files through without any other changes, and I've attached it to this ticket. I have not fully tested whether the changes to the dump format cause any odd behavior. Can you please try out the patch and let me know if it works for your needs?
Subject: Parse-MediaWikiDump-0.96.patch
### Eclipse Workspace Patch 1.0
#P Parse-MediaWikiDump
Index: lib/Parse/MediaWikiDump/Revisions.pm
===================================================================
--- lib/Parse/MediaWikiDump/Revisions.pm	(revision 89)
+++ lib/Parse/MediaWikiDump/Revisions.pm	(working copy)
@@ -1,6 +1,6 @@
 package Parse::MediaWikiDump::Revisions;
 
-our $VERSION = '0.95';
+our $VERSION = '0.96';
 
 use 5.8.0;
 
@@ -264,7 +264,11 @@
 
 sub validate_mediawiki_node {
 	my ($engine, $a, $element, $attrs) = @_;
-	die "Only version 0.3 dump files are supported" unless $attrs->{version} eq '0.3';
+	my $version = $attrs->{version};
+
+	if ($version ne '0.3' && $version ne '0.4') {
+		die "Only version 0.3 and 0.4 dump files are supported";
+	}
 }
 
 sub save_siteinfo {
Index: lib/Parse/MediaWikiDump/Pages.pm
===================================================================
--- lib/Parse/MediaWikiDump/Pages.pm	(revision 86)
+++ lib/Parse/MediaWikiDump/Pages.pm	(working copy)
@@ -1,6 +1,6 @@
 package Parse::MediaWikiDump::Pages;
 
-our $VERSION = '0.94';
+our $VERSION = '0.96';
 
 use base qw(Parse::MediaWikiDump::Revisions);
 
@@ -69,8 +69,7 @@
 }
 
 sub validate_mediawiki_node {
-	my ($engine, $a, $element, $attrs) = @_;
-	die "Only version 0.3 dump files are supported" unless $attrs->{version} eq '0.3';
+	return Parse::MediaWikiDump::Revisions::validate_mediawiki_node(@_);
 }
 
 sub save_namespace_node {
Index: lib/Parse/MediaWikiDump.pm
===================================================================
--- lib/Parse/MediaWikiDump.pm	(revision 89)
+++ lib/Parse/MediaWikiDump.pm	(working copy)
@@ -1,5 +1,5 @@
 package Parse::MediaWikiDump;
-our $VERSION = '0.95';
+our $VERSION = '0.96';
 
 use Parse::MediaWikiDump::XML;
 use Parse::MediaWikiDump::Revisions;
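The patch accepts 0.3 and 0.4 with a pair of string comparisons. If later format versions also turn out to only add new tags, an accept-list keeps that decision in one place and lets the error message name the version it actually saw. A minimal sketch in core Perl; %SUPPORTED_VERSIONS and check_dump_version are hypothetical names used for illustration, not Parse::MediaWikiDump's API or the code shipped in 0.96.

```perl
use strict;
use warnings;

# Hypothetical accept-list (an illustration, not the module's actual code):
# the dump format versions the parser is willing to read. Accepting a later
# version that only adds new tags becomes a one-line change here.
my %SUPPORTED_VERSIONS = map { $_ => 1 } qw(0.3 0.4);

# Return true if $version is supported; otherwise die with a message that
# names both the rejected version and the supported ones.
sub check_dump_version {
    my ($version) = @_;
    $version = '(none)' unless defined $version;
    return 1 if $SUPPORTED_VERSIONS{$version};
    die sprintf "Unsupported dump format version %s (supported: %s)\n",
        $version, join(', ', sort keys %SUPPORTED_VERSIONS);
}

# Usage
check_dump_version('0.4');    # accepted, returns 1
eval { check_dump_version('0.5'); 1 } or print "rejected: $@";
# prints: rejected: Unsupported dump format version 0.5 (supported: 0.3, 0.4)
```

Whether a new version is safe to add to the list still depends on its schema; the list only makes the change itself small.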
CC: david.m.carter [...] gmail.com
Subject: Re: [rt.cpan.org #50491] Any chance of upgrading MediaWikiDump to version 0.4?
Date: Wed, 21 Oct 2009 23:19:43 +0200
To: bug-Parse-MediaWikiDump [...] rt.cpan.org
From: Tomaž Šolc <tomaz.solc [...] tablix.org>
Hi
> Until I get my hands on the schema and can augment the code to support the new data I've created a simple patch that
> allows 0.4 version dump files with out any other changes and I've attached it to this ticket.
>
> I have not fully tested whether or not the changes to the dump file cause weird behavior to
> show up. Can you please try out the patch and let me know if it works for your needs?
I've used a fix just like this to process the latest dump with Wikiprep and it appears to work fine. I guess the new version of the dump format only adds new tags.

By the way, the new schema is available in the SVN repository of MediaWiki:

http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/docs/export-0.4.xsd?revision=57558&view=markup

Thanks
Tomaž
I've released version 0.96 of Parse::MediaWikiDump to CPAN. It allows format version 0.4 dump files through, but it does not yet support the extra features of that format; that will be done later. The release is attached to this ticket for easy access.
Attachment: Parse-MediaWikiDump-0.96.tar.gz (application/x-gzip, 16.4k)


Subject: Re: [rt.cpan.org #50491] Resolved: Any chance of upgrading MediaWikiDump to version 0.4?
Date: Fri, 23 Oct 2009 11:43:16 +0100
To: bug-Parse-MediaWikiDump [...] rt.cpan.org
From: David Carter <david.m.carter [...] gmail.com>
Thank you for your quick action on this -- sorry I didn't respond earlier, but the fix sounds good. I don't think I need the extra facilities in 0.4.

David

On Thu, Oct 22, 2009 at 8:26 PM, Tyler Riddle via RT <bug-Parse-MediaWikiDump@rt.cpan.org> wrote:
> <URL: https://rt.cpan.org/Ticket/Display.html?id=50491 >
>
> According to our records, your request has been resolved. If you have any
> further questions or concerns, please respond to this message.