
This queue is for tickets about the Parse-MediaWikiDump CPAN distribution.

Report information
The Basics
Id: 49979
Status: resolved
Priority: 0/
Queue: Parse-MediaWikiDump

People
Owner: triddle [...] cpan.org
Requestors: amir.aharoni [...] gmail.com
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: 0.93
Fixed in: (no value)

Subject: redirect in newer Wikipedia dumps
I tried to use Parse::MediaWikiDump 0.93 to process a Wikipedia dump that was created in September 2009 (from http://download.wikimedia.org/). It failed with this error: "fatal error - no match for element redirect". Parse::MediaWikiDump 0.93 does work with Wikipedia dumps from July 2009. It seems to me that newer dumps have a <redirect /> element in every redirect page and older dumps do not, but I might be wrong. You can reproduce it, for example, with the cvwiki dump, which is rather small (6 MB download).
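To make the suspected difference concrete, here is a minimal sketch in Python (not the module's own Perl code) that checks whether a page record carries a <redirect /> child. The two sample records are hypothetical and heavily simplified; real dumps declare an XML namespace and contain many more elements, and the helper name has_redirect_element is made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified page records. Real dump pages carry an XML
# namespace and many more child elements (id, timestamp, contributor, ...).
OLD_STYLE_PAGE = """<page>
  <title>Example</title>
  <revision><text>#REDIRECT [[Target]]</text></revision>
</page>"""

NEW_STYLE_PAGE = """<page>
  <title>Example</title>
  <redirect />
  <revision><text>#REDIRECT [[Target]]</text></revision>
</page>"""


def has_redirect_element(page_xml):
    """Return True if the <page> record has a direct <redirect> child."""
    page = ET.fromstring(page_xml)
    return page.find("redirect") is not None


print(has_redirect_element(OLD_STYLE_PAGE))  # False
print(has_redirect_element(NEW_STYLE_PAGE))  # True
```

A parser that only accepts a fixed list of element names would process the first record and reject the second, which matches the error reported above.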
Thank you for the bug report. Until this is resolved I think you'll find that 0.92 will work ok for you. If not please let me know. Tyler
On Thu Sep 24 13:32:31 2009, TRIDDLE wrote:
> Thank you for the bug report.
>
> Until this is resolved I think you'll find that 0.92 will work ok for
> you. If not please let me know.
Thanks for the very quick reply. With 0.92 I get this:
fatal error - no match for element redirect at /usr/local/share/perl/5.10.0/Parse/MediaWikiDump/XML.pm line 126.
(in cleanup) fatal error - no match for element mediawiki at /usr/local/share/perl/5.10.0/Parse/MediaWikiDump/XML.pm line 126.
FYI - I had the same issue, but 0.91 does not have the problem. So just download that and use it until it is fixed. Scott
I changed the parser so it allows unknown tags through and verified that cvwiki-20090924-pages-articles.xml can be run all the way through with version 0.94. Thanks for the bug report and the update; please add additional information to this ticket if the bug remains. I just uploaded 0.94 to CPAN, so it may take a few days before it's available on the mirrors. I've attached it to the ticket just for completeness.
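For readers who want to see the idea of "allowing unknown tags through" in isolation, the following is a rough Python sketch, not the actual change made in the Perl module. It assumes a hypothetical set of known element names (KNOWN_TAGS) and a hypothetical helper parse_titles; in strict mode an unexpected element aborts the run, while in tolerant mode it is skipped.

```python
import io
import xml.etree.ElementTree as ET

# Assumption for this sketch: the element names the parser was written for.
KNOWN_TAGS = {"mediawiki", "page", "title", "revision", "text"}

# Hypothetical, simplified dump containing the newer <redirect /> element.
SAMPLE = """<mediawiki>
  <page>
    <title>Example</title>
    <redirect />
    <revision><text>#REDIRECT [[Target]]</text></revision>
  </page>
</mediawiki>"""


def parse_titles(xml_text, strict):
    """Collect page titles; unknown tags either abort (strict) or are skipped."""
    titles = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag not in KNOWN_TAGS:
            if strict:
                raise ValueError("no match for element " + elem.tag)
            continue  # tolerant mode: let the unknown tag through
        if elem.tag == "title":
            titles.append(elem.text)
    return titles


print(parse_titles(SAMPLE, strict=False))
try:
    parse_titles(SAMPLE, strict=True)
except ValueError as err:
    print("strict mode failed:", err)
```

The tolerant branch trades early detection of malformed input for forward compatibility with dump formats that gain new elements, which is the behaviour described for 0.94.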
