
This queue is for tickets about the Parse-MediaWikiDump CPAN distribution.

Report information
The Basics
Id: 16583
Status: resolved
Priority: 0/
Queue: Parse-MediaWikiDump

People
Owner: triddle [...] cpan.org
Requestors: jmrukkers [...] yahoo.com
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: 0.22
Fixed in: (no value)

Attachments


Subject: Parse-MediaWikiDump - dies parsing german xml dump file
Hi, I found what is probably a bug when parsing the German Wikipedia XML dump: the MediaWikiDump example program dies on an empty username in the dump file. This could probably just be ignored so that processing can continue.
[guest - Sun Dec 18 16:08:02 2005]:
> Hi, I found what is probably a bug in parsing through the german
> wikipedia xml dump - the MediaWikiDump example program dies on an
> empty username in the dump file, this could probably just be
> ignored and processing could continue.
Thanks for the bug report. This is an interesting case. It looks like the username here is the Unicode value for a space, which causes the underlying XML parser to miss the value in the username field. I'm not sure what the proper way to resolve this is. I'll have to contact the MediaWiki developers first to verify that this is not a bug in MediaWiki, because the only solution I can think of can lead to behavior that is not consistent with the underlying MediaWiki data in the XML file.
Tyler Riddle
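The failure mode described above can be sketched as follows. This is only an illustration in Python, not the module's Perl code: the sample fragment, the choice of U+3000 as the space character, and the `read_username` helper with its `strict` flag are all assumptions made for the example. It shows how a username consisting solely of a Unicode space looks empty once whitespace is discarded, and how a tolerant mode could skip the value instead of dying.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment. U+3000 (ideographic space) stands in for the
# whitespace-only username; the ticket does not say which character it was.
SAMPLE = "<contributor><username>\u3000</username><id>42</id></contributor>"


def read_username(xml_text, strict=True):
    """Return the username, treating whitespace-only values as missing."""
    elem = ET.fromstring(xml_text).find("username")
    text = elem.text if elem is not None else None
    # str.strip() removes Unicode whitespace, so "\u3000" becomes "".
    if text is None or not text.strip():
        if strict:
            # Mirrors the reported behavior: processing stops here.
            raise ValueError("empty username")
        return None  # tolerant mode: ignore the value and keep going
    return text
```

Calling `read_username(SAMPLE)` raises an error, while `read_username(SAMPLE, strict=False)` returns `None` so that the caller can continue with the next page. The tolerant mode also shows the concern raised above: the value it reports no longer matches what is actually stored in the dump.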
Hello again, I filed a bug report (I'm quite confident MediaWiki should not let a username of all spaces exist), and I found a workaround for the time being. With it, I was able to use the example program that extracts articles to extract the text of Negativ-Positiv Verfahren, which was the article causing the module to die. Would you please check out the attached version of the module? If it works for you, I will publish it as the next version of Parse::MediaWikiDump. Thanks again for your bug report,
Tyler Riddle
Download Parse-MediaWikiDump-0.23.tar.gz
application/x-gzip 11.5k


From: Johannes Rukkers
Hello Tyler, I tested the new version and it works great. Thank you so much for creating such an excellent and useful module, and thank you for the quick response and for fixing this issue in such a short time. Regards -- Johannes Rukkers