
This queue is for tickets about the Parse-MediaWikiDump CPAN distribution.

Report information
The Basics
Id: 58169
Status: resolved
Priority: 0/
Queue: Parse-MediaWikiDump

People
Owner: Nobody in particular
Requestors: syed.yasin [...] gmail.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: Parsing Wiki XML Dumps ver0.4 just got tough
Date: Sun, 6 Jun 2010 04:50:40 +0530
To: bug-Parse-MediaWikiDump [...] rt.cpan.org
From: Syed Yasin <syed.yasin [...] gmail.com>
Hello,

I am trying to parse a Wikipedia XML dump using "Parse-MediaWikiDump-1.0.4" along with the "Wikiprep.pl" script. I guess this script works fine with ver0.3 Wiki XML dumps but not with the latest ver0.4 dumps. I get the following error:

Can't locate object method "page" via package "Parse::MediaWikiDump::Pages" at wikiprep.pl line 390.

Also, under the "Parse-MediaWikiDump-1.0.4" documentation at http://search.cpan.org/~triddle/Parse-MediaWikiDump-1.0.4/lib/Parse/MediaWikiDump/Pages.pm, I read:

"LIMITATIONS Version 0.4 This class was updated to support version 0.4 dump files from a MediaWiki instance but it does not currently support any of the new information available in those files."

Any workarounds would help me get to the next level.

-- Regards, Syed Yasin
The "page" method was deprecated a very long time ago and finally removed; it was replaced with the "next" method, which functions identically. It seems wikiprep is out of date; this is not an issue with version 0.4 dump files or Parse::MediaWikiDump.

Additionally, Parse::MediaWikiDump itself has been deprecated in favor of MediaWiki::DumpFile, which offers a backwards-compatible interface with twice the throughput of Parse::MediaWikiDump::Pages.

Cheers,

Tyler
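The fix in wikiprep amounts to renaming one method call. A minimal sketch of the current iteration loop, assuming the constructor takes a dump filename as in the distribution's synopsis (the filename and loop body here are placeholders, not wikiprep's actual code):

```perl
use strict;
use warnings;
use Parse::MediaWikiDump;

# 'pages-articles.xml' is a placeholder path to a MediaWiki XML dump
my $pages = Parse::MediaWikiDump::Pages->new('pages-articles.xml');

# Old, removed API:   while (defined(my $page = $pages->page)) { ... }
# Current API: "next" behaves identically to the removed "page" method
while (defined(my $page = $pages->next)) {
    print $page->title, "\n";
}
```

In wikiprep.pl itself, the change is just replacing each `$pages->page` call with `$pages->next`.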
Subject: Re: [rt.cpan.org #58169] Parsing Wiki XML Dumps ver0.4 just got tough
Date: Sun, 6 Jun 2010 05:06:45 +0530
To: bug-Parse-MediaWikiDump [...] rt.cpan.org
From: Syed Yasin <syed.yasin [...] gmail.com>
Hi, thanks very much for the speedy reply. Can you please share more information about the best way to parse a Wikipedia dump as of today? My requirement is similar to what wikiprep was designed to extract.

Warm Regards,
Syed Yasin
-- Regards, Syed Yasin
Parse::MediaWikiDump and MediaWiki::DumpFile both have limited or no understanding of MediaWiki internals by design. To generate data like Wikiprep's output, you would have to either stick with Wikiprep and change all instances of "page" to "next", or recreate Wikiprep, which is probably a waste of time.

That being said, MediaWiki::DumpFile::Pages has the cleanest API, performs the fastest, and should be used for all new projects.

Cheers,

Tyler
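Tyler's recommendation for new projects can be sketched as below. This assumes the MediaWiki::DumpFile::Pages interface as documented on CPAN at the time (a constructor taking the dump filename, a `next` iterator, and `title`/`revision` accessors), so treat it as illustrative rather than authoritative:

```perl
use strict;
use warnings;
use MediaWiki::DumpFile::Pages;

# 'pages-articles.xml' is a placeholder path to a MediaWiki XML dump
my $pages = MediaWiki::DumpFile::Pages->new('pages-articles.xml');

while (defined(my $page = $pages->next)) {
    my $title = $page->title;
    my $text  = $page->revision->text;   # wikitext of the latest revision
    print "$title: ", length($text), " bytes of wikitext\n";
}
```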
Subject: Re: [rt.cpan.org #58169] Parsing Wiki XML Dumps ver0.4 just got tough
Date: Sun, 6 Jun 2010 05:52:12 +0530
To: "bug-Parse-MediaWikiDump [...] rt.cpan.org" <bug-Parse-MediaWikiDump [...] rt.cpan.org>
From: Syed Yasin <syed.yasin [...] gmail.com>
Thank you very much again. I will first try a shortcut by changing wikiprep accordingly. If this works out I will be very glad; otherwise I will have to find an alternate solution.

Warm Regards,
Syed Yasin
Sent from my iPhone
Subject: Re: [rt.cpan.org #58169] Parsing Wiki XML Dumps ver0.4 just got tough
Date: Sun, 6 Jun 2010 16:33:29 +0530
To: "bug-Parse-MediaWikiDump [...] rt.cpan.org" <bug-Parse-MediaWikiDump [...] rt.cpan.org>
From: Syed Yasin <syed.yasin [...] gmail.com>
Hi Tyler, Greetings!

Your inputs were of immense help. I am glad the issue is resolved. Thanks again!

Warm Regards,
Syed Yasin
-- Regards, Syed Yasin
Subject: Re: [rt.cpan.org #58169] Parsing Wiki XML Dumps ver0.4 just got tough
Date: Wed, 9 Jun 2010 21:46:51 +0530
To: "bug-Parse-MediaWikiDump [...] rt.cpan.org" <bug-Parse-MediaWikiDump [...] rt.cpan.org>
From: Syed Yasin <syed.yasin [...] gmail.com>
Hi Tyler, Greetings!

What are the changes needed to identify "Categories" in the current version? I am using "Parse::MediaWikiDump" along with the modified wikiprep, as advised.

My requirement is to process only "Medical/Health" related articles.

Warm Regards,
Syed Yasin
-- Regards, Syed Yasin
There is no support for categories; the Wikipedia category graph is a nightmare, which is specifically why I removed that support from MediaWiki::DumpFile. As far as I know, this has not been solved by anyone.

Please also understand that the ticket system is for bugs related to my software, not developer support.

Cheers,

Tyler
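For readers with the same need: since neither module parses category links, a rough first pass is to scan each page's raw wikitext for `[[Category:...]]` markup yourself. This is a hypothetical sketch, not part of either distribution, and it misses categories added indirectly through template expansion:

```perl
use strict;
use warnings;

# Extract category names from a page's raw wikitext.
# Note: misses categories that are added via template expansion,
# and does not follow the category graph to parent categories.
sub extract_categories {
    my ($wikitext) = @_;
    return $wikitext =~ /\[\[Category:([^\]|]+)/g;
}

my $text = "Some article text.\n"
         . "[[Category:Medicine]]\n"
         . "[[Category:Health|sort key]]\n";
my @cats = extract_categories($text);
print join(", ", @cats), "\n";   # prints: Medicine, Health
```

Filtering "Medical/Health" articles this way only catches direct category tags; anything more complete requires walking Wikipedia's category graph, which, as noted above, is a hard problem in its own right.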
Subject: Re: [rt.cpan.org #58169] Parsing Wiki XML Dumps ver0.4 just got tough
Date: Thu, 10 Jun 2010 06:35:14 +0530
To: "bug-Parse-MediaWikiDump [...] rt.cpan.org" <bug-Parse-MediaWikiDump [...] rt.cpan.org>
From: Syed Yasin <syed.yasin [...] gmail.com>
Thanks for the prompt response, I truly appreciate it. I presumed there would be some function to accommodate this; anyway, I will try to figure it out.

In fact, when I assigned $catName instead of $catId to $refToCategory, categories are fetched, but there is a type mismatch error in the removeDuplicates function (line 1566 in wikiprep), although the program seems to execute fine.

Thanks again!

Warm Regards,
Syed Yasin
Sent from my iPhone