Skip Menu |

This queue is for tickets about the MediaWiki-API CPAN distribution.

Report information
The Basics
Id: 40308
Status: resolved
Priority: 0/
Queue: MediaWiki-API

People
Owner: Nobody in particular
Requestors: steve.sanbeg [...] gmail.com
Cc:
AdminCc:

Bug Information
Severity: Critical
Broken in: 0.16
Fixed in: (no value)



Subject: rollback is dangerously broken
I think this problem is more difficult to describe than to fix, but here goes. Rollback is a quick way of removing vandalism; you just specify the page and the user, and that user's edits are reverted, if they still have the top edit. Currently, the module silently replaced the specified user with the last editor to the page, which defeats the safety check. This means the rollback would always happen, but it's as likely to replace vandalism that was reverted as it is to revert it, pretty much defeating the purpose. I'll mark the severity high, since someone who tries to use this function could get blocked for misuse of it due to this problem. There's a similar problem with the timestamp for the first edit. The script should keep the timestamp when the page is retrieved, and pass it back when saving, so the API can detect edit conflicts, but the module also replaces that with the most recent available timestamp at the time of the submit, thus disabling any conflicts that could happen during the processing. This is only done for the first edit, which is the main reason I removed that part entirely, so it could be more consistent. Neither of these overwritten parameters was documented; I only noticed it when I found that a required parameter wasn't used in the rollback example. Since the documentation is somewhat thin as most parameters are simply passed through to the server, it may be better to document exceptions to that.
Subject: rollback2.patch
--- /usr/local/lib/perl5/site_perl/5.10.0/MediaWiki/API.pm 2008-10-16 16:52:28.000000000 -0400 +++ MediaWiki/API.pm 2008-10-23 12:53:47.000000000 -0400 @@ -364,7 +364,7 @@ # rollback a page edit $mw->edit( { - action => 'rollback', title => 'Sandbox' } ) + action => 'rollback', title => 'Sandbox', user => 'Vandal' } ) || die $mw->{error}->{code} . ': ' . $mw->{error}->{details}; =cut @@ -663,16 +663,10 @@ if ( $action eq 'rollback' ) { $query->{token} = @{ $pageref->{revisions} }[0]->{$action.'token'}; - $query->{user} = @{ $pageref->{revisions} }[0]->{user}; } else { $query->{token} = $pageref->{$action.'token'}; } - # need timestamp of last revision for edits to avoid edit conflicts (if there are previous revisions) - if ( $action eq 'edit' && defined $pageref->{revisions} ) { - $query->{basetimestamp} = @{ $pageref->{revisions} }[0]->{timestamp}; - } - return $self->_error( ERR_EDIT, 'Unable to get an edit token ($page).' ) unless ( defined $query->{token} ); # cache the token. rollback tokens are specific for the page name and last edited user so can not be cached.
I will look into this and your patch. I think I may have misunderstood the mediawiki api documentation (which is limited). For example it says "To edit a page, an edit token is required. This token is the same for all pages, but changes at every login. If you want to protect against edit conflicts (which is wise), you also need to get the timestamp of the last revision. You can obtain these as follows:" which I misunderstood I guess. So for an edit, we need some kind of call to /getset the timestamp along with the page contents, after which we do the edit. I'll see what I can do. The rollback stuff was partly untested, and I guess I misunderstood their documentation again. The documentation said "user: The author of the last revision." so I got the last revision. So you mean it should rollback a specific requested user instead? makes sense I guess. I'll need to experiment with this a bit
Show quoted text
>This is only done for the first edit, which is the main >reason I removed that part entirely, so it could be more consistent.
Yeh this was a bug anyway despite it seems badly implemented. :/
I could leave the timestamp handling out like you have said. the person using my script can then handle it themselves. But some helper functions might be useful. I apologise for any inconvenience these problems might have caused you. I have never really used the rollback function and Just did some quick tests. And for editing, the wiki I use my module on is too quiet to have edit conflicts..
Subject: Re: [rt.cpan.org #40308] rollback is dangerously broken
Date: Thu, 23 Oct 2008 14:59:10 -0400
To: bug-MediaWiki-API [...] rt.cpan.org
From: "Steve Sanbeg" <steve.sanbeg [...] gmail.com>
On Thu, Oct 23, 2008 at 1:39 PM, Jools Smyth via RT < bug-MediaWiki-API@rt.cpan.org> wrote: Show quoted text
> <URL: http://rt.cpan.org/Ticket/Display.html?id=40308 > > > I could leave the timestamp handling out like you have said. the person > using my script can then handle it themselves. But some helper functions > might be useful. >
For the bots I've been working on lately, I've been writing a higher level framework, which is basically just helper functions that call yours, but simplify the API somewhat for common functionality. In my framework, the method to get a page returns a blessed reference which stores the contents & timestamp, so they can be used when the page is saved. Although I started it partly to simplify porting a script from a different framework, it seems to work pretty well for what I'm doing, to have a lower-level complete API, and a higher level more convenient layer http://mediawiki-tools.cvs.sourceforge.net/viewvc/mediawiki-tools/perl/FrameworkAPI.pm?view=log That's still a work in progress, though. Maybe there is a better way to structure that. I think the idea of the timestamp is to see if anyone else edited the page since you started. If not, the new content can simply replace the old. If so, then the software can try and merge the edits, or raise an edit conflict if it can't. So assuming the operation is to get text, modify it, and send it back, the timestamp should match the text that was retrieved. Show quoted text
> > I apologise for any inconvenience these problems might have caused you. > I have never really used the rollback function and Just did some quick > tests. And for editing, the wiki I use my module on is too quiet to have > edit conflicts.. > >
No problem. I was just thinking that I could write a script that would do rollback, but since the documentation didn't look right, I read the code and realized it wouldn't work. So no harm done. The classic example of a simple rollback script would be to discover someone messed up a lot of pages, and run a script that would parse their contributions and roll back all of their edits. If someone else already reverted an edit, then there's nothing to roll back, and it should do nothing on that page. But if the user parameter is overwritten this way, the revert would get rolled back, thus restoring the vandalism.
Please give this new version a try http://malus.exotica.org.uk/~buzz/cpan/MediaWiki-API-0.17.tar.gz Now when you specify a user for the rollback, if the last edit wasn't from that user an error is returned (functions returns undef - and can be dealt with as you wish). If no user was given, it will rollback whoever was the last editor. I'm not sure this is useful, but there might be some case for it ? My alternatives would have been to return an error. I have removed that broken timestamp code, and have changed the edit example showing how to pass a timestamp. Hopefully this will work. let me know how you get on. If this seems to work for you, I will make it a release. Maybe I should incorporate a easy to use page object, but as you and others are doing higher level frameworks, perhaps I should just concentrate on the communication with the api part. I will check out your code. Show quoted text
> That's still a work in progress, though. Maybe there is a better way > to > structure that.
I'll have a read. Show quoted text
> If > so, then the software can try and merge the edits, or raise an edit > conflict > if it can't. So assuming the operation is to get text, modify it, and > send > it back, the timestamp should match the text that was retrieved.
from what I have read today on the mailinglist, it seems it might always return a conflict if there is a newer revision timestamp than the one you have, so perhaps it doesn't even try to merge via the api ? (i could have misunderstood though)
Subject: Re: [rt.cpan.org #40308] rollback is dangerously broken
Date: Fri, 24 Oct 2008 19:08:59 -0400
To: bug-MediaWiki-API [...] rt.cpan.org
From: "Steve Sanbeg" <steve.sanbeg [...] gmail.com>
On Thu, Oct 23, 2008 at 6:39 PM, Jools Smyth via RT < bug-MediaWiki-API@rt.cpan.org> wrote: Show quoted text
OK, I'll take a look at it. The primary bot I'm working on now is one that runs once a month to archive a discussion page, so that wouldn't be a thorough test, but I'll try this out. Show quoted text
> > Now when you specify a user for the rollback, if the last edit wasn't > from that user an error is returned (functions returns undef - and can > be dealt with as you wish). If no user was given, it will rollback > whoever was the last editor. I'm not sure this is useful, but there > might be some case for it ? My alternatives would have been to return an > error. >
My thinking was that forgetting to pass the user would be unusual enough that it should trigger an error, although if there is a use to fill it in automatically, maybe passing some special username may work; i.e. user=>'*' would trigger a replacement,. Returning undef seems reasonable; mostly it wouldn't matter whether the revert happened by you or someone else so that can be ignored, but it could be useful to see which rollbacks resulted in some action. Show quoted text
> > I have removed that broken timestamp code, and have changed the edit > example showing how to pass a timestamp. Hopefully this will work. > > let me know how you get on. If this seems to work for you, I will make > it a release. > > Maybe I should incorporate a easy to use page object, but as you and > others are doing higher level frameworks, perhaps I should just > concentrate on the communication with the api part. I will check out > your code. >
What I'm trying to do with my framework is to create a simple api, similar to a classic pywikipedia style, without worrying too much about the communication with the api. Some of what I'm working on, i.e. logins that are reliable despite the throttled login method & expiring cookies, or the built in test mode, may be out of the scope of what you want to do. It is useful, but I'm not sure whether or not some of that could be merged in without making things too confusing. Show quoted text
>
> > That's still a work in progress, though. Maybe there is a better way > > to > > structure that.
> > I'll have a read. >
> > If > > so, then the software can try and merge the edits, or raise an edit > > conflict > > if it can't. So assuming the operation is to get text, modify it, and > > send > > it back, the timestamp should match the text that was retrieved.
> > from what I have read today on the mailinglist, it seems it might always > return a conflict if there is a newer revision timestamp than the one > you have, so perhaps it doesn't even try to merge via the api ? > (i could have misunderstood though) >
I just found that discussion on the web archive. It looks like there was eventually a more detailed description of the merge, so it seems like it does happen, but since it doesn't usually work (unless you use section=new) it doesn't get much emphasis.
I am marking this as resolved as the problems are fixed in the latest 0.17 upload which will appear shortly. If you have any problems with the new version please do let me know via this bugtracker. Thanks!