Skip Menu |

This queue is for tickets about the libwww-perl CPAN distribution.

Report information
The Basics
Id: 55436
Status: resolved
Priority: 0/
Queue: libwww-perl

People
Owner: Nobody in particular
Requestors: SREZIC [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Critical
Broken in: (no value)
Fixed in: (no value)



Subject: LWP::UserAgent::mirror is not atomic
The quite old commit c99870f0b619684694cea2b0a0d998080e8cb3b0 introduced a change which makes mirror() no longer atomic, because an unlink() is done first, and then the rename() follows. Since perldoc perlport does not mention a limitation for rename() (other than the probably non relevant limitation on MSWin32 about moving directories), I suggest to disable the block containing unlink() completely. See the attached patch. Regards, Slaven
Subject: LWP-mirror-unlink.patch
diff --git a/lib/LWP/UserAgent.pm b/lib/LWP/UserAgent.pm index abec3f6..798fa38 100644 --- a/lib/LWP/UserAgent.pm +++ b/lib/LWP/UserAgent.pm @@ -863,7 +863,7 @@ sub mirror # The file was the expected length. else { # Replace the stale file with a fresh copy - if ( -e $file ) { + if ( 0 ) { # Some dosish systems fail to rename if the target exists chmod 0777, $file; unlink $file;
Attached is a script which might reproduce the problem by printing "!!! ... does not exist in ...". On a CentOS5 system I am able to reproduce the problem on every run. On a Debian system the problem it is very rare to reproduce. So probably depends on OS, hardware, harddisk etc. strace may also show the race condition: 32598 chmod("/tmp/CByEwZA2fp/dest", 0777 <unfinished ...> 32597 unlink("/tmp/CByEwZA2fp/dest-32597" <unfinished ...> 32598 <... chmod resumed> ) = 0 32597 <... unlink resumed> ) = -1 ENOENT (No such file or directory) 32598 unlink("/tmp/CByEwZA2fp/dest" <unfinished ...> 32597 stat("/tmp/CByEwZA2fp/dest", 0x15385140) = -1 ENOENT (No such file or directory) 32598 <... unlink resumed> ) = 0 32597 write(2, "!!! /tmp/CByEwZA2fp/dest does no"..., 76 <unfinished ...> 32598 rename("/tmp/CByEwZA2fp/dest-32598", "/tmp/CByEwZA2fp/dest" <unfinished ...> Regards, Slaven
Subject: mirror.pl
#!/usr/bin/perl use strict; use warnings; our $VERSION = 0.001; use File::Temp qw(tempdir); use LWP::UserAgent; my $tmpdir = tempdir(TMPDIR => 1, CLEANUP => 1) or die $!; my $srcfile = "$tmpdir/src"; open my $fh, ">", $srcfile or die $!; print $fh "foobar\n"; close $fh or die $!; my $destfile = "$tmpdir/dest"; my $ua = get_ua(); my $resp = $ua->mirror("file://$srcfile", $destfile); warn $resp->as_string; my $parent_pid = $$; my $end_time = time + 15; my $fork = 30; if ($fork && $fork > 1) { for (2..$fork) { my $pid = fork; if (!defined $pid) { die "Could not fork: $!"; } elsif ($pid) { warn "Forked process $pid...\n"; last; # hehe... otherwise we built a fork bomb... } else { $File::Temp::KEEP_ALL = 1; # only the parent should deal with tempdir cleanup # have to create a new User-Agent, to avoid keep-alive problems! $ua = get_ua(); } } } while(time < $end_time) { utime time, time, $srcfile; $ua->mirror("file://$srcfile", $destfile); warn "!!! $destfile does not exist in $$" if !-e $destfile; #warn "$$ OK $destfile\n"; } sub get_ua { LWP::UserAgent->new } END { warn "$$: Sleep before end...\n"; sleep 2; }
From: tlhackque [...] yahoo.com
I agree that this is a serious bug. Mirrors often do want to be atomic. If Perl isn't allowing the rename, we can argue that it's a Perl bug. The Camel book says that 'if the destination file exists, it will be destroyed. Non-unix systems may have other restrictions'. So others might argue that 'other restrictions' includes 'target can't exist'. I'd like LWP::UserAgent::mirror() to be fixed. Here are some choices to make everyone happy: 1) If the change targeted a specific system, make it conditional on $^O - and document that under that/those OS(s), mirror isn't atomic. I think this is the best choice. 2) Try the rename. If it fails with an errno EEXISTS, unlink and retry. 3) The client can flock (exclusive) a related file (e.g. "$target.lock") - which may be a good idea to prevent multiple simultaneous mirror()s. Of course this requires that all users of the file play nice and get shared locks for reading the mirrored file... However, flock()ing has other issues: most systems don't support network disks. And there's the question of whether to grab the exclusive lock before or after the network transfer completes. Assuming many readers and sane update policy - probably after, as this minimizes impact on the readers. But if many might try to update and the network is slow (relative to file size), before - this prevents pointless downloads. Client locking is required if there can be multiple updaters. But if there's only one, LWP used to be safe. It isn't now. And that's a regression bug.