Subject: Endless loop when first server in list is down
When pushing a list of servers to sync:
@DC1_servers =("web121","web122","web123");
If web121 is down, the sync goes into an endless loop. At that point the
only source server is still 'localhost', and the code below drops localhost
from the available list instead of letting it be used to push to the next
server:
sub _mark_available {
    my ( $self, $group, $server ) = @_;

    # don't reschedule localhost for future syncs
    return if $server eq "localhost";

    $logger->debug( "Server available: ($group) $server" );
    unshift @{ $data_of{ident $self}->{available}->{ $group } }, $server;
}
The script gets stuck here because it never managed to make the initial
copy to the first server:
2008/10/29 22:25:58 Starting: (datacenter1) localhost => web121
2008/10/29 22:25:58 Failed: (datacenter1) localhost => web121
2008/10/29 22:20:56 DATACENTER1: completed:0 running:0 left:3 errors:1 failures:0
...
...
(hangs)
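With the change below, _mark_available puts localhost back in the
available list until at least one server in the group has a copy. The bad
server is then skipped, the sync completes to the other two servers in the
list, and control returns to your main script, which continues as normal: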
sub _mark_available {
    my ( $self, $group, $server ) = @_;

    # number of completed syncs in this group
    my $completed = 0;
    if ( $data_of{ident $self}->{completed}->{ $group } ) {
        $completed = scalar @{ $data_of{ident $self}->{completed}->{ $group } };

        # If we have seen at least one completion,
        # don't reschedule localhost for future syncs
        if ( $completed > 0 ) {
            return if $server eq "localhost";
        }
        # Otherwise we haven't had a success yet; if we don't put
        # localhost back in the available list, we loop endlessly...
    }

    $logger->debug( "Server available: ($group) $server" );
    unshift @{ $data_of{ident $self}->{available}->{ $group } }, $server;
}
Output with the patch applied:

2008/10/29 22:29:25 Starting: (datacenter1) localhost => web121
2008/10/29 22:29:25 Failed: (datacenter1) localhost => web121
2008/10/29 22:29:25 DATACENTER1: completed:0 running:0 left:3 errors:1 failures:0
2008/10/29 22:29:26 Succeeded: (datacenter1) localhost => web122
2008/10/29 22:29:26 DATACENTER1: completed:1 running:0 left:2 errors:1 failures:0
2008/10/29 22:29:26 Failed: (datacenter1) web122 => web121
2008/10/29 22:29:26 DATACENTER1: completed:1 running:1 left:1 errors:2 failures:0
2008/10/29 22:29:26 Failed: (datacenter1) web122 => web121
2008/10/29 22:29:26 Error: giving up on (datacenter1) web121
2008/10/29 22:29:26 DATACENTER1: completed:1 running:1 left:0 errors:3 failures:1
2008/10/29 22:29:26 Succeeded: (datacenter1) web122 => web123
2008/10/29 22:29:26 DATACENTER1: completed:2 running:0 left:0 errors:3 failures:1
2008/10/29 22:29:26 Job completed ...
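To make the failure mode concrete, here is a small self-contained toy model
of the scheduling loop. It is a hypothetical sketch, not the module's code:
simulate(), the sequential one-sync-per-round loop, the retry limit of 3
and the round cap are my own assumptions. It applies the original and the
patched _mark_available rule to the same server list with web121 down:

```perl
use strict;
use warnings;

# Hypothetical toy model of the push scheduler (not the real module).
# Each round takes the first available source and the next pending target.
# A target that syncs successfully becomes a source itself; a failed target
# is retried until it has failed $max_tries times, then it is given up on.
sub simulate {
    my (%args) = @_;
    my $patched    = $args{patched};
    my %down       = map { $_ => 1 } @{ $args{down} };
    my @pending    = @{ $args{targets} };
    my @available  = ('localhost');
    my ( @completed, %errors, @given_up );
    my $max_tries  = 3;     # assumed retry limit
    my $max_rounds = 50;    # cap so the unpatched case terminates here

    # The rule under discussion, applied after every sync attempt.
    my $mark_available = sub {
        my ($server) = @_;
        if ( $server eq 'localhost' ) {
            # Original: never reschedule localhost.
            # Patched: reschedule it until some server has a copy.
            return if !$patched || @completed > 0;
        }
        unshift @available, $server;
    };

    for ( 1 .. $max_rounds ) {
        return { status => 'done', completed => \@completed, given_up => \@given_up }
            unless @pending;
        # No source left: the real script loops here forever.
        return { status => 'stuck', completed => \@completed, given_up => \@given_up }
            unless @available;

        my $source = shift @available;
        my $target = shift @pending;
        if ( $down{$target} ) {
            if ( ++$errors{$target} >= $max_tries ) {
                push @given_up, $target;
            }
            else {
                push @pending, $target;    # retry later
            }
        }
        else {
            push @completed, $target;
            $mark_available->($target);    # new copy can act as a source
        }
        $mark_available->($source);        # source is free again
    }
    return { status => 'stuck', completed => \@completed, given_up => \@given_up };
}

for my $patched ( 0, 1 ) {
    my $r = simulate(
        patched => $patched,
        targets => [qw(web121 web122 web123)],
        down    => ['web121'],
    );
    printf "%s: %s, completed=[%s], gave up on=[%s]\n",
        ( $patched ? 'patched' : 'original' ), $r->{status},
        join( ',', @{ $r->{completed} } ), join( ',', @{ $r->{given_up} } );
}
# prints:
# original: stuck, completed=[], gave up on=[]
# patched: done, completed=[web122,web123], gave up on=[web121]
```

In the original rule, the failed push to web121 is the only thing localhost
ever does, so the available list empties and nothing can serve the
remaining targets. With the patch, localhost stays in play until web122 has
a copy, after which the remote servers carry the rest of the sync.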