Skip Menu |

This queue is for tickets about the Net-SSH-Perl CPAN distribution.

Report information
The Basics
Id: 55195
Status: resolved
Priority: 0/
Queue: Net-SSH-Perl

People
Owner: schwigon [...] cpan.org
Requestors: erik-bitcard [...] planeterik.com
Cc:
AdminCc:

Bug Information
Severity: Critical
Broken in: 1.34
Fixed in: (no value)



Subject: Race condition in KEXINIT
During the KEXINIT process, if the server's initial Key Exchange negotiation packet is received before the Net::SSH::Perl client sends the TCP ACK packet for the Protocol Version Exchange, the negotiation packet is ignored and the connection process hangs. For example, a "good" connection looks something like this: CLIENT --> SERVER --- SYN CLIENT <-- SERVER --- SYN ACK CLIENT --> SERVER --- ACK CLIENT <-- SERVER --- Server Protocol Version Packet CLIENT --> SERVER --- ACK CLIENT <-- SERVER --- KEXINIT proposal from server CLIENT --> SERVER --- ACK CLIENT --> SERVER --- Client Protocol Version Packet CLIENT <-- SERVER --- ACK CLIENT --> SERVER --- KEXINIT proposal from client (and sometimes DH Exchange Init) CLIENT <-- SERVER --- ACK (DH negotiation proceeds, etc.) Here's an example of a sequence that will create a hang: CLIENT --> SERVER --- SYN CLIENT <-- SERVER --- SYN ACK CLIENT --> SERVER --- ACK CLIENT <-- SERVER --- Server Protocol Version Packet CLIENT <-- SERVER --- KEXINIT proposal from server CLIENT --> SERVER --- ACK CLIENT --> SERVER --- ACK CLIENT --> SERVER --- Client Protocol Version Packet CLIENT <-- SERVER --- ACK CLIENT --> SERVER --- KEXINIT proposal from client CLIENT <-- SERVER --- ACK (process hangs waiting for the Server's KEXINIT proposal) The specific system I'm running is Perl 5.8.8 for x86_64-linux-gnu-thread-multi running on Ubuntu 8.04 LTS (kernel 2.6.24-26-server) on VMWare ESXi 4. The virtualization may be what's slowing down the script enough so that the race condition occurs; I would imagine that in other circumstances it'd be pretty hard to trigger this bug.
From: erik-bitcard [...] planeterik.com
After further analysis, I've found an instance where the SSH version packet was ACK'd and the connect still hung, but it still appears to be a race condition. I see in the source how the SSH version string is handled through the standard Perl streaming I/O functions and the packet processing is then handed off to a buffered packet processor, but apparently my Perl-Fu is not strong enough to figure out why the server's KEXINIT proposal packet is being lost. I strongly suspect that there's an issue with the hand-off to the buffered packet processor, but I don't seem to be able to find and fix it myself. I've worked around this with a watchdog timer, but if somebody with superior skills could come up with a solution I'd love to not just implement it, but see how it works.
From: caseylucas [...] gmail.com
On Thu Mar 04 13:25:15 2010, ecarlseen wrote: Show quoted text
> After further analysis, I've found an instance where the SSH version > packet was ACK'd and the connect still hung, but it still appears to
be Show quoted text
> a race condition. I see in the source how the SSH version string is > handled through the standard Perl streaming I/O functions and the
packet Show quoted text
> processing is then handed off to a buffered packet processor, but > apparently my Perl-Fu is not strong enough to figure out why the > server's KEXINIT proposal packet is being lost. I strongly suspect
that Show quoted text
> there's an issue with the hand-off to the buffered packet processor,
but Show quoted text
> I don't seem to be able to find and fix it myself. I've worked around > this with a watchdog timer, but if somebody with superior skills could > come up with a solution I'd love to not just implement it, but see how > it works. >
I believe I had this same problem. It is not really an ACK issue but incorrect reads from the socket. Also, it is before the KEXINIT that the real problem actually occurs. It's in the login when the initial server message is read. The code in 1.30 read one byte at a time from the original server message and stopped when it got to the first \n. The 1.34 code uses a <SOCKET> read to get a line, but the whole buffer (which contains additional data from the server) is read in this case... which causes subsequent socket reads to be off. Anyway, this is the diff which fixed it for me in 1.34 (of Net/SSH/Perl.pm). It essentially just reverts the relevant code to the 1.30 read one byte at a time version.
Subject: Perl.pm.patch
--- /usr/local/lib/perl5/site_perl/5.10.0/Net/SSH/Perl.pm Sun Feb 1 19:18:27 2009 +++ 5.10.0/Net/SSH/Perl.pm Tue Aug 3 16:53:22 2010 @@ -10,6 +10,8 @@ use Net::SSH::Perl::Cipher; use Net::SSH::Perl::Util qw( :hosts _read_yes_or_no ); +use Errno qw( EAGAIN EWOULDBLOCK ); + use vars qw( $VERSION $CONFIG $HOSTNAME ); $CONFIG = {}; @@ -255,12 +257,44 @@ croak @_; } +sub _read_version_line { + my $ssh = shift; + my $sock = $ssh->{session}{sock}; + my $line; + for(;;) { + my $s = IO::Select->new($sock); + my @ready = $s->can_read; + my $buf; + my $len = sysread($sock, $buf, 1); + unless(defined($len)) { + next if $! == EAGAIN || $! == EWOULDBLOCK; + croak "Read from socket failed: $!"; + } + croak "Connection closed by remote host" if $len == 0; + $line .= $buf; + croak "Version line too long: $line" + if substr($line, 0, 4) eq "SSH-" and length($line) > 255; + croak "Pre-version line too long: $line" if length($line) > 4*1024; + return $line if $buf eq "\n"; + } +} + +sub _read_version { + my $ssh = shift; + my $line; + do { + $line = $ssh->_read_version_line; + } while (substr($line, 0, 4) ne "SSH-"); + $ssh->debug("Remote version string: $line"); + return $line; +} + sub sock { $_[0]->{session}{sock} } sub _exchange_identification { my $ssh = shift; my $sock = $ssh->{session}{sock}; - my $remote_id = <$sock>; + my $remote_id = $ssh->_read_version; ($ssh->{server_version_string} = $remote_id) =~ s/\cM?\n$//; my($remote_major, $remote_minor, $remote_version) = $remote_id =~ /^SSH-(\d+)\.(\d+)-([^\n]+)\n$/;
On Tue Aug 03 18:06:44 2010, https://www.google.com/accounts/o8/id?id=AItOawlTPC4PsAjNGGpeOWlk2TsuHf1oEtld9LQ wrote: Show quoted text
> On Thu Mar 04 13:25:15 2010, ecarlseen wrote: > this is the diff which fixed > it for me in 1.34 (of Net/SSH/Perl.pm). It essentially just reverts the > relevant code to the 1.30 read one byte at a time version.
Hi! I just added a branch containing this patch here: https://github.com/renormalist/Net-SSH-Perl/tree/rt55195-race-condition-in-kexinit but would like to hear more opinions from the original authors whether that patch is ok to apply. It passes all tests, but I don't really know the code to recognize possible implications. Kind regards, Steffen -- Steffen Schwigon <ss5@renormalist.net> Dresden Perl Mongers <http://dresden-pm.org/>
On Mon Dec 03 11:30:03 2012, SCHWIGON wrote: Show quoted text
> I just added a branch containing this patch here: > > > https://github.com/renormalist/Net-SSH-Perl/tree/rt55195-race- > condition-in-kexinit
Now part of v1.36, just uploaded to CPAN. Steffen -- Steffen Schwigon <ss5@renormalist.net> Dresden Perl Mongers <http://dresden-pm.org/>