This queue is for tickets about the Net-SSH-Perl CPAN distribution.

Report information
The Basics
Id: 7910
Status: open
Priority: 0/
Queue: Net-SSH-Perl

People
Owner: Nobody in particular
Requestors: bwalz [...] paradigm-healthcare.com
Cc:
AdminCc:

Bug Information
Severity: Normal
Broken in: 1.25
Fixed in: (no value)



Subject: $shh("cat -> file",$data) hangs on large amount of data
This probably is not a bug but a problem with my older version of openssh on the client machine, but:

    $ssh->cmd("cat ->/tmp/file.tmp", $alot_of_data);

will cause the process to hang when dumping more than, say, 6K of data.

The following code sample from pscp fixes the problem (file contents in $c):

    # chop up file, seems to lock up on large chunks
    my @chunks = $c =~ /.{1,6144}/gs;
    my $f = 0;
    foreach my $chunk (@chunks) {
        my $cmd = $f++ ? "cat - >>$tfile" : "cat - >$tfile";
        my($out, $err, $exit) = $ssh->cmd($cmd, $chunk);
        die "Can't write file $tfile: $err" if $err;
    }

Debug output including version numbers:

    : Reading configuration data /home/bill/.ssh/config
    : Reading configuration data /etc/ssh_config
    : Connecting to xxxx.xxxx.com, port 22.
    : Remote protocol version 1.99, remote software version OpenSSH_3.1p1
    : Net::SSH::Perl Version 1.25, protocol version 2.0.
    : No compat match: OpenSSH_3.1p1.
    : Connection established.
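The splitting step of that workaround can be checked without an SSH connection. The following is a minimal sketch; the payload in $c is made up for illustration, and only the regex is taken from the sample above.

```perl
use strict;
use warnings;

# Hypothetical payload standing in for the file contents ($c in the ticket).
my $c = join '', map { "line $_\n" } 1 .. 2000;

# Same pattern as the workaround: /s lets "." match newlines, so the data
# is cut purely by length, never at line boundaries.
my @chunks = $c =~ /.{1,6144}/gs;

# Rejoin the chunks and record the largest chunk length seen.
my $rejoined  = join '', @chunks;
my $max_chunk = 0;
for my $chunk (@chunks) {
    $max_chunk = length $chunk if length $chunk > $max_chunk;
}

# Edge case: an empty payload yields no chunks, so the foreach loop in the
# workaround would never run and the remote file would not be created.
my @empty = '' =~ /.{1,6144}/gs;

printf "chunks=%d max=%d intact=%s empty_chunks=%d\n",
    scalar @chunks, $max_chunk, ($rejoined eq $c ? 'yes' : 'no'), scalar @empty;
```

Because the first chunk is written with ">" and the rest with ">>", a run that dies partway leaves a truncated remote file; callers may want to write to a temporary name and rename it afterwards.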
[guest - Wed Oct 6 17:41:46 2004]:
> This probably is not a bug but a problem with my older version of
> openssh on the client machine but:
>
> $ssh->cmd("cat ->/tmp/file.tmp", $alot_of_data);
>
> will cause the process to hang when dumping more than say 6K of data.
I've found the bug, and have fixed it as described below. I don't think this is the best way to fix it, but it did prove my theory on how the bug was occurring.

The problem occurs due to the interaction between drain_outgoing (in Channel.pm) and client_loop (in SSH2.pm). If your $stdout is bigger than remote_maxpacket, drain_outgoing tries to repeatedly call client_loop until the length is reduced to zero. The problem is that client_loop does not return to drain_outgoing once it is called from inside the while loop the second time.

MY FIX: I reasoned that if I could force client_loop to execute exactly once each time it was called from the drain_outgoing while loop, there would be no problem. To test this, I did the following:

1.) drain_outgoing: added the following line before the while loop
        $c->{ssh}->{DoOneLoop}=1;
2.) drain_outgoing: added the following line after the while loop
        undef($c->{ssh}->{DoOneLoop});
3.) client_loop: added the following line just before the end of the first while loop
        last if($ssh->{DoOneLoop});

This seems to have fixed it for all scenarios that I've tested. If you know of a better solution, or know how to get this into the official distribution, please email me: craig_at_lucent_dot_com
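The control flow of that fix can be illustrated with a toy model. This is a sketch only: toy_client_loop and toy_drain_outgoing are hypothetical stand-ins, not the Net::SSH::Perl subs, and the "pending" counter merely imitates data waiting to be sent. It shows how a flag set by the outer loop makes the inner loop return after a single pass.

```perl
use strict;
use warnings;

# Toy state: five units of outgoing data, no passes made yet (hypothetical).
my %ssh = (pending => 5, iterations => 0);

# Stand-in for client_loop: handles one unit per pass and would normally
# keep looping by itself until nothing is pending.
sub toy_client_loop {
    my ($ssh) = @_;
    while ($ssh->{pending} > 0) {
        $ssh->{pending}--;
        $ssh->{iterations}++;
        last if $ssh->{DoOneLoop};    # one pass only when the flag is set
    }
    return;
}

# Stand-in for drain_outgoing: sets the flag, calls the inner loop until
# the data is gone, then clears the flag again.
sub toy_drain_outgoing {
    my ($ssh) = @_;
    my $calls = 0;
    $ssh->{DoOneLoop} = 1;
    while ($ssh->{pending} > 0) {
        toy_client_loop($ssh);
        $calls++;
    }
    delete $ssh->{DoOneLoop};
    return $calls;
}

my $calls = toy_drain_outgoing(\%ssh);
print "calls=$calls iterations=$ssh{iterations}\n";
```

The model does not reproduce the hang itself, which in the real code involves a blocking wait inside the inner loop; it only shows that control returns to the caller after every pass once the flag is set.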
From: David Robins <dbrobins [...] davidrobins.net>
To: bug-Net-SSH-Perl [...] rt.cpan.org
Subject: Re: [cpan #7910] $shh("cat -> file",$data) hangs on large amount of data
Date: Mon, 15 Aug 2005 19:32:29 -0700
CC: craig [...] lucent.com
RT-Send-Cc:
On Monday August 15, 2005 07:14, Guest via RT wrote:
> Full context and any attached attachments can be found at:
> <URL: https://rt.cpan.org/Ticket/Display.html?id=7910 >
...
> If you know of a better solution, or know how to get this
> into the official distribution, please email me:
> craig_at_lucent_dot_com

Patches can be submitted to the list or to me (DBROBINS at cpan.org) directly. Please include:
- a summary of the problem being fixed (test cases, what goes wrong, what you expect to happen)
- a gzipped unified diff of the changed file(s)

I'll review it, and either apply it (possibly with changes) or get back to you with suggested changes.

Thanks,
--
Dave
Isa. 40:31
From: jgilbert
[guest - Mon Aug 15 10:14:33 2005]:
> I've found the bug, and have fixed it as described below. I don't
> think this is the best way to fix it, but it did prove my theory
> on how the bug was occurring.

[excellent fix snip]

> This seems to have fixed it for all scenarios that I've tested.

There are two scenarios where this isn't totally fixed:
1. when the remote peer is running OpenSSH 3.7.1p3
2. when the remote peer is running Sun's deployed SSH identified by 'SSH-1.99-Sun_SSH_1.1'

In these cases, with the DoOneLoop fix, the select() call inside client_loop hangs indefinitely with a $stdin to cmd() that is greater than 32768B. More interestingly, it appears that data on STDIN is put onto the resulting channel which is then put into the fd on the remote peer.

We've implemented the above, with the following additional changes in SSH2.pm, Constants.pm, and Channel.pm:

    diff /opt/rcs/lib/Net/SSH/Perl/Channel.pm /opt/rcs/os_deployment/lib/Net/SSH/Perl/Channel.pm
    191a192,193
    > ## COVD FIX:
    > $c->{ssh}->{DoOneLoop} = 10;
    195a198,200
    > ## COVD FIX:
    > undef( $c->{ssh}->{DoOneLoop} );
    > delete $c->{ssh}->{DoOneLoop};
    diff /opt/rcs/lib/Net/SSH/Perl/Constants.pm /opt/rcs/os_deployment/lib/Net/SSH/Perl/Constants.pm
    138c138
    < 'MAX_PACKET_SIZE' => 256000,
    ---
    > 'MAX_PACKET_SIZE' => 8192,
    diff /opt/rcs/lib/Net/SSH/Perl/SSH2.pm /opt/rcs/os_deployment/lib/Net/SSH/Perl/SSH2.pm
    302c303,306
    < my($rready, $wready) = $select_class->select($rb, $wb);
    ---
    > ## COVD FIX:
    > $ssh->debug("Instantiating a select with $ssh->{DoOneLoop} second timeout.")
    > if (exists $ssh->{DoOneLoop} && $ssh->{DoOneLoop} > 0);
    > my($rready, $wready) = $select_class->select($rb, $wb, undef, ($ssh->{DoOneLoop} || undef));
    313a318,319
    > ## COVD FIX:
    > last if ($ssh->{DoOneLoop});

This workaround appears to function correctly with any filesize for SSH implementations identifying themselves as SSH-1.99-OpenSSH_3.7.1p2, SSH-1.99-Sun_SSH_1.0.1, and SSH-1.99-Sun_SSH_1.1.

I'm pretty sure this shouldn't make it into an official patch; I'm just documenting this for anyone else bitten by this. ;)

-jgilbert.
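The effect of passing a timeout to select can be seen in isolation. The sketch below assumes the select class is the core IO::Select module and uses a local pipe in place of the SSH socket; the 0.2 second timeout is arbitrary. Without the fourth argument, the first call would block until data arrived.

```perl
use strict;
use warnings;
use IO::Handle;
use IO::Select;

# A local pipe stands in for the SSH socket (assumption for illustration).
pipe(my $reader, my $writer) or die "pipe failed: $!";
$writer->autoflush(1);

my $rb = IO::Select->new($reader);

# Nothing has been written yet. With a timeout, select gives up and
# returns an empty list instead of blocking.
my @ready     = IO::Select->select($rb, undef, undef, 0.2);
my $timed_out = @ready ? 0 : 1;

# Once data is available, the same call reports the handle as readable.
print {$writer} "data\n";
my ($rready) = IO::Select->select($rb, undef, undef, 0.2);
my $readable = $rready ? scalar @$rready : 0;

print "timed_out=$timed_out readable=$readable\n";
```

A timeout turns an indefinite hang into a bounded wait, but it does not explain why the peer stopped reading, which fits the note above that this is a workaround rather than a patch candidate.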
From: Craig
Folks-

Not only is this bug appearing on many different platforms, but the fix I proposed earlier (and the subsequent jgilbert fix) doesn't work on some of them with v1.30 (ppc-linux in particular).

I've written a perl script to cause the bug to occur, and identify just how many characters your system needs to trigger it. Can you try running it and see if you have the bug and post your results here? I've also sent this to Dave Robins to see if he can figure this bug out once and for all.

My output is below. I'm attaching the perl test script.

Thanks
-Craig

    Net::SSH::Perl bug test...
    host=nwsgpb user=watchmrk remoteCmd=cat
    xferChars=932779|-Error, down to 466390 chars
            Received disconnect message: Corrupted MAC on input.
    xferChars=466390|-Error, down to 233195 chars
            Received disconnect message: Corrupted MAC on input.
    xferChars=233195|-Error, down to 116598 chars
            Received disconnect message: Corrupted MAC on input.
    xferChars=116598|-Error, down to 58299 chars
            Received disconnect message: Corrupted MAC on input.
    xferChars=58299|-Error, down to 29150 chars
            alarm handler 10 second timeout
    xferChars=29150|-Error, down to 14575 chars
            alarm handler 10 second timeout
    xferChars=14575|-Error, down to 7288 chars
            alarm handler 10 second timeout
    xferChars=7288|-OK, back up to 10931 chars
    xferChars=10931|-OK, back up to 12753 chars
    xferChars=12753|-Error, down to 11842 chars
            alarm handler 10 second timeout
    xferChars=11842|-Error, down to 11387 chars
            alarm handler 10 second timeout
    xferChars=11387|-OK, back up to 11614 chars
    xferChars=11614|-Error, down to 11501 chars
            alarm handler 10 second timeout
    xferChars=11501|-OK, back up to 11557 chars
    xferChars=11557|-Error, down to 11529 chars
            Received disconnect message: Corrupted MAC on input.
    xferChars=11529|-OK, back up to 11543 chars
    xferChars=11543|-Error, down to 11536 chars
            Received disconnect message: Corrupted MAC on input.
    xferChars=11536|-Error, down to 11533 chars
            alarm handler 10 second timeout
    xferChars=11533|-OK, back up to 11534 chars
    xferChars=11534|-OK, back up to 11535 chars
    xferChars=11535|-OK, back up to 11535 chars

    Report for remote host: nwsgpb...
    xferChars=11535 works, xferChars=11536 hangs on timeout=10.
    exiting...

The test script:

    #!/opt/exp/bin/perl
    #!/usr/bin/perl
    use strict;
    use warnings;
    use English;
    use Data::Dumper;

    # Set these for your remote host...
    my $user     = "user";
    my $password = "password";
    my $host     = "machine";

    # Time to wait for remote command to finish...
    my $timeout=10;   # Consider upping this if you change $high

    # Remote command...
    my $cmd = "cat";

    # Print heading...
    print "Net::SSH::Perl bug test...\n";
    print "host=$host user=$user remoteCmd=$cmd\n";

    # Set variables...
    my $high=932779;  # Maximum number of starting xmitChars
    my $low=0;
    my $i=$high;
    my $delta=$high-$low;
    my $ssh;

    # Main loop: bisect between the largest size that works ($low)
    # and the smallest size that fails ($high)...
    while($delta>1) {
        undef($ssh);
        print "xferChars=$i";

        # setup SSH...
        $ssh = newssh() || die "Can't create a new ssh, exiting...";
        print '|';  # Indicates newssh worked...
        $ssh->login($user,$password) || die "Can't login, exiting...";
        print '-';  # Indicates successful login...

        # Proper values of $i will trigger the bug...
        my $stdin = '=' x $i;

        # Run remote command...
        my ($stdout, $stderr, $exit) = runcmd($ssh, $cmd, $stdin, $timeout);

        # Check output...
        if(defined($stdout)) {
            if(length($stdout) != $i) {
                die "Data mismatch, expected: $i chars, got: ", length($stdout), " chars\n";
            }
            # Go back up...
            $delta = $high-$i;
            $low = $i;
            $i += int($delta/2);
            print "OK, back up to $i chars\n"
        }else{
            # Go down...
            $delta = $i-$low;
            $high = $i;
            $i -= int($delta/2);
            print "Error, down to $i chars\n\t$stderr\n";
            next;
        }
    }

    if ($i == $high) {
        print "\nBug did not trigger with $high chars on this system.\n";
    }else{
        print "\nReport for remote host: $host...\n";
        print "xferChars=$low works, xferChars=$high hangs on timeout=$timeout.\n";
    }
    print "exiting...\n";

    sub newssh {
        my $ssh;
        # Require the correct SSH package, based on OS...
        if($OSNAME=~/^(MSWin32)/) {
            require Net::SSH::W32Perl;
            $ssh = Net::SSH::W32Perl->new(
                $host,
                port => 22 ,
                protocol => 2,
                #debug => 999,
            );
        }else{
            require Net::SSH::Perl;
            $ssh = Net::SSH::Perl->new(
                $host,
                port => 22 ,
                protocol => 2,
                #debug => 999,
            );
        }
        return($ssh);
    }

    sub runcmd {
        my $ssh     = shift || die "Missing ssh object";
        my $cmd     = shift || die "Missing cmd";
        my $stdin   = shift || die "Missing stdin";
        my $timeout = shift || die "Missing timeout";
        my ($stdout, $stderr, $exit);

        # Abort the command with SIGALRM if it runs longer than $timeout...
        eval {
            local $SIG{'ALRM'} = sub { die "alarm handler $timeout second timeout"; };
            alarm($timeout);
            ($stdout, $stderr, $exit) = $ssh->cmd($cmd, $stdin);
            alarm(0);
        };
        if($@) {
            # Keep only the message, dropping the " at FILE line N" suffix...
            $stderr=$@;
            $stderr=~s/( at |\n).*$/ /s;
        }
        return ($stdout, $stderr, $exit);
    }
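The search logic of the script can be exercised without a remote host. In this sketch the transfer is replaced by a hypothetical predicate, transfer_ok, that treats anything above 11535 characters as a failure (the value from the sample report); the update rules for $low, $high, $i, and $delta are copied from the script.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a real transfer: sizes above $threshold "hang".
my $threshold = 11535;
sub transfer_ok { my ($chars) = @_; return $chars <= $threshold }

my ($low, $high) = (0, 932779);
my $i     = $high;
my $delta = $high - $low;

while ($delta > 1) {
    if (transfer_ok($i)) {
        # Worked: raise the known-good bound and probe higher.
        $delta = $high - $i;
        $low   = $i;
        $i    += int($delta / 2);
    }
    else {
        # Failed: lower the known-bad bound and probe lower.
        $delta = $i - $low;
        $high  = $i;
        $i    -= int($delta / 2);
    }
}

print "works=$low hangs=$high\n";
```

The bisection assumes failures are monotonic in size, that is, every size above the threshold fails and every size below it works. If a host fails only within a band of sizes, the reported boundary depends on the starting $high.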
Looks like a dupe of 4559.
From: craig [...] alcatel-lucent.com
This ticket needs to be re-opened, as I've encountered the bug again on version 1.34. I've verified both the bug and the DoOneLoop fix under Windows (XP), MacOSX, and Solaris. However, the included detection script (as-is) does not catch the bug anymore. If you lower the $high value to 98304 or below, it will catch the bug when you connect to a Solaris box running SSH-2.0-Sun_SSH_1.1.3 (and possibly others).