Bug #59332 for Net-SSH2: Handling LIBSSH2_ERROR

Tue Jul 13 18:48:54 2010 dwheeler [...] cpan.org - Ticket created

CC:	libssh2 development <libssh2-devel [...] cool.haxx.se>
Subject:	Handling LIBSSH2_ERROR_EAGAIN
Date:	Tue, 13 Jul 2010 15:48:44 -0700
To:	bug-net-ssh2 [...] rt.cpan.org
From:	"David E. Wheeler" <dwheeler [...] cpan.org>

Dear Net::SSH2 Maintainers, With help from the deniznes of #libssh2 on Freenode (Bagder and CareBear\), I got a bug fixed in Net::SSH2's scp_put() method. The bug in two parts, as far as I can tell. The first, in SSH2.xs, is that it doesn't properly handle LIBSSH2_ERROR_EAGAIN spinning in net_ch_write(). The patch is here: http://github.com/theory/net-ssh2/commit/9c9ced9984ce7891a1a001b84c846d2425137049 The second, in the scp_put() method of Net::SSH2, is that it didn't handle partial writes. So assuming a failure on a partial write, it was failing with whatever error was left in libssh2, which happened to be LIBSSH2_ERROR_EAGAIN. So this patch checks for partial writes and does the right thing: http://github.com/theory/net-ssh2/commit/d053f28a35d1f0454d2155145d54c600a2f41447 It looks like there might be similar bugs throughout Net::SSH2. In particular, I'm told that, when blocking is enabled (as it is by default), Net::SSH2 should ignore LIBSSH2_ERROR_EAGAIN. In many places, it's either failing to account for partial writes (or, perhaps, reads, as in scp_get()), and thinks there is an error because it finds LIBSSH2_ERROR_EAGAIN. So there may be a bunch of places it fails. Another, I've noticed, is when connecting to some hosts with: my $ssh2 = Net::SSH2->new; $ssh2->connect('test.example.com'); my $ret = $ssh2->auth( username => 'theory', password => '*****', publickey => "$ENV{HOME}id_rsa.pub", privatekey => "$ENV{HOME}id_rsa", ); die $ssh2->error unless $ret && $ssh2->auth_ok; So I think there might need to be some refactoring with regard to handling LIBSSH2_ERROR_EAGAIN when blocking is enabled and with regard to partial reads and writes. Thanks, David

Tue Jul 13 19:32:34 2010 peter [...] stuge.se - Correspondence added

Subject:	Re: [rt.cpan.org #59332] AutoReply: Handling LIBSSH2_ERROR_EAGAIN
Date:	Wed, 14 Jul 2010 01:32:20 +0200
To:	Bugs in Net-SSH2 via RT <bug-Net-SSH2 [...] rt.cpan.org>
From:	Peter Stuge <peter [...] stuge.se>

It seems there's a bit of misunderstanding about the _EAGAIN error code. I'll try to clarify. Bugs in Net-SSH2 via RT wrote: Show quoted text

> The bug in two parts, as far as I can tell.

It's two distinct bugs, actually. Show quoted text

> The first, in SSH2.xs, is that it doesn't properly handle > LIBSSH2_ERROR_EAGAIN spinning in net_ch_write(). The patch is here: > > http://github.com/theory/net-ssh2/commit/9c9ced9984ce7891a1a001b84c846d2425137049

Although I am the author, this is not neccessarily the appropriate fix. I can't say what is best because I know little about perl and less about Net:SSH2. libssh2 offers an API which can optionally be used completely non-blocking. When non-blocking mode is enabled, any libssh2 function call that would block will instead return LIBSSH2_ERROR_EAGAIN, and the same function should be called again. Just like other non-blocking calls. A blocking mode can also be enabled, in this case there are internal wrappers in libssh2 for all functions which spin efficiently if _EAGAIN should occur within the library. The libssh2 error code is set proactively in many places within the library and if one tries to read the last error code when in fact there was no error, then it is very likely that _EAGAIN will be returned. It is important that the libssh2 caller does not check for an error code unless there was in fact an error condition. How to deal with this first patch depends on how Net:SSH2 wants to deal with the blocking/non-blocking issue, and what the "perl way" is. The patch makes Net:SSH2 behave in a blocking mode, and does so a little inefficiently. I recommend reworking this patch before a release. Show quoted text

> The second, in the scp_put() method of Net::SSH2, is that it didn't > handle partial writes.

This patch also needs consideration. Again the patch makes Net:SSH2 behave in a blocking fashion, it will loop calling libssh2 until all data is sent. This is not neccessarily the best way to implement Net:SSH2 and it may not be the "perl way" - I have no idea, I just fixed this one particular case for one particular application which makes assumptions about Net:SSH2 which I know to be neither true nor false. Show quoted text

> So assuming a failure on a partial write,

This is the problem. libssh2 can return fewer bytes than requested both when reading and writing, including 0, and this must not be considered an error. This means that libssh2 has processed some of the data, but not all. The caller must deal with this. As the man pages describe, return codes are always negative if there was an error. Anything equal to, or greater than, 0 is a successful call to libssh2. Net:SSH2 should decide how to expose this property of libssh2 to callers. Show quoted text

> it was failing with whatever error was left in libssh2, which > happened to be LIBSSH2_ERROR_EAGAIN.

The code assumed that the only successful case was when libssh2 took care of *all* requested data, and assumed that everything else was an error. This is wrong. That's why there was a nonsense error code. (See above, the error code is set internally in many places where it *might* be returned, if libssh2 is being used in non-blocking mode.) Show quoted text

> So this patch checks for partial writes and does the right thing: > > http://github.com/theory/net-ssh2/commit/d053f28a35d1f0454d2155145d54c600a2f41447

It is one solution, but again it is not neccessarily the appropriate one. Net:SSH2 needs to decide how to deal with the possibility of libssh2 to offer a non-blocking API. If that should be hidden, and Net:SSH2 users must always expect and receive a fully blocking API (simpler, but less flexible) then this patch is correct. If instead Net:SSH2 users should have an option to use libssh2 in non-blocking mode, then the patch needs to be reworked, to pass the _EAGAIN "error code" up to the caller, so that the caller can decide what/when to perform the next action. Show quoted text

> It looks like there might be similar bugs throughout Net::SSH2.

This is quite likely, since the bug indicates a misunderstanding or misinterpretation of the libssh2 API, so it's probably present throughout the binding. Show quoted text

> In particular, I'm told that, when blocking is enabled (as it is > by default), Net::SSH2 should ignore LIBSSH2_ERROR_EAGAIN.

Not ignore, but spin on. Call the same function again with the same parameters. _EAGAIN is returned as if it was an error code (it's a negative number) but in fact it's not really an error - it is just a way for libssh2 to clearly indicate that this API call would block, and the application doesn't want that to happen. Show quoted text

> In many places, it's either failing to account for partial writes > (or, perhaps, reads, as in scp_get()), and thinks there is an error > because it finds LIBSSH2_ERROR_EAGAIN.

It thinks that there is an error because not all bytes were handled by libssh2. This is a false assumption. Anything to do with _EAGAIN in this situation comes as a consequence of that false assumption. Show quoted text

> So there may be a bunch of places it fails. Another, I've noticed, > is when connecting to some hosts with:

This is basically true for every single function in the libssh2 API. Show quoted text

> So I think there might need to be some refactoring with regard to > handling LIBSSH2_ERROR_EAGAIN when blocking is enabled and with > regard to partial reads and writes.

Yes, Net:SSH2 basically needs to be fixed to use the libssh2 API correctly. //Peter

Tue Jul 13 19:32:35 2010 The RT System itself - Status changed from 'new' to 'open'

Wed Jul 28 19:51:55 2010 dwheeler [...] cpan.org - Correspondence added

RT-Send-CC:

schwern [...] pobox.com

Hey Caelum, Any thoughts on how to address this issue? Thanks! David

Fri Jul 30 11:09:38 2010 rkitover@cpan.org - Correspondence added

Subject:	Re: [rt.cpan.org #59332] Handling LIBSSH2_ERROR_EAGAIN
Date:	Fri, 30 Jul 2010 11:09:41 -0400
To:	David Wheeler via RT <bug-Net-SSH2 [...] rt.cpan.org>
From:	Rafael Kitover <rkitover [...] cpan.org>

On Wed, Jul 28, 2010 at 07:51:55PM -0400, David Wheeler via RT wrote: Show quoted text

> Queue: Net-SSH2 > Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=59332 > > > Hey Caelum, > > Any thoughts on how to address this issue? > > Thanks! > > David

I'm probably going to add the eagain spin hacks so that they are only activated with ->blocking(1) and add the ->disconnect on DESTROY in the next release. You're always disconnecting from irc so I can't leave you messages :)

Fri Jul 30 13:01:33 2010 dwheeler [...] cpan.org - Correspondence added

CC:	Peter Stuge <peter [...] stuge.se>
Subject:	Re: [rt.cpan.org #59332] Handling LIBSSH2_ERROR_EAGAIN
Date:	Fri, 30 Jul 2010 10:01:23 -0700
To:	bug-Net-SSH2 [...] rt.cpan.org
From:	"David E. Wheeler" <dwheeler [...] cpan.org>

On Jul 30, 2010, at 8:09 AM, rkitover@cpan.org via RT wrote: Show quoted text

> <URL: http://rt.cpan.org/Ticket/Display.html?id=59332 > > > On Wed, Jul 28, 2010 at 07:51:55PM -0400, David Wheeler via RT wrote:

>> Queue: Net-SSH2 >> Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=59332 > >> >> Hey Caelum, >> >> Any thoughts on how to address this issue? >> >> Thanks! >> >> David

> > I'm probably going to add the eagain spin hacks so that they are only > activated with ->blocking(1) and add the ->disconnect on DESTROY in the > next release.

Peter, does that sound right to you? Show quoted text

> You're always disconnecting from irc so I can't leave you messages :)

Yeah, time zone issues, I guess. You can always leave a message for me with purl: theory:purl, message for Caelum S’up? purl:Message for caelum stored. Best, David

Mon Aug 23 08:07:00 2010 sisyphus [...] cpan.org - Correspondence added

On Tue Jul 13 18:48:54 2010, DWHEELER wrote: Show quoted text

> The second, in the scp_put() method of Net::SSH2, is that it didn't > handle partial writes. So assuming a failure on a partial write, it > was failing with whatever error was left in libssh2, which happened > to be LIBSSH2_ERROR_EAGAIN. So this patch checks for partial writes > and does the right thing: > > http://github.com/theory/net- > ssh2/commit/d053f28a35d1f0454d2155145d54c600a2f41447

This patch solved a long-running problem I've had trying to upload ~7MB binary files via scp_put() from my Windows Vista box to a linux box over a sluggish internet connection. Over a fast connection (from the same Windows Vista box) to a linux box on my local network, there was never any problem. (Btw, I've as yet found no need to apply the first patch that David linked to.) The problem was simply that the upload would bail out (no error messages) at some random point - usually before 0.5 MB had been uploaded, and it was bailing out because the condition: $chan->write($buf) == $count was not true. The fact that the error was going unreported led me to also rewrite the error handling. The patch (against version 0.33 of SSH2.pm) that I've used is: ######################### --- C:\perl5121_M/site/lib/Net/SSH2.pm_orig Mon Aug 23 20:06:32 2010 +++ C:\perl5121_M/site/lib/Net/SSH2.pm Mon Aug 23 20:36:30 2010 @@ -373,8 +373,18 @@ $self->error(0, "want $block, have $count"), return unless $count == $block; die 'sysread mismatch' unless length $buf == $count; - $self->error(0, "error writing $count bytes to channel"), return - unless $chan->write($buf) == $count; + my $wrote = 0; + while ($wrote >= 0 && $wrote < $count) { + my $wr = $chan->write($buf); + last if $wr < 0; + $wrote += $wr; + $buf = substr $buf, $wr; + } + unless($wrote == $count) { + my @error = $self->error(); + warn "Error writing $count bytes to channel: @error\n"; + return; + } } # send/receive SCP acknowledgement ######################### Given that the code $self->error(0, "error writing $count bytes to channel"), return unless $chan->write($buf) == $count; was failing to produce any error message when it should have, I also wonder if $self->error($!, $!), return unless defined $count; $self->error(0, "want $block, have $count"), return unless $count == $block; (from a few lines earlier) would likewise fail to produce error messages. Finally, the libssh2 documentation states that libssh2_scp_send_ex() is deprecated, and should be replaced with libssh2_scp_send64(), yet SSH2.xs still uses the former. Thanks, guys !! Cheers, Rob

Mon Aug 23 18:16:52 2010 sisyphus [...] cpan.org - Correspondence added

On Mon Aug 23 08:07:00 2010, SISYPHUS wrote: Show quoted text

> + my $wrote = 0; > + while ($wrote >= 0 && $wrote < $count) { > + my $wr = $chan->write($buf); > + last if $wr < 0; > + $wrote += $wr; > + $buf = substr $buf, $wr; > + }

Is there a possibility that the while loop could loop forever ? Should there be a mechanism that breaks the loop after a set number of iterations ? Something like + my ($wrote, $safety, $max) = (0, 0, 100); + while ($wrote >= 0 && $wrote < $count) { + $safety++; + last if $safety > $max; + my $wr = $chan->write($buf); + last if $wr < 0; + $wrote += $wr; + $buf = substr $buf, $wr; + } What would be a reasonable value for $max ? Cheers, Rob

Wed Aug 26 09:36:42 2015 salva [...] cpan.org - Correspondence added

write has been fixed to handle buffers of any size correctly in the development version. The bugs referenced in this report should all be gone now.

Wed Aug 26 09:36:50 2015 salva [...] cpan.org - Status changed from 'open' to 'resolved'

Bug #59332 for Net-SSH2: Handling LIBSSH2_ERROR_EAGAIN