It seems there's a bit of misunderstanding about the _EAGAIN error
code. I'll try to clarify.
Bugs in Net-SSH2 via RT wrote:
Show quoted text> The bug in two parts, as far as I can tell.
It's two distinct bugs, actually.
Show quoted text
Although I am the author, this is not neccessarily the appropriate
fix. I can't say what is best because I know little about perl and
less about Net:SSH2.
libssh2 offers an API which can optionally be used completely
non-blocking. When non-blocking mode is enabled, any libssh2 function
call that would block will instead return LIBSSH2_ERROR_EAGAIN, and
the same function should be called again. Just like other
non-blocking calls.
A blocking mode can also be enabled, in this case there are internal
wrappers in libssh2 for all functions which spin efficiently if
_EAGAIN should occur within the library.
The libssh2 error code is set proactively in many places within the
library and if one tries to read the last error code when in fact
there was no error, then it is very likely that _EAGAIN will be
returned. It is important that the libssh2 caller does not check for
an error code unless there was in fact an error condition.
How to deal with this first patch depends on how Net:SSH2 wants to
deal with the blocking/non-blocking issue, and what the "perl way"
is. The patch makes Net:SSH2 behave in a blocking mode, and does so a
little inefficiently. I recommend reworking this patch before a
release.
Show quoted text> The second, in the scp_put() method of Net::SSH2, is that it didn't
> handle partial writes.
This patch also needs consideration. Again the patch makes Net:SSH2
behave in a blocking fashion, it will loop calling libssh2 until all
data is sent. This is not neccessarily the best way to implement
Net:SSH2 and it may not be the "perl way" - I have no idea, I just
fixed this one particular case for one particular application which
makes assumptions about Net:SSH2 which I know to be neither true nor
false.
Show quoted text> So assuming a failure on a partial write,
This is the problem. libssh2 can return fewer bytes than requested
both when reading and writing, including 0, and this must not be
considered an error. This means that libssh2 has processed some of
the data, but not all. The caller must deal with this. As the man
pages describe, return codes are always negative if there was an
error. Anything equal to, or greater than, 0 is a successful call to
libssh2. Net:SSH2 should decide how to expose this property of
libssh2 to callers.
Show quoted text> it was failing with whatever error was left in libssh2, which
> happened to be LIBSSH2_ERROR_EAGAIN.
The code assumed that the only successful case was when libssh2 took
care of *all* requested data, and assumed that everything else was an
error. This is wrong. That's why there was a nonsense error code.
(See above, the error code is set internally in many places where it
*might* be returned, if libssh2 is being used in non-blocking mode.)
Show quoted text
It is one solution, but again it is not neccessarily the appropriate
one. Net:SSH2 needs to decide how to deal with the possibility of
libssh2 to offer a non-blocking API. If that should be hidden, and
Net:SSH2 users must always expect and receive a fully blocking API
(simpler, but less flexible) then this patch is correct.
If instead Net:SSH2 users should have an option to use libssh2 in
non-blocking mode, then the patch needs to be reworked, to pass the
_EAGAIN "error code" up to the caller, so that the caller can decide
what/when to perform the next action.
Show quoted text> It looks like there might be similar bugs throughout Net::SSH2.
This is quite likely, since the bug indicates a misunderstanding or
misinterpretation of the libssh2 API, so it's probably present
throughout the binding.
Show quoted text> In particular, I'm told that, when blocking is enabled (as it is
> by default), Net::SSH2 should ignore LIBSSH2_ERROR_EAGAIN.
Not ignore, but spin on. Call the same function again with the same
parameters. _EAGAIN is returned as if it was an error code (it's a
negative number) but in fact it's not really an error - it is just a
way for libssh2 to clearly indicate that this API call would block,
and the application doesn't want that to happen.
Show quoted text> In many places, it's either failing to account for partial writes
> (or, perhaps, reads, as in scp_get()), and thinks there is an error
> because it finds LIBSSH2_ERROR_EAGAIN.
It thinks that there is an error because not all bytes were handled
by libssh2. This is a false assumption. Anything to do with _EAGAIN
in this situation comes as a consequence of that false assumption.
Show quoted text> So there may be a bunch of places it fails. Another, I've noticed,
> is when connecting to some hosts with:
This is basically true for every single function in the libssh2 API.
Show quoted text> So I think there might need to be some refactoring with regard to
> handling LIBSSH2_ERROR_EAGAIN when blocking is enabled and with
> regard to partial reads and writes.
Yes, Net:SSH2 basically needs to be fixed to use the libssh2 API
correctly.
//Peter