Subject: Max Requests per host/IP patch
Moved from #69514. This implements the following:
* A new "max_requests" parameter, defaulting to 3, which limits how many
requests an individual host is asked to process at the same time.
This required the following changes:
* A new $event array variable called _HOSTNAME (detailed in another
patch; this patch assumes it's already in place)
* Changes to register/deregister to keep track of hostnames in a very
similar fashion to the descriptor object.
* Code at the top of _event_handle to skip events for hosts that are
already at the max request limit, until one of their requests has
finished (see the sketch after this list).
* An automatic drop of max_requests to 1 if the host times out even once,
since it evidently cannot handle the requests it already had.
* Renaming the unused "return_response_pdu" to "send_pdu_priority", and
using it throughout SNMP.pm for _existing_ requests. This way, follow-up
requests for, say, get_table are sent immediately through the pipe and
only the receive timers get put into the event list. This makes existing
requests immune to the max_request limit (and post-select lag), and
ensures that the host is not left waiting too long for our reply asking
for more information (a sketch of the two send paths follows this list).
* A new parameter (plus help) in both SNMP.pm and Net::SNMP::Transport
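
To make the dispatcher-side items above concrete (hostname tracking in
register/deregister, the skip at the top of _event_handle, and the
timeout downgrade), here is a minimal Perl sketch of the idea. The names
%inflight, %max_requests, host_at_limit and so on are illustrative only,
not the actual patch code.

    # Minimal sketch of the per-host accounting, assuming illustrative
    # names; the real patch stores the hostname in the event via the new
    # _HOSTNAME slot and hooks into register/deregister next to the
    # existing descriptor bookkeeping.

    my %inflight;       # hostname => requests currently in flight
    my %max_requests;   # hostname => per-host limit (defaults to 3)

    sub register_request {
        my ($host) = @_;
        $inflight{$host}++;          # mirrors register() for descriptors
    }

    sub deregister_request {
        my ($host) = @_;
        $inflight{$host}-- if $inflight{$host};
    }

    sub host_at_limit {
        my ($host) = @_;
        my $limit = defined $max_requests{$host} ? $max_requests{$host} : 3;
        return ($inflight{$host} || 0) >= $limit;
    }

    # Top of the event handler: events for a host that is already at its
    # limit are skipped and stay queued until one of that host's
    # requests finishes.
    sub should_defer_event {
        my ($event) = @_;
        return host_at_limit($event->{_HOSTNAME});   # hash key for brevity
    }

    # On a timeout, drop the host's limit to 1, since it evidently
    # cannot handle the requests it already had.
    sub note_timeout {
        my ($host) = @_;
        $max_requests{$host} = 1;
    }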
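
The send_pdu_priority change is, conceptually, a second send path:
follow-up PDUs for requests already in flight go straight out and only
schedule their receive timer, while brand-new requests go through the
event list and the limit. A rough sketch, where every name except
send_pdu_priority is illustrative:

    sub send_request {
        my ($pdu, %args) = @_;

        if ($args{existing}) {
            # Follow-up PDU for a request already in flight (e.g. the
            # next chunk of a get_table walk): send it immediately and
            # only queue the receive timer, so the host is not kept
            # waiting and the max_requests limit never applies.
            send_pdu_priority($pdu);
            schedule_receive_timer($pdu);
        }
        else {
            # Brand-new request: goes into the event list and is
            # subject to the per-host max_requests limit.
            queue_event($pdu);
        }
    }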
This patch, along with the receive buffer patch, fixes both ends of the
large request problem.
The receive buffer patch fixes the one-to-many IPs problem. In other
words, if a single client (us via Net::SNMP) is sending many requests to
different hosts, those hosts are collectively going to process the
requests and send the replies back faster than the one client can handle
all of the return packets. It's like a 75-core server processing
everything and sending it back to a single-core client, which, before
that patch, just kept overloading itself.
This patch fixes the one-to-many _requests_ problem. In other words, a
single client sends a host many different requests, forcing the host to
process all of them at the same time. To the client, sending a request
for 20 large tables is easy. Actually getting the data is a lot harder.
Depending on how smart or dumb the host's SNMP software is, it may try
to process all 20 requests at the same time. This results in timeouts,
as it never gets any one packet sent in time. (In fact, I end up seeing
late packets that get rejected because the msgID has already been thrown
away.) The retries don't work at all, because all 20 requests time out
at the same time, and the code just resends the same 20 requests within
the same time frame. Rinse and repeat until the retry limit is reached,
and you end up with an angry server and no data to show for it. This is
a problem even when sending to a single host, so it's not just an issue
for large multi-host requests.
So, this patch keeps it at a reasonable 3 requests per host. Existing
requests still get processed as normal, but new ones are held back until
one of the other requests has finished (a usage sketch follows the list
below). Yes, 3 is a somewhat arbitrary limit, but:
1. It's reasonable to assume that most hosts probably can't (and
shouldn't) handle more than 3 table pull requests at a time.
2. It's adjustable per host by the user.
3. It has the potential to be replaced with an auto-threshold that
adjusts this limit according to the response rate of the host, thus
eliminating the arbitrary number.
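
For illustration, this is roughly how the limit looks from the caller's
side with non-blocking sessions. Everything here is the standard
Net::SNMP API except the -maxrequests option, whose exact name is my
assumption based on the patch description:

    use strict;
    use warnings;
    use Net::SNMP;

    my ($session, $error) = Net::SNMP->session(
        -hostname    => 'switch.example.com',
        -community   => 'public',
        -version     => 'snmpv2c',
        -nonblocking => 1,
        -maxrequests => 3,   # assumed spelling of the new per-host limit
    );
    die "session error: $error\n" if !defined $session;

    # Queue 20 large table pulls; only 3 are on the wire at once, and
    # the rest start as earlier ones finish instead of all hammering
    # the host at the same time.
    my @base_oids = map { "1.3.6.1.2.1.$_" } 1 .. 20;   # example base OIDs

    for my $base_oid (@base_oids) {
        $session->get_table(
            -baseoid  => $base_oid,
            -callback => [ \&table_done, $base_oid ],
        );
    }

    snmp_dispatcher();   # run the event loop until everything completes

    sub table_done {
        my ($session, $base_oid) = @_;
        my $result = $session->var_bind_list();
        if (!defined $result) {
            printf "table %s failed: %s\n", $base_oid, $session->error();
            return;
        }
        printf "table %s returned %d values\n",
            $base_oid, scalar keys %{$result};
    }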
Attachment: MaxRequests.patch.txt