Subject: | Unresponsive SNMP agents cause PoCo::SNMP to leak memory |
I'm debuggin an SNMP collector I wrote that queries several dozen devices for a rather large number of OIDs. On some installations (Fedora, Perl 5.18.2; CentOS 5, Perl 5.8.8 and CentOS 6, Perl 5.10.1) I noticed the process had grown to an obscene size after a day while on others it chugged along happily at roughly its initial size. The problem turned out to be a HA installation where of two firewalls only the currently active one responds to SNMP while the other one keeps quiet.
The attached code shows the problem in isolation using Devel::Gladiator. It sends one query a second, and once every ten seconds it prints its current process size (Linux specific but easily disabled) and an overview of all objects whose number has changed since the last time the stats were shown. If you point $host to a device that responds to a sysUptime query, after a while there will be no changes in the number of objects allocated by Perl. If the device doesn't repond however, you gets something like this:
Abs Count Difference Data Type
36064 317 SCALAR
1706 104 REF
3610 72 ARRAY
259 32 REF-ARRAY
1956 16 CODE
518 16 HASH
32 8 Net::SNMP::PDU
32 8 POE::Session::AnonEvent
136 16 REF-CODE
135 16 REF-HASH
34 8 REF-Net::SNMP::PDU
34 8 REF-Net::SNMP::Security::Community
35 8 REF-Net::SNMP::Transport::IPv4::UDP
33 8 REF-POE::Net::SNMP
32 8 REF-POE::Session::AnonEvent
The module seems to be leaking PDUs that are either not correctly timed out or not properly cleaned up afterwards due to some circular references. It's definitely the POE part because an equivalent program sing plain Net::SNMP does not show this behavior even though Devel::Cycle shows circular structures like this:
$Net::SNMP::A->{'_pdu'} => \%Net::SNMP::PDU::DH
$Net::SNMP::PDU::DH->{'_callback'} => \&DI
$DI variable $this => \$DJ
$$DJ => \%Net::SNMP::A
The same cycle is visible in the POE version:
$POE::Kernel::A->[5] => \@POE::Queue::Array::IG
$POE::Queue::Array::IG->[447] => \@AOD
$AOD->[2] => \@AOE
$AOE->[0] => \@POE::Session::HC
$POE::Session::HC->[0] => \%HD
$HD->{'snmp_session'} => \%POE::Net::SNMP::HE
$POE::Net::SNMP::HE->{'_pdu'} => \%Net::SNMP::PDU::HF
$Net::SNMP::PDU::HF->{'_callback'} => \&HG
$HG variable $this => \$HH
$$HH => \%POE::Net::SNMP::HE
My guess is that the Net::SNMP dispatcher upon timeout actively breaks this cycle but I haven't had time to track it down.
Subject: | poe-snmptest.pl |
#!/usr/bin/perl
use strict;
use warnings;
use POE qw/ Component::SNMP /;
use Data::Dumper;
use Devel::Gladiator qw/ arena_ref_counts /;
my $host = 'localhost';
my $testoid = '.1.3.6.1.2.1.1.3.0';
my $count = 0;
POE::Session->create(
inline_states => {
snmp_response => sub { my $r=$_[ARG1]->[0]; print ref $r ? $r->{$testoid} : $r, "\n"; },
snmp_query => \&_snmp_query,
_start => \&_start,
debug_meminfo => \&_debug_meminfo,
},
);
POE::Kernel->run;
sub _start {
my ($kernel, $heap) = @_[KERNEL, HEAP];
POE::Component::SNMP->create(
hostname => $host,
retries => 0,
debug => 0,
);
$kernel->yield('snmp_query');
}
sub _snmp_query {
my ($kernel) = @_[KERNEL, ARG0];
$kernel->post('snmp', 'get', 'snmp_response', -varbindlist => [ $testoid ] );
$kernel->delay("snmp_query", 1);
_debug_meminfo($kernel) unless $count = ($count+1) % 10;
}
my %arena_stats;
sub _debug_meminfo {
my ($kernel) = @_;
# Determine our RSS
open my $fh, '<', "/proc/$$/status" or do {
$kernel->post('logger', 'info', "Can't open /proc/$$/status: $!");
return;
};
my $rss;
while(<$fh>) { ($rss) = /^VmRSS:\s*(\d+)/ and last }
if(defined $rss) {
print "Current RSS size: $rss KB; event queue: ".$kernel->[5]->get_item_count." elements\n";
} else {
print "Can't find RSS size in /proc/$$/status; event queue: ".$kernel->[5]->get_item_count." elements\n";
}
close $fh;
# Print object counts in the arena that have changed
my $ar = arena_ref_counts;
my %stats;
while(my ($k, $v) = each %$ar) {
my $diff = $v - ($arena_stats{$k} || 0);
$stats{$k} = $diff if $diff;
$arena_stats{$k} = $v;
}
printf "%12s %12s %s\n", "Abs Count", "Difference", "Data Type";
foreach(qw/ SCALAR REF ARRAY REF-ARRAY GLOB CODE /) {
printf "%12d %12d %s\n", $ar->{$_}, $stats{$_}, $_ if $stats{$_};
delete $stats{$_};
}
foreach(sort keys %stats) {
printf "%12d %12d %s\n", $ar->{$_}, $stats{$_}, $_;
}
}
Subject: | snmptest.pl |
#!/usr/bin/perl
use strict;
use warnings;
use Net::SNMP;
use Data::Dumper;
use Devel::Gladiator qw/ arena_ref_counts /;
use Devel::Cycle;
use Time::HiRes 'sleep';
my $host = 'localhost';
my $testoid = '.1.3.6.1.2.1.1.3.0';
my $cycl0r=0;
my ($sess, $err) = Net::SNMP->session(
-hostname => $host,
-nonblocking => 1,
-debug => 1,
);
die $err if $err;
while(1) {
next_request();
$sess->snmp_dispatcher;
print "restart\n";
find_cycle($sess);
debug_meminfo();
}
sub next_request {
sleep(0.5);
unless($cycl0r = ($cycl0r + 1) % 10) {
find_cycle($sess);
debug_meminfo();
}
$sess->get_request(-callback => \&result_cb, -varbindlist => [ $testoid ]) or print "req fail\n";
}
sub result_cb {
my $o = shift;
print "result: ",$o->var_bind_list->{$testoid},"\n";
next_request();
}
my %arena_stats;
sub debug_meminfo {
# Print object counts in the arena that have changed
my $ar = arena_ref_counts;
my %stats;
while(my ($k, $v) = each %$ar) {
my $diff = $v - ($arena_stats{$k} || 0);
$stats{$k} = $diff if $diff;
$arena_stats{$k} = $v;
}
printf "%12s %12s %s\n", "Abs Count", "Difference", "Data Type";
foreach(qw/ SCALAR REF ARRAY REF-ARRAY GLOB CODE /) {
printf "%12d %12d %s\n", $ar->{$_}, $stats{$_}, $_ if $stats{$_};
delete $stats{$_};
}
foreach(sort keys %stats) {
printf "%12d %12d %s\n", $ar->{$_}, $stats{$_}, $_;
}
}