Skip Menu |

This queue is for tickets about the POE-Component-SNMP CPAN distribution.

Report information
The Basics
Id: 95784
Status: open
Priority: 0/
Queue: POE-Component-SNMP

People
Owner: Nobody in particular
Requestors: matthias [...] towiski.de
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: 1.1006
Fixed in: (no value)



Subject: Unresponsive SNMP agents cause PoCo::SNMP to leak memory
I'm debuggin an SNMP collector I wrote that queries several dozen devices for a rather large number of OIDs. On some installations (Fedora, Perl 5.18.2; CentOS 5, Perl 5.8.8 and CentOS 6, Perl 5.10.1) I noticed the process had grown to an obscene size after a day while on others it chugged along happily at roughly its initial size. The problem turned out to be a HA installation where of two firewalls only the currently active one responds to SNMP while the other one keeps quiet. The attached code shows the problem in isolation using Devel::Gladiator. It sends one query a second, and once every ten seconds it prints its current process size (Linux specific but easily disabled) and an overview of all objects whose number has changed since the last time the stats were shown. If you point $host to a device that responds to a sysUptime query, after a while there will be no changes in the number of objects allocated by Perl. If the device doesn't repond however, you gets something like this: Abs Count Difference Data Type 36064 317 SCALAR 1706 104 REF 3610 72 ARRAY 259 32 REF-ARRAY 1956 16 CODE 518 16 HASH 32 8 Net::SNMP::PDU 32 8 POE::Session::AnonEvent 136 16 REF-CODE 135 16 REF-HASH 34 8 REF-Net::SNMP::PDU 34 8 REF-Net::SNMP::Security::Community 35 8 REF-Net::SNMP::Transport::IPv4::UDP 33 8 REF-POE::Net::SNMP 32 8 REF-POE::Session::AnonEvent The module seems to be leaking PDUs that are either not correctly timed out or not properly cleaned up afterwards due to some circular references. It's definitely the POE part because an equivalent program sing plain Net::SNMP does not show this behavior even though Devel::Cycle shows circular structures like this: $Net::SNMP::A->{'_pdu'} => \%Net::SNMP::PDU::DH $Net::SNMP::PDU::DH->{'_callback'} => \&DI $DI variable $this => \$DJ $$DJ => \%Net::SNMP::A The same cycle is visible in the POE version: $POE::Kernel::A->[5] => \@POE::Queue::Array::IG $POE::Queue::Array::IG->[447] => \@AOD $AOD->[2] => \@AOE $AOE->[0] => \@POE::Session::HC $POE::Session::HC->[0] => \%HD $HD->{'snmp_session'} => \%POE::Net::SNMP::HE $POE::Net::SNMP::HE->{'_pdu'} => \%Net::SNMP::PDU::HF $Net::SNMP::PDU::HF->{'_callback'} => \&HG $HG variable $this => \$HH $$HH => \%POE::Net::SNMP::HE My guess is that the Net::SNMP dispatcher upon timeout actively breaks this cycle but I haven't had time to track it down.
Subject: poe-snmptest.pl
#!/usr/bin/perl use strict; use warnings; use POE qw/ Component::SNMP /; use Data::Dumper; use Devel::Gladiator qw/ arena_ref_counts /; my $host = 'localhost'; my $testoid = '.1.3.6.1.2.1.1.3.0'; my $count = 0; POE::Session->create( inline_states => { snmp_response => sub { my $r=$_[ARG1]->[0]; print ref $r ? $r->{$testoid} : $r, "\n"; }, snmp_query => \&_snmp_query, _start => \&_start, debug_meminfo => \&_debug_meminfo, }, ); POE::Kernel->run; sub _start { my ($kernel, $heap) = @_[KERNEL, HEAP]; POE::Component::SNMP->create( hostname => $host, retries => 0, debug => 0, ); $kernel->yield('snmp_query'); } sub _snmp_query { my ($kernel) = @_[KERNEL, ARG0]; $kernel->post('snmp', 'get', 'snmp_response', -varbindlist => [ $testoid ] ); $kernel->delay("snmp_query", 1); _debug_meminfo($kernel) unless $count = ($count+1) % 10; } my %arena_stats; sub _debug_meminfo { my ($kernel) = @_; # Determine our RSS open my $fh, '<', "/proc/$$/status" or do { $kernel->post('logger', 'info', "Can't open /proc/$$/status: $!"); return; }; my $rss; while(<$fh>) { ($rss) = /^VmRSS:\s*(\d+)/ and last } if(defined $rss) { print "Current RSS size: $rss KB; event queue: ".$kernel->[5]->get_item_count." elements\n"; } else { print "Can't find RSS size in /proc/$$/status; event queue: ".$kernel->[5]->get_item_count." elements\n"; } close $fh; # Print object counts in the arena that have changed my $ar = arena_ref_counts; my %stats; while(my ($k, $v) = each %$ar) { my $diff = $v - ($arena_stats{$k} || 0); $stats{$k} = $diff if $diff; $arena_stats{$k} = $v; } printf "%12s %12s %s\n", "Abs Count", "Difference", "Data Type"; foreach(qw/ SCALAR REF ARRAY REF-ARRAY GLOB CODE /) { printf "%12d %12d %s\n", $ar->{$_}, $stats{$_}, $_ if $stats{$_}; delete $stats{$_}; } foreach(sort keys %stats) { printf "%12d %12d %s\n", $ar->{$_}, $stats{$_}, $_; } }
Subject: snmptest.pl
#!/usr/bin/perl use strict; use warnings; use Net::SNMP; use Data::Dumper; use Devel::Gladiator qw/ arena_ref_counts /; use Devel::Cycle; use Time::HiRes 'sleep'; my $host = 'localhost'; my $testoid = '.1.3.6.1.2.1.1.3.0'; my $cycl0r=0; my ($sess, $err) = Net::SNMP->session( -hostname => $host, -nonblocking => 1, -debug => 1, ); die $err if $err; while(1) { next_request(); $sess->snmp_dispatcher; print "restart\n"; find_cycle($sess); debug_meminfo(); } sub next_request { sleep(0.5); unless($cycl0r = ($cycl0r + 1) % 10) { find_cycle($sess); debug_meminfo(); } $sess->get_request(-callback => \&result_cb, -varbindlist => [ $testoid ]) or print "req fail\n"; } sub result_cb { my $o = shift; print "result: ",$o->var_bind_list->{$testoid},"\n"; next_request(); } my %arena_stats; sub debug_meminfo { # Print object counts in the arena that have changed my $ar = arena_ref_counts; my %stats; while(my ($k, $v) = each %$ar) { my $diff = $v - ($arena_stats{$k} || 0); $stats{$k} = $diff if $diff; $arena_stats{$k} = $v; } printf "%12s %12s %s\n", "Abs Count", "Difference", "Data Type"; foreach(qw/ SCALAR REF ARRAY REF-ARRAY GLOB CODE /) { printf "%12d %12d %s\n", $ar->{$_}, $stats{$_}, $_ if $stats{$_}; delete $stats{$_}; } foreach(sort keys %stats) { printf "%12d %12d %s\n", $ar->{$_}, $stats{$_}, $_; } }
So, I've been unable to reproduce this, using your test scripts. After setting the $host parameter to an unresponsive IP and running both scripts for days on end, the POE script grows a *little* during the first, oh, 10 minutes or so, then is stable, and the non-POE version doesn't grow at all. In fact, I made a mistake at first and forgot to update the $host for the non-POE version (so it WAS getting replies) and in that case the script continued to consume additional memory (starting at about 52MB and I stopped it at 113MB, after 2 days). I've made no attempt to update the logic in the test scripts.
On Wed Jun 25 19:19:46 2014, RDB wrote: Show quoted text
> So, I've been unable to reproduce this, using your test scripts. After > setting the $host parameter to an unresponsive IP and running both > scripts for days on end, the POE script grows a *little* during the > first, oh, 10 minutes or so, then is stable, and the non-POE version > doesn't grow at all.
Strange, I'm getting the same behavior on half a dozen systems including various CentOS versions, Gentoo with Perl 5.16, Debian with 5.14 and Fedora with 5.18.2. To make sure it's not something weird in my setup I just whipped up a fresh Debian Wheezy, installed Perlbrew with 5.18.2 and as many modules as possible of the required ones via cpanminus to make sure they're in their latest versions. Same thing: snmptest.pl doesn't grow, poe-snmptest.pl does. I have packed the VM as a raw disk image with external kernel, initrd and XEN config file as http://bodega.towiski.de/poetest.tar.xz If you want to try it: login is root:changethis, and both test scripts are in root's home.