Subject: Problem with pctcpu on Linux
Date: Sat, 22 Dec 2012 12:29:04 -0500
To: bug-Proc-ProcessTable [...] rt.cpan.org
From: "Matthew L. Dailey" <matthew.l.dailey [...] dartmouth.edu>
Greetings,
We're using Proc::ProcessTable on some beefy Linux boxes and found an issue with pctcpu when long-running processes use a lot of CPU. This particular system has two six-core CPUs with hyperthreading, so there are 24 logical CPUs. Under Linux, this means the theoretical maximum CPU "percentage" is 2400%.
If a process uses lots of CPU for a long time, this causes a buffer overflow in pctcpu once it goes over 999.99%, since its buffer is only sizeof("100.00") bytes: six characters plus the terminating NUL.
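To make the size arithmetic concrete, here is a minimal sketch that measures how many bytes "%3.2f" needs for a given value and compares that with the size of the original field. The helper names pctcpu_bytes_needed and pctcpu_fits and the macro PCTCPU_BUF_SIZE are made up for illustration and are not part of Proc::ProcessTable.

```c
#include <stdio.h>

/* Same size as the pctcpu field in Linux.h: 6 characters plus the NUL. */
#define PCTCPU_BUF_SIZE sizeof("100.00")

/* Hypothetical helper: bytes needed (including the NUL) to format value
 * with "%3.2f". Calling snprintf with a NULL buffer and size 0 only
 * measures the output; nothing is written. Returns 0 on a format error. */
size_t pctcpu_bytes_needed(double value)
{
    int len = snprintf(NULL, 0, "%3.2f", value);
    return len < 0 ? 0 : (size_t)len + 1;
}

/* Hypothetical helper: 1 if the formatted value fits in the original
 * field, 0 if sprintf would write past the end of it. */
int pctcpu_fits(double value)
{
    size_t need = pctcpu_bytes_needed(value);
    return need != 0 && need <= PCTCPU_BUF_SIZE;
}
```

With these helpers, pctcpu_fits(999.99) is true (7 bytes needed, 7 available), while pctcpu_fits(1051.1) is false: "1051.10" needs 8 bytes, which matches the "1051.1" seen in the sprintf frame of the backtrace.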
Here's /proc/<pid>/stat for the process that produces the overflow:
# cat /proc/23427/stat
23427 (sdevice) S 16424 23427 16424 34816 23427 4202496 3854777420 3716 11765 0 179490227 1688781 0 0 20 0 44 0 155125884 173169319936 30671991 18446744073709551615 4194304 190125333 140736691917600 140736691909504 47611949540385 0 8192 0 640 18446744073709551615 0 0 17 15 0 0 1540 0 0
And, here's the backtrace if I compile with debugging symbols and run in gdb:
#0 0x00007ffff76d5425 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007ffff76d8b8b in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007ffff771339e in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#3 0x00007ffff77a9807 in __fortify_fail ()
from /lib/x86_64-linux-gnu/libc.so.6
#4 0x00007ffff77a8700 in __chk_fail () from /lib/x86_64-linux-gnu/libc.so.6
#5 0x00007ffff77a7b69 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#6 0x00007ffff76eefcb in __printf_fp () from /lib/x86_64-linux-gnu/libc.so.6
#7 0x00007ffff76ea5b8 in vfprintf () from /lib/x86_64-linux-gnu/libc.so.6
#8 0x00007ffff77a7c04 in __vsprintf_chk ()
from /lib/x86_64-linux-gnu/libc.so.6
#9 0x00007ffff77a7b4d in __sprintf_chk () from /lib/x86_64-linux-gnu/libc.so.6
#10 0x00007ffff6473297 in sprintf (__s=0x7dc4f8 "1051.1",
__fmt=0x7ffff6474f9d "%3.2f")
at /usr/include/x86_64-linux-gnu/bits/stdio2.h:34
#11 calc_prec (prs=0x7dc410,
format_str=0x7dc510 "iiisiiiillllljjjjijllljjsiiiiiiSSsSS",
mem_pool=<optimized out>) at OS.c:542
#12 OS_get_table () at OS.c:651
#13 0x00007ffff6474ab8 in XS_Proc__ProcessTable_table (
my_perl=<optimized out>, cv=<optimized out>) at ProcessTable.xs:353
#14 0x00007ffff7b1384f in Perl_pp_entersub () from /usr/lib/libperl.so.5.14
#15 0x00007ffff7b0ace6 in Perl_runops_standard () from /usr/lib/libperl.so.5.14
#16 0x00007ffff7aac36a in perl_run () from /usr/lib/libperl.so.5.14
#17 0x0000000000400db9 in main ()
Here is a quick hack to work around this problem:
--- Linux.h.orig 2008-09-08 11:08:41.000000000 -0400
+++ Linux.h 2012-12-22 12:04:29.581138706 -0500
@@ -42,7 +42,7 @@
char *cmndline;
char *exec;
/* other values */
- char pctcpu[sizeof("100.00")]; /* precent cpu, without '%' char */
+ char pctcpu[sizeof("1000.00")]; /* precent cpu, without '%' char */
char pctmem[sizeof("100.00")]; /* precent memory, without '%' char */
};
Looking at this, there are also spelling errors in the comments (precent vs. percent), but that's not causing any trouble. :-)
--- Linux.c.orig 2008-09-08 11:10:41.000000000 -0400
+++ Linux.c 2012-12-22 12:04:52.121362495 -0500
@@ -539,7 +539,7 @@
float pctcpu = 100.0f * (prs->utime / 1e6) / (time(NULL) - prs->start_time);
/* calculate pctcpu - NOTE: This assumes the cpu time is in microsecond units! */
- sprintf(prs->pctcpu, "%3.2f", pctcpu);
+ sprintf(prs->pctcpu, "%4.2f", pctcpu);
field_enable(format_str, F_PCTCPU);
/* calculate pctmem */
This is, of course, just a temporary fix, but it would make pctcpu work on systems with up to 99 CPUs (values up to 9999.99%). Since that isn't too far off, it might be worth adding a few more bytes. :-)
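A more durable alternative to widening the field might be to bound the write with snprintf, so that an unexpectedly large value is truncated rather than overflowing. The following is only a sketch under assumptions: the helper format_pct and the field size PCT_FIELD_SIZE are hypothetical and do not appear in the module's source.

```c
#include <stdio.h>

/* Assumed field size with generous headroom; not taken from the patch. */
#define PCT_FIELD_SIZE 16

/* Hypothetical helper: format pct with two decimals into dst without ever
 * writing more than dst_size bytes (including the NUL).
 * Returns 0 on success, -1 on bad arguments, a format error, or if the
 * output had to be truncated to fit. */
int format_pct(char *dst, size_t dst_size, double pct)
{
    int len;

    if (dst == NULL || dst_size == 0)
        return -1;

    len = snprintf(dst, dst_size, "%.2f", pct);
    if (len < 0 || (size_t)len >= dst_size)
        return -1; /* snprintf reports the untruncated length */

    return 0;
}
```

In calc_prec this might be called as format_pct(prs->pctcpu, sizeof(prs->pctcpu), pctcpu), assuming pctcpu remains a fixed-size char array in the struct. A caller can then decide whether to skip enabling the field when -1 is returned.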
I hope this helps. Please let me know if you need any more info.
Best,
Matthew L. Dailey
Systems Administrator
Thayer School of Engineering
Dartmouth College