Logging the most resource-hungry service in Zabbix using the top command
Finding CPU consumers
I wrote a little script to grab the output of the Linux top command into a Zabbix item. This was inspired by terataz http://www.zabbix.com/forum/member.php?u=23021, who asked how to find out which process may be the cause when, e.g., Zabbix reports high CPU load.
My solution is not very sophisticated, but maybe somebody finds it useful, or feels like improving it.
For now, it reports the names of the top CPU-time-consuming processes if their CPU% exceeds a given value. With small modifications and an adjusted toprc configuration, one could use it for RAM consumers or anything else top is able to report.
#!/bin/bash
#####################################################
# topcpu.sh
# returns names of most CPU time consuming processes
# as reported by 'top'
#####################################################
# 05-07-2010 by Jerry Lenk
# Use at your own risk!
#####################################################

# set limit to 1st argument, or 2% if not specified
lim=$1
test -z $lim && lim=2

# run 2 iterations of top in batch mode with 1 s delay
top -b -d1 -n2 |\
gawk --assign lim=$lim 'BEGIN { reply=""}
END { print reply, "." }
# if reply is empty, at least a period is returned

# in 2nd iteration, first 3 lines
# add columns 9 (%cpu) and 12 (process name)
# to reply string, if cpu at least lim%
itr == 2 && NR <= 3 && $9 >= lim { reply=reply " " $9 "%" $12 }

# count iterations by header lines beginning with "PID"
# reset linenumber
$1 == "PID" { NR=0 ; itr +=1 }
'
# Only 2nd iteration of top is of interest because
# load values are calculated since previous iteration
Save the script and add it as a UserParameter to zabbix_agentd.conf:
UserParameter=system.topcpu[*],/etc/zabbix/userscript/topcpu.sh $1
Then create an item with these settings:
Description: CPU top consumer
Type: Zabbix agent
Key: system.topcpu[5] (5 being the minimum %CPU load I want reported)
Type of information: Text
Interval: 30 (I tried 60 first, but wasn't very satisfied.)
I have yet to try whether it makes sense to put the content of this item into an alert mail triggered by high CPU load. Probably the relevant process name shows up only half a minute after the trigger turns "on".
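For the RAM case mentioned earlier, a minimal sketch could look like the one below. It is not part of the original script: it swaps top for procps ps, since ps can sort by memory share directly, and the names topmem and filter_consumers are made up for illustration. The output keeps the same "value%name ... ." shape as topcpu.sh.

```shell
#!/bin/bash
# Sketch of a memory variant of topcpu.sh (hypothetical, not from the thread).
# Assumption: procps 'ps' is available and supports --sort and --no-headers.

# Read lines of "<%mem> <command>" on stdin and append up to 3 entries
# whose %mem is at least the limit in $1. A period is always printed,
# so the item never returns an empty value.
filter_consumers() {
    awk -v lim="$1" '
        n < 3 && $1 + 0 >= lim + 0 { reply = reply " " $1 "%" $2; n++ }
        END { print reply, "." }
    '
}

# List processes sorted by descending memory share and filter them.
# The limit defaults to 2 (percent), like the CPU script.
topmem() {
    ps -eo pmem,comm --sort=-pmem --no-headers | filter_consumers "${1:-2}"
}
```

Calling topmem 5 from a small wrapper script would then report processes using at least 5% of RAM, and the wrapper could be registered as its own UserParameter in the same way as topcpu.sh.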
I'm glad you like it, but I will pass some of the thanks to the OP.
The last script of mine takes only 1 parameter, which is the percentage threshold. The first script takes 2 samples and will only give an answer when both samples have the same process as #1. The 2nd parameter is the delay between 2 samples.
This makes the script less sensitive to peaks, which may be something you want.
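The two-sample idea can be sketched roughly as follows. The script being described is not reproduced here, so this is only an illustration under assumptions: the function names are invented, and agreement is checked on the PID of the first process listed in each sample.

```shell
#!/bin/bash
# Sketch only: function names and output format are assumptions.

# Print the PID of the first process listed after the last "PID ..."
# header in top's batch output (read from stdin).
top_pid() {
    awk '$1 == "PID" { want = 1; next }
         want        { pid = $1; want = 0 }
         END         { print pid }'
}

# Print the PID and succeed only when both samples ($1 and $2)
# have the same process in first place.
same_top_pid() {
    local first second
    first=$(printf '%s\n' "$1" | top_pid)
    second=$(printf '%s\n' "$2" | top_pid)
    [ -n "$first" ] && [ "$first" = "$second" ] && echo "$first"
}

# Take two samples "delay" seconds apart ($1, default 5) and print
# a period when they disagree, so a short peak is not reported.
sample_twice() {
    local delay=${1:-5} a b
    a=$(top -b -d1 -n2)
    sleep "$delay"
    b=$(top -b -d1 -n2)
    same_top_pid "$a" "$b" || echo '.'
}
```

With this shape, the total run time is the delay plus the time top needs for its samples, so it has to stay below the agent's Timeout setting.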
You're asking to have a top 3 instead of a top 1...
This means the 1st script of mine is not suited for this....
I adapted the 2nd script to show more than 1 process.
You need to give the number of processes as a 2nd parameter in Zabbix, for example system.topcpu[5,3] for a top 3 with a 5% threshold.
Only the 1st process will show its open files.
The script is run twice a minute and you don't want the probe itself to be the process that consumes too much.
#!/bin/bash
#####################################################
# topcpu
# returns names of most CPU time consuming processes
# as reported by 'top'
#####################################################
# 05-07-2010 by Jerry Lenk
# 02-11-2010 by Frater (rewrite in bash)
#
# Use at your own risk!
#####################################################

# Add lsof to /etc/sudoers (as root) with the following command
##########################
# echo zabbix ALL = NOPASSWD: `which lsof` >> /etc/sudoers

# Comment out the tty requirement for sudo
##########################
# sed -i -e 's/^Defaults.*requiretty/# &/' /etc/sudoers

# Add to zabbix_agentd.conf
###########################
# echo 'UserParameter=system.topcpu[*],/usr/local/sbin/topcpu $1 $2' >>/etc/zabbix/zabbix_agentd.conf

# Restart Zabbix
################
# /etc/init.d/zabbix-agent restart

# Constants
nodata='.'
deflimit=4
defanswers=1
use_lsof=1
GREP='grep --color=never -a'
DEBUG=0

# numeric "greater than" that also works for decimals such as 12.3
# (expr falls back to string comparison when a value is not an integer)
num_gt() { awk -v a="$1" -v b="$2" 'BEGIN { exit !(a > b) }' ; }

# set limit to 1st argument (given from zabbix), or deflimit if not specified
lim=`echo "$1" | tr -cd '0-9.'`
[ -z "${lim}" ] && lim=${deflimit}

# number of processes to report (2nd argument), clamped to 1..5
answers=`echo "$2" | tr -cd '0-9'`
[ -z "${answers}" ] && answers=${defanswers}
[ $answers -gt 5 ] && answers=5
[ $answers -lt 1 ] && answers=1

# keep only the first ${answers} process lines of top's 2nd iteration
toptail="`top -b -d1 -n2 | ${GREP} -A${answers} '^ *PID ' | tail -n${answers}`"
cpu=`echo "${toptail}" | head -n1 | awk '{print $9}'`

[ ${DEBUG} -ne 0 ] && echo "Debug: \$1=$1 limit=$lim cpu=$cpu"

if ! num_gt "${cpu}" "${lim}" ; then
  echo "${nodata}"
else
  # get PID & FULL process name (it may contain more info)
  pid=`echo "${toptail}" | head -n1 | awk '{print $1}'`
  procname="`ps --pid ${pid} -o args --no-headers 2>/dev/null`"

  if [ -z "${procname}" ] ; then
    # process is not running anymore... I might as well return nothing and quit
    echo "${nodata}"
  else
    user=`echo "${toptail}" | head -n1 | awk '{print $2}'`

    # return CPU usage, process owner and process name
    echo "${cpu}% ${user}:${procname}"

    if [ ${use_lsof} -ne 0 ] ; then
      # calculate the limit when it should execute lsof
      # (integer part of the limit only, bash arithmetic has no decimals)
      lim=$(( 2 * ${lim%%.*} + 5 ))
      [ ${lim} -gt 50 ] && lim=50
      num_gt "${cpu}" "${lim}" && sudo lsof -p ${pid} -S -b -w -n -Fftn0 |
        ${GREP} -v '^fDEL' | ${GREP} 'tREG' | ${GREP} -o '/.*' | tr -d '\0' |
        ${GREP} -vE '(log$|^/var/lib|^/lib|^/var/run|^/tmp|^/usr/|^/var/log/)' |
        sort -u | head -n5
    fi

    # report the remaining processes (2nd up to ${answers}th), without lsof
    n=2
    while [ $n -le ${answers} ] ; do
      topline="`echo "${toptail}" | tail -n+${n} | head -n1`"
      pid=`echo "${topline}" | awk '{print $1}'`
      user=`echo "${topline}" | awk '{print $2}'`
      cpu=`echo "${topline}" | awk '{print $9}'`
      procname="`ps --pid ${pid} -o args --no-headers 2>/dev/null`"
      echo "${cpu}% ${user}:${procname}"
      n=$(($n + 1))
    done
  fi
fi