Logging the most resource-hungry service in Zabbix with the top command

From Dusal Encyclopedia

Finding CPU consumers



I wrote a little script to grab the output of the Linux top command into a Zabbix item. It was inspired by terataz (http://www.zabbix.com/forum/member.php?u=23021), who asked how to find out which process may be the cause when Zabbix reports high CPU load.

My solution is not very sophisticated, but maybe somebody will find it useful or feel like improving it.
For now, it reports the names of the processes that consume the most CPU time, provided their %CPU is at least a given value. With small modifications and an adjusted toprc configuration, one could use it for RAM consumers or anything else top is able to report.

Code:
#!/bin/bash
#####################################################
# topcpu.sh
# returns names of most CPU time consuming processes
# as reported by 'top'
#####################################################
# 05-07-2010 by Jerry Lenk
# Use at your own risk!
#####################################################

# set limit to 1st argument, or 2% if not specified
lim=$1
test -z "$lim" && lim=2

# run 2 iterations of top in batch mode with 1 s delay
top -b -d1 -n2 |\
gawk --assign lim="$lim"  'BEGIN { reply=""}
        END { print reply, "." }
        # if reply is empty, at least a period is returned

        # in 2nd iteration, first 3 lines
        # add columns 9 (%cpu) and 12 (process name)
        # to reply string, if cpu at least lim%
        itr == 2 && NR <= 3 && $9 >= lim { reply=reply " " $9 "%" $12 }

        # count iterations by header lines beginning with "PID"
        # reset linenumber
        $1 == "PID" { NR=0 ; itr +=1 }
       '
# Only 2nd iteration of top is of interest because
# load values are calculated since previous iteration
I save it as "topcpu.sh" in my scripts directory on the monitored machine (/etc/zabbix/userscript in my case)

and add it as a UserParameter to zabbix_agentd.conf:

Code:
UserParameter=system.topcpu[*],/etc/zabbix/userscript/topcpu.sh $1
If you prefer to place the script somewhere else, change the path /etc/zabbix/userscript/ accordingly.


Now all I need to do is restart zabbix_agentd and create an item for it:
Description: CPU top consumer
Type: Zabbix agent
Key: system.topcpu[5] (5 being the minimum %CPU load I want reported)
Type of information: Text
Interval: 30 seconds (I tried 60 first, but wasn't very satisfied.)
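
Before waiting for the first value to arrive, the new key can be queried by hand. The following is only a sketch: the function name query_topcpu is made up for this example, and it assumes that zabbix_get is installed and that the agent's Server setting allows connections from the machine you run it on.

```shell
#!/bin/sh
# query_topcpu [HOST] [LIMIT]   (hypothetical helper, not part of Zabbix)
# Asks the agent on HOST for the item value, roughly as the server would.
# Defaults: HOST = 127.0.0.1, LIMIT = 5 (minimum %CPU to report).
query_topcpu() {
    zabbix_get -s "${1:-127.0.0.1}" -k "system.topcpu[${2:-5}]"
}
```

Calling query_topcpu without arguments asks the local agent for system.topcpu[5]. On agent versions that support the -t option, running zabbix_agentd -t 'system.topcpu[5]' on the monitored machine is another way to check the UserParameter.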

I have yet to test whether it makes sense to put the content of this item into an alert mail triggered by high CPU load. The relevant process name will probably show up only about half a minute after the trigger turns "on".
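
The RAM variant mentioned earlier could look roughly like the sketch below. It is not part of the original script: the function name top_consumers is made up, the column to test is passed as an argument, and the column numbers assume top's default layout (9 = %CPU, 10 = %MEM, 12 = command). It reads top's batch output from stdin, so top itself has to be told to sort by the column of interest, either through an adjusted toprc or, on versions of top that support it, with -o %MEM.

```shell
#!/bin/sh
# top_consumers COLUMN LIMIT   (hypothetical helper)
# Reads 'top -b -n2' output on stdin and reports the first three process
# rows of the 2nd iteration whose value in COLUMN is at least LIMIT.
# Assumed default top layout: 9 = %CPU, 10 = %MEM, 12 = command name.
top_consumers() {
    awk -v col="$1" -v lim="$2" '
        # header line: a new iteration starts, restart the row counter
        $1 == "PID" { itr++; row = 0; next }

        # process rows of the 2nd iteration only
        itr == 2 && NF >= 12 {
            row++
            if (row <= 3 && $col + 0 >= lim + 0)
                reply = reply " " $col "%" $12
        }

        # if nothing qualifies, at least a period is returned
        END { print reply, "." }
    '
}

# Example use (memory, at least 20 %), assuming top accepts -o:
#   top -b -d1 -n2 -o %MEM | top_consumers 10 20
```

Keeping the parsing in a function that reads stdin means the same code serves both cases: pipe plain top output in and pass column 9 for CPU, or memory-sorted output and column 10 for RAM.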

I'm glad you like it, but I will pass some of the thanks to the OP.

My last script takes only one parameter, the percentage threshold. My first script takes two samples and only gives an answer when the same process is #1 in both samples. Its second parameter is the delay between the two samples.
This makes the script less sensitive to peaks, which may be something you want.
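
The two-sample idea can be sketched as follows. This is not the first script referred to above, which is not reproduced here, but an illustration under assumptions: the helper names leader and stable_leader are made up, and top's default column layout is assumed (1 = PID, 9 = %CPU, 12 = command).

```shell
#!/bin/sh
# leader   (hypothetical helper)
# Prints "PID %CPU COMMAND" for the first process row of the LAST
# iteration found in 'top -b' output read from stdin.
leader() {
    awk '$1 == "PID" { want = 1; next }
         want        { line = $1 " " $9 " " $12; want = 0 }
         END         { print line }'
}

# stable_leader SAMPLE1 SAMPLE2 LIMIT   (hypothetical helper)
# Prints SAMPLE2 only if both samples name the same PID and its %CPU
# is at least LIMIT; otherwise prints a period.
stable_leader() {
    pid1=${1%% *}
    pid2=${2%% *}
    cpu2=$(echo "$2" | awk '{ print $2 }')
    if [ -n "$pid1" ] && [ "$pid1" = "$pid2" ] &&
       awk -v c="$cpu2" -v l="$3" 'BEGIN { exit !(c + 0 >= l + 0) }'
    then
        echo "$2"
    else
        echo "."
    fi
}

# Example use with a delay of 10 s between the two samples:
#   s1=$(top -b -d1 -n2 | leader); sleep 10
#   s2=$(top -b -d1 -n2 | leader)
#   stable_leader "$s1" "$s2" 5
```

A process that is on top only for a moment will not be the leader in both samples, so a short peak yields just the period.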

You're asking for a top 3 instead of a top 1.
This means my first script is not suited for this.

I adapted the second script to show more than one process.
You need to pass the number of processes as a second parameter in Zabbix.

Only the first process will have its open files listed.
The script runs twice a minute, and you don't want the probe itself to be the process that consumes too much.

Code:
#!/bin/bash
#####################################################
# topcpu
# returns names of most CPU time consuming processes
# as reported by 'top'
#####################################################
# 05-07-2010 by Jerry Lenk
# 02-11-2010 by Frater (rewrite in bash)
#
# Use at your own risk!
#####################################################

# Add lsof to /etc/sudoers (as root) with the following command
##########################
#     echo zabbix ALL = NOPASSWD: `which lsof` >> /etc/sudoers

# Comment out the tty requirement for sudo
##########################
#     sed -i -e 's/^Defaults.*requiretty/# &/' /etc/sudoers

# Add to zabbix_agentd.conf
###########################
#     echo 'UserParameter=system.topcpu[*],/usr/local/sbin/topcpu $1 $2' >>/etc/zabbix/zabbix_agentd.conf

# Restart Zabbix
################
#     /etc/init.d/zabbix-agent restart

# Constants
nodata='.'
deflimit=4
defanswers=1
use_lsof=1
GREP='grep --color=never -a'
DEBUG=0

# set limit to 1st argument (given from zabbix), or deflimit if not specified
lim=`echo "$1" | tr -cd '0-9.'`
[ -z "${lim}" ] && lim=${deflimit}

answers=`echo "$2" | tr -cd '0-9'`
[ -z "${answers}" ] && answers=${defanswers}
[ $answers -gt 5  ] && answers=5
[ $answers -lt 1  ] && answers=1

toptail="`top -b -d1 -n2 | ${GREP} -A${answers} '^ *PID ' | tail -n${answers}`"
cpu=`echo "${toptail}"  | head -n1 | awk '{print $9}'`

[ ${DEBUG} -ne 0 ] && echo "Debug: \$1=$1  limit=$lim  cpu=$cpu"

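# compare numerically: top prints %CPU with decimals, and expr would
# compare such values as strings (e.g. "12.3" sorts before "4")
if awk -v c="${cpu}" -v l="${lim}" 'BEGIN { exit !(c + 0 <= l + 0) }' ; then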
  echo "${nodata}"
else
  # get PID & FULL process name (it may contain more info)
  pid=`echo "${toptail}" | head -n1 | awk '{print $1}'`
  procname="`ps --pid ${pid} -o args --no-headers 2>/dev/null`"

  if [ -z "${procname}" ] ; then
    # process is not running anymore... I might as well return nothing and quit
    echo "${nodata}"
  else

    user=`echo "${toptail}" | head -n1 | awk '{print $2}'`
    # return CPU usage, process owner and process name
    echo "${cpu}%   ${user}:${procname}"

    if [ ${use_lsof} -ne 0 ] ; then
      # calculate the limit when it should execute lsof
      # (integer part of the limit only: shell arithmetic cannot handle decimals)
      lim=$(( 2 * ${lim%%.*} + 5 ))
      [ ${lim} -gt 50 ] && lim=50
      # numeric comparison through awk, because %CPU has a decimal part
      awk -v c="${cpu}" -v l="${lim}" 'BEGIN { exit !(c + 0 > l + 0) }' && sudo lsof -p ${pid} -S -b -w -n -Fftn0 | ${GREP} -v '^fDEL' | ${GREP} 'tREG'  | ${GREP} -o '/.*' | tr -d '\0' | ${GREP} -vE '(log$|^/var/lib|^/lib|^/var/run|^/tmp|^/usr/|^/var/log/)' | sort -u | head -n5
    fi

    n=2
    while [ $n -le ${answers} ] ; do
      topline="`echo "${toptail}" | tail -n+${n} | head -n1`"
       pid=`echo "${topline}" | awk '{print $1}'`
      user=`echo "${topline}" | awk '{print $2}'`
       cpu=`echo "${topline}" | awk '{print $9}'`
      procname="`ps --pid ${pid} -o args --no-headers 2>/dev/null`"
      echo "${cpu}%   ${user}:${procname}"

      n=$(($n + 1))
    done

  fi
fi
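
The rule that decides when lsof is run (twice the reporting limit plus 5, capped at 50) can be tried on its own with the sketch below. The helper names lsof_threshold and exceeds are made up for this example. Because top prints %CPU with a decimal part and the limit may be fractional too, the sketch uses only the integer part of the limit for shell arithmetic and leaves the comparison to awk.

```shell
#!/bin/sh
# lsof_threshold LIMIT   (hypothetical helper)
# Mirrors the rule used in the script: 2 * LIMIT + 5, capped at 50.
# Only the integer part of LIMIT is used, since shell arithmetic
# cannot handle a value such as 2.5.
lsof_threshold() {
    whole=${1%%.*}
    [ -z "$whole" ] && whole=0
    t=$(( 2 * $whole + 5 ))
    [ "$t" -gt 50 ] && t=50
    echo "$t"
}

# exceeds CPU THRESHOLD   (hypothetical helper)
# Succeeds when CPU is numerically greater than THRESHOLD, so that
# 9.5 and 10 are compared as numbers rather than as strings.
exceeds() {
    awk -v c="$1" -v t="$2" 'BEGIN { exit !(c + 0 > t + 0) }'
}
```

With the default limit of 4 the threshold is 13, so lsof only runs for a process above 13% CPU. From a limit of 23 upwards the cap of 50 applies.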
