Monitor GlusterFS using a Nagios plugin

This post describes how to monitor GlusterFS using a Nagios plugin. You can download the plugin from the link below:

http://exchange.nagios.org/directory/Plugins/System-Metrics/File-System/GlusterFS-checks/details

I have copied the code at the bottom of this page in case the link stops working in the future.

I assume you have already installed the Nagios packages (we use NRPE to monitor GlusterFS). I have another post that discusses how to configure GlusterFS for file replication here:

https://gopukrish.wordpress.com/glusterfs/

Briefly, the concept is as follows:

Download the script to the gluster node and make sure it gives the expected output when executed. Once confirmed, define it as an NRPE command and call it from the Nagios server. If you get the same result there, add the check to the Nagios configuration file for the gluster node you would like to monitor.
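When you run the plugin by hand, the exit code matters as much as the printed text, because Nagios derives the service state from it (0 is OK, 1 is WARNING, 2 is CRITICAL, 3 is UNKNOWN). Here is a minimal sketch that turns an exit code into a state name. The helper name nagios_state_name is my own and is not part of the plugin.

```shell
# Hypothetical helper (not part of the plugin): translate a Nagios plugin
# exit code into the state name. Any code other than 0, 1 or 2 is reported
# as UNKNOWN here.
nagios_state_name() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        *) echo "UNKNOWN" ;;
    esac
}

# Usage on a gluster node (path and arguments as used in this post):
#   /usr/lib/nagios/plugins/check_glusterfs.sh -v datavol -n 2
#   nagios_state_name $?
```

If the printed text says OK but the state name is something else, the script is probably not finding utils.sh, which is where the STATE_* values come from.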

So here we start. First, on the gluster nodes, confirm that the script runs and gives the expected result:

/usr/lib/nagios/plugins/check_glusterfs.sh -v datavol -n 2

If you get any errors, do the following two steps on the gluster node (the Nagios client):

1. Install the package bc (e.g. apt-get install bc).

2. Set the necessary permissions for the NRPE user. To find out the NRPE user, check the configuration file nrpe.cfg on the gluster node. In my case it was 'nagios' (replace 'nagios' with 'nrpe' in both lines below if your user is 'nrpe'). Then grant the permission with vi /etc/sudoers.d/nrpe:

Defaults:nagios !requiretty
nagios ALL=(root) NOPASSWD:/usr/sbin/gluster volume status [[\:graph\:]]* detail,/usr/sbin/gluster volume heal [[\:graph\:]]* info
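Finding the NRPE user can be scripted. The sketch below assumes nrpe.cfg contains a line of the form nrpe_user=NAME. The helper name nrpe_user_from_cfg is hypothetical.

```shell
# Hypothetical helper: print the value of the nrpe_user directive from an
# NRPE configuration file. Assumes the "nrpe_user=NAME" line format.
nrpe_user_from_cfg() {
    # Use the last uncommented nrpe_user line and strip blanks from the value.
    awk -F= '/^[ \t]*nrpe_user[ \t]*=/ {
        gsub(/[ \t]/, "", $2); user = $2
    } END { if (user != "") print user }' "$1"
}

# Usage (path as used later in this post):
#   nrpe_user_from_cfg /etc/nagios/nrpe.cfg
```

After saving the sudoers file, running visudo -cf /etc/sudoers.d/nrpe as root checks its syntax before you rely on it.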

If you have not added these permissions, you may get the error below:

no bricks found

You can test the same from the Nagios server as below:

root@www:/usr/local/nagios/etc/objects# /usr/local/nagios/libexec/check_nrpe -H my_server -c check_glusterfs

CRITICAL: no bricks found

Once the permission is added correctly:

root@www:/usr/local/nagios/etc/objects# /usr/local/nagios/libexec/check_nrpe -H my_server -c check_glusterfs

OK: 2 bricks; free space 26GB
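Why does a missing sudo permission show up as "no bricks found"? The plugin counts bricks by reading the output of sudo gluster volume status for the volume. When sudo refuses the command, nothing arrives and the count stays at zero. Here is a minimal sketch of that counting. The helper name count_online_bricks is hypothetical, and the sample text is hand-written in the "Key : Value" shape the plugin parses, not captured from a real cluster.

```shell
# Hypothetical helper mirroring the plugin's brick count: count lines whose
# first field is "Online" and whose third field is "Y".
count_online_bricks() {
    awk '$1 == "Online" && $3 == "Y" { n++ } END { print n + 0 }'
}

# Hand-written sample, reduced to the lines that matter for the count.
sample='Brick : Brick node1:/export/brick1
Online : Y
Brick : Brick node2:/export/brick1
Online : Y'

printf '%s\n' "$sample" | count_online_bricks   # prints 2
printf '' | count_online_bricks                 # prints 0, as when sudo is denied
```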

NRPE now gives the expected output when run from the Nagios server, so we can safely add the check to the configuration file on the Nagios server:

my_server.cfg :

define service {
    check_command       check_nrpe!check_glusterfs
    service_description Gluster Server Health Check
    host_name           my_server
    use                 generic-service
}

Replace my_server with the hostname of your server node or its IP address.
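If you have several gluster nodes, the same service block can be generated per host instead of being copied by hand. This is only a sketch: the function name gluster_service_def and the host names node1 and node2 are my own placeholders.

```shell
# Hypothetical helper: print the service definition from this post for the
# host name given as the first argument.
gluster_service_def() {
    cat <<EOF
define service {
    check_command       check_nrpe!check_glusterfs
    service_description Gluster Server Health Check
    host_name           $1
    use                 generic-service
}
EOF
}

# Usage: write one definition per node into a file of your choice, e.g.
#   for h in node1 node2; do gluster_service_def "$h"; done > gluster.cfg
```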

If you have not added the required permission for the NRPE user, Nagios reports the service as critical (screenshot: gluster_critical).

Note that with some versions of NRPE, you might need to use

check_nrpe_1arg!check_glusterfs

instead of

check_nrpe!check_glusterfs

as the check_command.

Edit commands.cfg and define the command below for NRPE, unless you have a separate config file for NRPE. In my case it was /etc/nagios-plugins/config/check_nrpe.cfg. If you don't have one, you can add the NRPE definition to commands.cfg as below:

commands.cfg :

define command {
    command_name check_nrpe
    command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -t 30 -c $ARG1$ $ARG2$
}

On the client, edit the NRPE configuration (vi /etc/nagios/nrpe.cfg) and add:

command[check_glusterfs]=/usr/lib/nagios/plugins/check_glusterfs.sh -v datavol -n 2

This means that whenever the Nagios server checks the service using 'check_nrpe!check_glusterfs', check_glusterfs.sh runs on the client and returns its output to the Nagios server. Once you have set everything up correctly, the service status shows as 'OK' (screenshot: gluster_ok).

I know this is a little confusing and not well organised. I posted it as a quick reference for my colleagues and shall organise it better later. In the meantime, if you have any doubts, please let me know. Thanks 🙂
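The name-to-command mapping that NRPE reads from nrpe.cfg can be sketched as a small lookup. This is not how the NRPE daemon is implemented. It only illustrates how the name after 'check_nrpe!' selects a command[...] line. The helper name nrpe_command_line is hypothetical.

```shell
# Hypothetical helper: print the command line configured for a command name
# in an nrpe.cfg-style file.
#   $1 = config file, $2 = command name (the part after "check_nrpe!")
# The body runs in a subshell so its variables do not leak to the caller.
nrpe_command_line() (
    prefix="command[$2]="
    while IFS= read -r line; do
        case "$line" in
            "$prefix"*) printf '%s\n' "${line#"$prefix"}" ;;
        esac
    done < "$1"
)

# Usage on the client:
#   nrpe_command_line /etc/nagios/nrpe.cfg check_glusterfs
```

If this prints nothing for check_glusterfs, the command line has not been added to nrpe.cfg yet.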

Plugin code (check_glusterfs.sh):

#!/bin/bash

# This Nagios script was written against version 3.3 & 3.4 of Gluster. Older
# versions will most likely not work at all with this monitoring script.
#
# Gluster currently requires elevated permissions to do anything. In order to
# accommodate this, you need to allow your Nagios user some additional
# permissions via sudo. The line you want to add will look something like the
# following in /etc/sudoers (or something equivalent):
#
# Defaults:nagios !requiretty
# nagios ALL=(root) NOPASSWD:/usr/sbin/gluster volume status [[\:graph\:]]* detail,/usr/sbin/gluster volume heal [[\:graph\:]]* info
#
# That should give us all the access we need to check the status of any
# currently defined peers and volumes.

# Inspired by a script of Mark Nipper
#
# 2013, Mark Ruys, mark.ruys@peercode.nl

PATH=/sbin:/bin:/usr/sbin:/usr/bin

PROGNAME=$(basename -- $0)
PROGPATH=`echo $0 | sed -e 's,[\\/][^\\/][^\\/]*$,,'`
REVISION="1.0.0"

. $PROGPATH/utils.sh

# parse command line
usage () {
    echo ""
    echo "USAGE: "
    echo " $PROGNAME -v VOLUME -n BRICKS [-w GB -c GB]"
    echo " -n BRICKS: number of bricks"
    echo " -w and -c values in GB"
    exit $STATE_UNKNOWN
}

while getopts "v:n:w:c:" opt; do
    case $opt in
        v) VOLUME=${OPTARG} ;;
        n) BRICKS=${OPTARG} ;;
        w) WARN=${OPTARG} ;;
        c) CRIT=${OPTARG} ;;
        *) usage ;;
    esac
done

if [ -z "${VOLUME}" -o -z "${BRICKS}" ]; then
    usage
fi

# print "STATE: message" and exit with the matching STATE_* code from utils.sh
Exit () {
    $ECHO "$1: ${2:0}"
    status=STATE_$1
    exit ${!status}
}

# check for commands
for cmd in basename bc awk sudo pidof gluster; do
    if ! type -p "$cmd" >/dev/null; then
        Exit UNKNOWN "$cmd not found"
    fi
done

# check for glusterd (management daemon)
if ! pidof glusterd &>/dev/null; then
    Exit CRITICAL "glusterd management daemon not running"
fi

# check for glusterfsd (brick daemon)
if ! pidof glusterfsd &>/dev/null; then
    Exit CRITICAL "glusterfsd brick daemon not running"
fi

# get volume heal status
heal=0
for entries in $(sudo gluster volume heal ${VOLUME} info | awk '/^Number of entries: /{print $4}'); do
    if [ "$entries" -gt 0 ]; then
        # sum the pending heal entries reported for each brick
        heal=$((heal + entries))
    fi
done
if [ "$heal" -gt 0 ]; then
    errors=("${errors[@]}" "$heal unsynched entries")
fi

# get volume status
bricksfound=0
freegb=9999999
shopt -s nullglob
while read -r line; do
    field=($(echo $line))
    case ${field[0]} in
        Brick)
            brick=${field[@]:2}
            ;;
        Disk)
            key=${field[@]:0:3}
            if [ "${key}" = "Disk Space Free" ]; then
                freeunit=${field[@]:4}
                free=${freeunit:0:-2}
                unit=${freeunit#$free}
                if [ "$unit" != "GB" ]; then
                    Exit UNKNOWN "unknown disk space size $freeunit"
                fi
                # truncate to whole GB (bc divides with scale 0 by default)
                free=$(echo "${free} / 1" | bc -q)
                # remember the smallest free space seen on any brick
                if [ $free -lt $freegb ]; then
                    freegb=$free
                fi
            fi
            ;;
        Online)
            online=${field[@]:2}
            if [ "${online}" = "Y" ]; then
                bricksfound=$((bricksfound + 1))
            else
                errors=("${errors[@]}" "$brick offline")
            fi
            ;;
    esac
done < <(sudo gluster volume status ${VOLUME} detail)

if [ $bricksfound -eq 0 ]; then
    Exit CRITICAL "no bricks found"
elif [ $bricksfound -lt $BRICKS ]; then
    errors=("${errors[@]}" "found $bricksfound bricks, expected $BRICKS ")
fi

if [ -n "$CRIT" -a -n "$WARN" ]; then
    # thresholds are free space in GB, so critical must be the smaller value
    if [ $CRIT -ge $WARN ]; then
        Exit UNKNOWN "critical threshold must be below warning threshold"
    elif [ $freegb -lt $CRIT ]; then
        Exit CRITICAL "free space ${freegb}GB"
    elif [ $freegb -lt $WARN ]; then
        errors=("${errors[@]}" "free space ${freegb}GB")
    fi
fi

# exit with warning if errors
if [ -n "$errors" ]; then
    sep='; '
    msg=$(printf "${sep}%s" "${errors[@]}")
    msg=${msg:${#sep}}

    Exit WARNING "${msg}"
fi

# exit with no errors
Exit OK "${bricksfound} bricks; free space ${freegb}GB"
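
The optional -w and -c thresholds in the script above are free space in GB, so the critical value has to be smaller than the warning value. Here is a minimal sketch of that decision on its own. The function name classify_free_space is hypothetical and the numbers in the usage lines are only illustrative.

```shell
# Hypothetical helper reproducing the plugin's free-space threshold logic.
#   $1 = free space, $2 = warning threshold, $3 = critical threshold (whole GB)
# The body runs in a subshell so its variables do not leak to the caller.
classify_free_space() (
    free=$1
    warn=$2
    crit=$3
    if [ "$crit" -ge "$warn" ]; then
        echo "UNKNOWN"      # invalid thresholds: critical is not below warning
    elif [ "$free" -lt "$crit" ]; then
        echo "CRITICAL"
    elif [ "$free" -lt "$warn" ]; then
        echo "WARNING"
    else
        echo "OK"
    fi
)

classify_free_space 26 20 10   # prints OK
classify_free_space 15 20 10   # prints WARNING
classify_free_space 5 20 10    # prints CRITICAL
```

To use thresholds in the actual check, append them to the NRPE command line, for example -v datavol -n 2 -w 20 -c 10.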