Re: help needed with nagios alert

Heinze, Markus Markus.Heinze at esta-bw.de
Tue Jan 27 12:03:11 CET 2015


Hi,

Just an attempt to sort a few things out.

I don't know much about Hadoop clusters, but I assume a cluster consists of several cluster nodes.
Did you check the free disk space on the master node only, or on each node independently?
Do you have one entry in hosts.cfg for the world-accessible IP/DNS name of the Hadoop cluster, and separate entries for each cluster node?


We use a small Linux web cluster with replicated MySQL databases and web directories.
For replication we use DRBD, with Pacemaker as the resource manager.
We get alerts both for the cluster as a whole and for each individual cluster node.


So I use two different check_disk checks. The first one is for the replicated volume: check_linux_drbd0_disk.
The volume size and free disk space are the same on every cluster node.

The second check_disk check monitors the real hard disk in each cluster node: check_linux_root_disk.
That is the physical disk installed in each node.


Value of $HOSTADDRESS$:
For check_linux_drbd0_disk it is the active, world-accessible address, for example www.example.com.
For check_linux_root_disk it is the internal address of each cluster node, for example clusternode1.internal.com and clusternode2.internal.com.
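For illustration, the hosts.cfg entries behind this could look roughly like the sketch below: one host object for the public cluster address and one per physical node. The host_name values, the alias texts and the linux-server template (taken from the Nagios sample configuration) are assumptions; adjust them to your own setup.

```
# hosts.cfg (sketch; host names, aliases and template are assumptions)
# One host object for the cluster's public, world-accessible address ...
define host{
        use             linux-server    ; template from the sample templates.cfg (assumed)
        host_name       www.example.com
        alias           Web cluster (active node)
        address         www.example.com
        }

# ... and one host object per physical cluster node.
define host{
        use             linux-server
        host_name       clusternode1
        alias           Cluster node 1
        address         clusternode1.internal.com
        }

define host{
        use             linux-server
        host_name       clusternode2
        alias           Cluster node 2
        address         clusternode2.internal.com
        }
```

Nagios fills in $HOSTADDRESS$ from the address directive of whichever host a service is attached to, so the same command line reaches a different machine for each host.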


In objects/commands.cfg:
define command{
        command_name    check_linux_drbd0_disk
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -t 60 -p 5666 -n -c check_drbd0
        }


define command{
        command_name    check_linux_root_disk
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -t 60 -p 5666 -n -c check_sda1
        }
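The commands still have to be bound to hosts in service definitions. A minimal sketch, assuming host objects named www.example.com, clusternode1 and clusternode2 and the generic-service template from the Nagios sample configuration (all of these names are hypothetical here):

```
# services.cfg (sketch; host names, descriptions and template are assumptions)
# Replicated DRBD volume: checked once, via the cluster's public address.
define service{
        use                     generic-service
        host_name               www.example.com
        service_description     DRBD0 disk
        check_command           check_linux_drbd0_disk
        }

# Physical disk: a comma-separated host_name list creates one
# independent service on each listed cluster node.
define service{
        use                     generic-service
        host_name               clusternode1,clusternode2
        service_description     Root disk
        check_command           check_linux_root_disk
        }
```

Because each listed host gets its own service with its own state, a full disk on clusternode1 should only trigger a notification for the "Root disk" service on clusternode1.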


In /usr/local/nagios/etc/nrpe.cfg on each cluster node:
command[check_drbd0]=/usr/local/nagios/libexec/check_disk -w 15% -c 10% -p /dev/drbd0
command[check_sda1]=/usr/local/nagios/libexec/check_disk -w 15% -c 10% -p /dev/sda1


With this setup, we get separate alerts:
Running out of disk space for www.example.com
Running out of disk space for each individual cluster node


Regards,
Markus.






From: Help [mailto:help-bounces+markus.heinze=esta-bw.de at monitoring-plugins.org] On Behalf Of Natva, Arun Kumar
Sent: Friday, 23 January 2015 23:47
To: help at monitoring-plugins.org
Subject: help needed with nagios alert

Hi,
I am using Nagios for alerting in our Hadoop cluster.

When I set up a check_disk alert on all the nodes in the cluster, we get emails for all the hosts even though only one of the nodes exceeds the disk space threshold.

I have tried several things, but I cannot figure out why Nagios sends alerts for all hosts instead of just the affected one. Can you please help?

Regards,
Arun.

