The other day I wrote a sort of wrapper that uses Nagios' check_icmp plugin to monitor several nodes on our network. If any one of the nodes goes down, it returns a value of 1 (WARNING); if two or more nodes go down, it returns a value of 2 (CRITICAL).<br>
<br>Unfortunately, even when the result is critical, the service still shows up as OK in Nagios, even though I know it is returning a value of 2. The plugin is written in Python.<br><br>#!/usr/bin/env python<br>import os<br>import subprocess<br>home="/usr/lib/nagios/plugins"<br>
<br>def checkNodes():<br>        up = 0<br>        aProblemsDetected=0<br>        bProblemsDetected=0<br>        cProblemsDetected=0<br>        status=0<br>        message=" - "<br>        TEST_OK=0<br>        TEST_WARNING=1<br>
        TEST_CRITICAL=2<br>        TEST_UNKNOWN=3<br>        try:<br>                cloud_alpha_status=subprocess.check_output(home+"/check_icmp -H cloud-alpha",shell=True)<br>                if "OK" in cloud_alpha_status:<br>
                        up += 1<br><br>                else:<br>                        message=message+"Cloud Alpha does not seem to be reachable. "<br>                        aProblemsDetected=1<br><br>        except:<br>
                message=message+"Node (Cloud-Alpha) is either down or unreachable. "<br>                aProblemsDetected=1<br>        try:<br>                cloud_bravo_status=subprocess.check_output(home+"/check_icmp -H cloud-bravo",shell=True)<br>
                if "OK" in cloud_bravo_status:<br>                        up += 1<br>                else:<br>                        print cloud_bravo_status<br>                        message=message+"Cloud Bravo does not seem to be reachable. "<br>
                        bProblemsDetected=1<br>        except:<br>                message=message+"Node (Cloud-Bravo) is either down or unreachable. "<br>                bProblemsDetected=1<br>        try:<br>                cloud_charlie_status=subprocess.check_output(home+"/check_icmp -H cloud-charlie",shell=True)<br>
                if "OK" in cloud_charlie_status:<br>                        up += 1<br>                else:<br>                        print cloud_charlie_status<br>                        message=message+"Cloud Charlie does not seem to be reachable. "<br>
                        cProblemsDetected=1<br>        except:<br>                message=message+"Node (Cloud-Charlie) is either down or unreachable. "<br>                cProblemsDetected=1<br><br>        if aProblemsDetected+bProblemsDetected+cProblemsDetected==0:<br>
                status=TEST_OK<br>                message=message+"CLUSTER IS UP! -- Cloud-Alpha up -- Cloud-Bravo up -- Cloud-Charlie up"<br>                code='OK'<br>        if aProblemsDetected+bProblemsDetected+cProblemsDetected==1:<br>
                status=TEST_WARNING<br>                code='WARNING'<br>                message="Errors Detected:"+message<br>        elif aProblemsDetected+bProblemsDetected+cProblemsDetected>1:<br>                status=TEST_CRITICAL<br>
                message="Many Errors Detected"+message<br>                code='CRITICAL'<br>        return status,(code+' '+message)<br><br>print checkNodes()<br>
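One thing that may be relevant here: as far as I understand the Nagios plugin guidelines, Nagios takes the service state from the plugin's process exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and uses standard output only as the status text. A Python script that prints its result and then ends normally exits with code 0. Below is a minimal sketch of the exit-code pattern, not a drop-in replacement for the plugin; `summarize`, `report_and_exit`, and `problems_detected` are hypothetical names that do not appear in the script above.

```python
import sys

# Exit codes as described in the Nagios plugin guidelines.
TEST_OK = 0
TEST_WARNING = 1
TEST_CRITICAL = 2
TEST_UNKNOWN = 3


def summarize(problems_detected):
    """Map a count of unreachable nodes to (exit status, status label).

    Hypothetical helper for illustration: 0 problems is OK, exactly 1 is
    WARNING, and anything else is treated as CRITICAL.
    """
    if problems_detected == 0:
        return TEST_OK, "OK"
    if problems_detected == 1:
        return TEST_WARNING, "WARNING"
    return TEST_CRITICAL, "CRITICAL"


def report_and_exit(problems_detected, message):
    """Print one line of plugin output, then exit with the matching code."""
    status, label = summarize(problems_detected)
    print(label + " - " + message)
    # The state Nagios displays comes from this exit code, not from stdout.
    sys.exit(status)
```

Under these assumptions, a call such as `report_and_exit(2, "Cloud-Alpha and Cloud-Bravo unreachable")` as the last statement of the script would print the status line and end the process with exit code 2.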