[Nagiosplug-devel] Working on testcases

Ton Voon ton.voon at altinity.com
Mon Nov 7 01:53:38 CET 2005


Hi!

This is an interesting and important thread and I seem to have got  
some strong opinions, so we should continue with this until we get a  
result.

Just going to summarise where we are:

PROBLEM

While working on testcases, I noticed that "name resolution
failure" now returns UNKNOWN instead of CRITICAL. What exactly should
UNKNOWN mean?

VIEWS

John Rouillard suggested a command line option to let the user choose
the return code, but Ton Voon thinks this would overcomplicate
things. John retracted the suggestion.

Garrett Honeycutt suggested a configure-time option for the return
code, but Andreas Ericsson thinks this is bad because compiled
binaries should behave identically across platforms. I think the
"configurable return code" suggestion can be dropped.

John suggests separating "host not found" and "cannot resolve"
exceptions, so the former is a CRITICAL and the latter is an UNKNOWN.
This is an interesting idea, but I'm not sure what the philosophy
behind it is.
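To make John's split concrete, here is a rough sketch in Python (not
the plugins' actual C code). It assumes "host not found" corresponds
to the resolver's EAI_NONAME error and that every other resolver
failure counts as "cannot resolve"; that mapping, and the function
names, are my assumptions for illustration only. Which error code a
resolver reports for a missing host can also vary by platform.

```python
import socket

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def state_for_resolution_error(exc: socket.gaierror) -> int:
    """Map a resolver error to a plugin state (hypothetical helper).

    Assumption: EAI_NONAME means "host not found" (CRITICAL);
    anything else means "cannot resolve" (UNKNOWN).
    """
    if exc.errno == socket.EAI_NONAME:
        return CRITICAL
    return UNKNOWN


def check_resolution(host: str):
    """Try to resolve host and return (state, message)."""
    try:
        socket.getaddrinfo(host, None)
    except socket.gaierror as exc:
        return state_for_resolution_error(exc), f"cannot resolve {host}: {exc}"
    return OK, f"{host} resolves"
```

Under this split, a resolver that is temporarily unreachable would
yield UNKNOWN, while a name the resolver says does not exist would
yield CRITICAL.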

Andreas suggests a new status code in Nagios, "Transport/network
error", so that UNKNOWN would then mean "user error". As long as no
network error state is supported, Andreas suggests using UNKNOWN.

John's analysis is that there are two functions of a plugin:

   1) communication with device/service
   2) analysis of device/service and assigning appropriate status  
[and perf data]
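
A minimal sketch of that two-function structure, in Python rather
than the plugins' C, purely for illustration. The names fetch,
analyse and run_plugin, the thresholds, and the "value" perf data
label are all made up here; only the numeric return codes are the
standard Nagios ones.

```python
from enum import IntEnum
from typing import Callable, Tuple


class State(IntEnum):
    """Standard Nagios plugin return codes."""
    OK = 0
    WARNING = 1
    CRITICAL = 2
    UNKNOWN = 3


def analyse(value: float, warn: float, crit: float) -> State:
    """Function (2): turn a measurement into a status."""
    if value >= crit:
        return State.CRITICAL
    if value >= warn:
        return State.WARNING
    return State.OK


def run_plugin(fetch: Callable[[], float], warn: float, crit: float) -> Tuple[State, str]:
    """Run function (1), then function (2), and build the output line."""
    value = fetch()                      # (1) communication with the device/service
    state = analyse(value, warn, crit)   # (2) analysis and status assignment
    # Text before the "|" is the status message, text after it is perf data.
    return state, f"{state.name} - value={value}|value={value}"
```

For example, run_plugin(lambda: 15.0, 10, 20) gives a WARNING state.
What should happen when fetch itself fails is exactly the open
question in this thread, so the sketch leaves that case unhandled.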

MY TAKE

Trying to tie these views together, I think "transport/network"
errors go into (1). John's "host not found" and "cannot resolve"
cases go into (1) as well, but that suggests there should be no
difference in state between them.

My feeling is that (2) depends on (1), so if (1) is not possible,
for ANY reason, then I think that should be a CRITICAL (with
appropriate message text). I think Nagios already helps with
"transport/network" errors through things like "flapping" and "soft
states". (I think Nagios works well because it doesn't try to come up
with lots of different plugin states and just keeps it simple.)
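
As a rough sketch of the rule I am proposing (Python for brevity, not
the real plugin code; run_check, communicate and analyse are
hypothetical names): any failure while talking to the device becomes
CRITICAL, with the reason carried in the message text.

```python
# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def run_check(communicate, analyse):
    """Return (state, message) for one plugin run.

    communicate() performs function (1) and may raise on any failure;
    analyse(data) performs function (2) and returns (state, message).
    """
    try:
        data = communicate()
    except Exception as exc:
        # (1) was not possible, for ANY reason: report CRITICAL and
        # put the cause in the message text.
        return CRITICAL, f"CRITICAL - cannot communicate: {exc}"
    return analyse(data)
```

With this rule a name resolution failure and a refused connection end
up in the same state and differ only in their message text.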

I think Garrett summed it up best for me: "I would rather get false  
positives than miss something because the status was UNKNOWN as  
opposed to CRITICAL"

NEXT STEPS

I think we need to bat this around a bit more to get consensus. If it  
gets to the stage where we need a vote, I'm happy to cast one out to  
the community.

Ton

http://www.altinity.com
T: +44 (0)870 787 9243
F: +44 (0)845 280 1725
Skype: tonvoon





