[Nagiosplug-devel] r1879: check_tcp now returns UNKNOWN with an invalid hostname on command line

Holger Weiss holger at CIS.FU-Berlin.DE
Wed Jan 9 19:05:02 CET 2008


* Thomas Guyot-Sionnest <dermoth at aei.ca> [2008-01-08 07:27]:
> On 08/01/08 06:48 AM, Ton Voon wrote:
> > Fair point. I specifically made this change because in a Nagios  
> > configuration $HOSTADDRESS$ wasn't set, so the check_tcp was  
> > effectively running:
> > 
> > check_tcp -H -w 5
> > 
> > and mistaking -w for the hostname. My thinking was that this was a  
> > command line options error, hence UNKNOWN. However, I've obviously  
> > implemented it as a hostname resolution check.
> > 
> > Looking at http://nagiosplug.sourceforge.net/developer-guidelines.html#AEN78,
> > it's a higher level error so shouldn't be UNKNOWN and should be
> > CRITICAL.
> > 
> > So, sorry, I screwed up here twice: once for changing to UNKNOWN, and  
> > twice for not checking the test results to see the impact.
> > 
> > I'll revert back now.
> 
> The argument goes both ways. It is fair to think name resolution
> belongs to check_dns

The problem is that there are not only the two cases where DNS (or some
other depended-upon functionality) either works or doesn't; there's also
the third case where it works in general but not for some
particular check.  E.g., the record for "example.com" might be missing
or broken even though your DNS server is fine.  Therefore, I definitely
want a notification if "check_ping example.com" fails due to a failure
in resolving "example.com", _unless_ the actual DNS check also fails.
IMHO, the only clean solution to this really is to properly reflect such
dependencies in the Nagios configuration.
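
As a rough illustration of the behaviour I mean, here is a toy model in
Python.  It is only a sketch of the decision, not how Nagios implements
dependencies; the function name and the rule "suppress whenever the
master check is not OK" are assumptions made for this example.

```python
# Toy model of dependency-based notification suppression.
# Hypothetical names; this is not Nagios code.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def should_notify(dependent_state, master_state):
    """Return True if a notification for the dependent check should go out.

    dependent_state: state of e.g. "check_ping example.com"
    master_state:    state of the check it depends on, e.g. the DNS check
    """
    if dependent_state == OK:
        return False  # nothing to report
    if master_state != OK:
        return False  # the master check is failing too: suppress this one
    return True  # the master check is fine, so this failure is news
```

With this, a failing "check_ping example.com" produces a notification
while the DNS check is OK, and stays quiet when the DNS check is failing
as well.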

> and in my setups I use IPs everywhere I can and the dns servers have
> their own checks.

For the reason I stated above (and for maintainability), I use host
names everywhere I can :-)

> I didn't push my vote towards the reversal of your commit because some
> other plugins I checked go UNKNOWN on invalid hosts too.

I'd personally vote against UNKNOWN in these cases, as the only use I
see in making a distinction between CRITICAL and UNKNOWN is to be able
to handle an UNKNOWN differently from a CRITICAL (especially regarding
notifications, of course); and if a check fails for reasons which are
out of the plugin's scope, I personally want notifications suppressed
only by Nagios' dependency logic and not via the plugin status.  The
plugin just cannot know whether only the "example.com" record or DNS in
general is broken, so IMHO it shouldn't try to be too clever, here.
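
To make the distinction concrete, here is a minimal Python sketch of a
plugin-style check that returns UNKNOWN only for a command line error
(such as "-H -w 5") and CRITICAL when the name doesn't resolve.  The
function name, the argument handling and the messages are hypothetical
and are not taken from check_tcp; only the exit code values (0 to 3) are
the usual plugin return codes.

```python
import socket

# Usual plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_host(argv):
    """Return (exit_code, message) for a hypothetical 'check -H <host>' call."""
    # Usage error: -H is missing, or it is followed by another option
    # (e.g. "-w") instead of a host name.
    if len(argv) < 2 or argv[0] != "-H" or argv[1].startswith("-"):
        return UNKNOWN, "UNKNOWN - usage: check -H <host>"
    host = argv[1]
    try:
        socket.getaddrinfo(host, None)
    except socket.gaierror as err:
        # Resolution failed.  The plugin cannot tell whether only this
        # record or DNS in general is broken, so it just reports CRITICAL.
        return CRITICAL, "CRITICAL - cannot resolve %s: %s" % (host, err)
    return OK, "OK - %s resolves" % host
```

A wrapper script would print the message and exit with the returned
code, leaving it to the dependency configuration to decide whether
anyone gets notified.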

If we'd consistently return an UNKNOWN _only_ on _internal_ plugin errors
as the guidelines state, then I'd configure Nagios to only send me
(i.e., the Nagios admin) notifications about UNKNOWNs so that others
wouldn't be bothered with these.  But in practice, an UNKNOWN is
sometimes returned on problems others want to be notified about, so I
make no distinction between CRITICAL and UNKNOWN in my configuration.
Then again, internal plugin errors are rare, so this isn't a real
problem for me.  However, for this reason I don't really care that much
about the CRITICAL<->UNKNOWN distinction in practice :-P

> The developer guideline should be clear on this (if it's not already)
> and plugins not following the specs should be fixed.

Yes.

> I don't believe this should apply to plugins that use a protocol to
> check something behind it (i.e. check_snmp, check_nrpe, check_by_ssh,
> check_nt).

For this special case, I have no strong opinion.  I agree it's different
from cases such as DNS.  There, it is probably more likely than with
"example.com" not resolving that a problem affects only the Nagios check
(and not an actual service provided to customers), so that only the
Nagios admin wants to be notified.

Holger



