[Nagiosplug-help] Bug in check_icmp/MODE_HOSTCHECK? (1.4.11)

Holger Weiss holger at CIS.FU-Berlin.DE
Tue Jan 8 04:23:56 CET 2008


* Wolfgang Barth <wob at swobspace.de> [2007-12-28 14:05]:
> I'm using check_icmp from nagios-plugins-1.4.11. check_host is a symlink to
> check_icmp.
>
> time ./check_icmp -H 172.17.129.3
>  CRITICAL - 172.17.129.3: rta nan, lost 100%|\
>  rta=0.000ms;200.000;500.000;0; pl=100%;40;80;; 
>  0.000u 0.000s 0:03.62 0.0%      0+0k 0+0io 0pf+0w
>                ^^^^^^ OK
>
> time ./check_host -H 172.17.129.3
>  CRITICAL - 172.17.129.3: rta nan, lost 100%|\
>  rta=0.000ms;1000.000;1000.000;0; pl=100%;100;100;; 
>  0.000u 0.000s 0:10.00 0.0%      0+0k 0+0io 0pf+0w
>                ^^^^^^ BAD
>
> The host does not exist. round about 3 seconds after the first icmp packet
> a router answers with ICMP host unreachable. check_icmp aborts, check_host
> not.
>
> In earlier versions of Andreas Ericsson's check_icmp the goal of check_host
> was to abort immediately after such an ICMP host unreachable:

Actually, the detection of destination unreachable messages was broken,
no matter whether "check_icmp" or "check_host" was called.  This is
fixed in SVN now.  Thanks a lot for the report!

The reason for the difference you showed above is not that "check_icmp"
detected the destination unreachable message correctly; it simply gave
up much earlier than "check_host".  While "check_host" tries to exit as
fast as possible as soon as it receives a response, it also waits much
longer for the first response by setting different default values for
"-c" and "-i".  That is, if you call

	$ check_host -c 500 -i 80 -H 172.17.129.3

it'll be as fast as "check_icmp", whereas

	$ check_icmp -c 1000 -i 1000 -H 172.17.129.3

will be as slow as "check_host".
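
To illustrate the point about defaults, here is a rough sketch of how a
plugin could pick its defaults from the name it was called by.  This is
not the check_icmp source: the struct, its field names and both helper
functions are made up for illustration.  The numbers are the thresholds
shown in the performance data quoted above plus the "-c"/"-i" values
from the two commands.

```c
#include <string.h>

/* Hypothetical settings struct; names are illustrative only. */
struct icmp_defaults {
    int host_check;        /* 1 if invoked as "check_host" */
    unsigned warn_rta_ms;  /* warning round-trip time */
    unsigned crit_rta_ms;  /* critical round-trip time ("-c") */
    unsigned warn_pl;      /* warning packet loss, percent */
    unsigned crit_pl;      /* critical packet loss, percent */
    unsigned interval_ms;  /* packet interval ("-i") */
};

/* Return the part of the path after the last '/', if any. */
static const char *base_name(const char *path)
{
    const char *slash = strrchr(path, '/');
    return slash ? slash + 1 : path;
}

/* Choose defaults based on the name the program was invoked as. */
static struct icmp_defaults defaults_for(const char *argv0)
{
    /* Assumed "check_icmp" defaults: 200/500 ms, 40%/80%, 80 ms. */
    struct icmp_defaults d = { 0, 200, 500, 40, 80, 80 };

    if (strcmp(base_name(argv0), "check_host") == 0) {
        /* Host check: wait much longer for the first response. */
        d.host_check = 1;
        d.warn_rta_ms = d.crit_rta_ms = 1000;
        d.warn_pl = d.crit_pl = 100;
        d.interval_ms = 1000;
    }
    return d;
}
```

With something like this, a "check_host" symlink to the same binary
ends up with the second set of values unless "-c" and "-i" are given
explicitly on the command line.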

This difference will still show up with the fixed version in SVN in case
you don't get any response for a host check (because it's dropped by a
packet filter or whatever).  The check_icmp revision which introduced the
"check_host" feature in our CVS (1.5, 2005/02/01) behaved like this
already, though.  The idea seems to be something like "try really hard
to get a response, but exit immediately if we got one".  The comment on
"check_host" in the source says:

| MODE_HOSTCHECK: Return immediately upon any sign of life.  In
| addition, sends packets to ALL addresses assigned to this host (as
| returned by gethostbyname() or gethostbyaddr()) and expects one host
| only to be checked at a time.  Therefore, any packet response what so
| ever will count as a sign of life, even when received outside crit.rta
| limit.  Do not misspell any additional IP's.

Holger



