[Nagiosplug-help] check_icmp seems flapping - followon to RE: make of nagios-plugins-1.4.5 on AIX 4.3 fails

Andreas Ericsson ae at op5.se
Thu Nov 30 14:00:33 CET 2006


Ralph.Grothe at itdz-berlin.de wrote:
> I still seem to have some serious trouble with my build of the
> check_icmp plugin.
> 
> Because the make was prematurely aborted (owing to the check_swap
> error),
> I manually chown-ed check_icmp to root and chmod-ed it u+s
> because ICMP packet generation, I assume, requires root privileges.
> 
> I then copied it into $USER1$ and there set a hard link to
> check_host.
> 
> In my hosts.cfg template I defined a check-host-alive as default
> check_command
> that looks like this
> 
> define command {
>     command_name	check-host-alive
>     command_line	$USER1$/check_host -H $HOSTADDRESS$ -t 15 -c 10000
> }
> 
> 
> After a bit of further tweaking of my config files to reflect a
> hopefully cleaner overall layout,
> I incautiously started the new 2.5 nagios after all pre-flight
> checks were satisfied,
> without first disabling host notifications.
> 
> I was then shocked to realize that nagios was cheerfully churning
> out dozens of alert notifications
> when the hosts' states changed from soft critical to hard
> critical,
> only to relapse minutes later from hard critical to hard ok,
> notifying about the recovery
> (because host notification_options of course included r in my
> template).
> Many hosts were flip-flopping like this.
> 
> I then ran check_host several times manually and saw the
> following:
> 
> 
> $ ~/libexec/check_host -H somehost
> mode: 1
> CRITICAL - somehost: rta nan, lost 100%|rta=0.000ms;1000.000;1000.000;0; pl=100%;100;100;;
> 
> 
> But a ping run immediately afterwards always got its echo replies:
>  
> $ ping -c 3 somehost
> PING somehost.somewhere.tld: (123.123.123.123): 56 data bytes
> 64 bytes from 123.123.123.123: icmp_seq=0 ttl=248 time=3 ms
> 64 bytes from 123.123.123.123: icmp_seq=1 ttl=248 time=3 ms
> 64 bytes from 123.123.123.123: icmp_seq=2 ttl=248 time=3 ms
> 
> ----somehost.somewhere.tld PING Statistics----
> 3 packets transmitted, 3 packets received, 0% packet loss
> round-trip min/avg/max = 3/3/3 ms
> 
> 
> Now I am curious whether my compilation of check_icmp is OK.
> 

You'd get this problem if you use an old check_icmp on a system that 
handles process IDs > 65535. In the old version, check_icmp didn't 
recognize valid ICMP responses because the id field used in the ICMP 
header is only 16 bits wide, so a 32-bit pid doesn't fit in it. This 
would typically only happen when the pid of check_icmp is larger than 
65535, which would explain the checks hopping between OK for a while and 
non-OK for a while. Judging by "mode: 1" above, I'd say your check_icmp 
is fairly old and needs to be upgraded. Which version of the plugins are 
you using?
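
To illustrate the truncation described above, here is a rough sketch in
Python. It is not taken from check_icmp (which is written in C); the
function names are made up for this example, and comparing against the
masked pid is only one plausible way to avoid the mismatch, not
necessarily what newer versions of the plugin do.

```python
# Hypothetical illustration only; none of these names exist in check_icmp.
ICMP_ID_MASK = 0xFFFF  # the ICMP echo identifier field is 16 bits wide


def id_on_wire(pid: int) -> int:
    """Return the value that fits into the 16-bit ICMP id field for a pid."""
    return pid & ICMP_ID_MASK


def reply_accepted_naive(pid: int) -> bool:
    """Assumed old behaviour: compare the echoed 16-bit id to the full pid."""
    return id_on_wire(pid) == pid


def reply_accepted_masked(pid: int) -> bool:
    """Assumed fix: compare the echoed id to the pid truncated to 16 bits."""
    return id_on_wire(pid) == (pid & ICMP_ID_MASK)


if __name__ == "__main__":
    # Columns: pid, id on the wire, naive check result, masked check result
    for pid in (1234, 65535, 65536, 70000):
        print(pid, id_on_wire(pid),
              reply_accepted_naive(pid), reply_accepted_masked(pid))
```

With the naive comparison, every reply is discarded as soon as the
plugin happens to run with a pid above 65535, and is accepted again once
the pid counter wraps around to small values. That would match checks
that alternate between OK and non-OK for stretches of time.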

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231



