[Nagiosplug-devel] Check_ping question

Andreas Ericsson ae at op5.se
Thu Nov 11 02:22:00 CET 2004


Sean Dilda wrote:
> On Wed, 2004-11-10 at 19:41, Andreas Ericsson wrote:
> 
> 
>>>I guess I also am confused on this one, as /bin/ping on most OS's will
>>>send packet 2 at 1000ms, and packet 3 at 2000ms, with a 5000 ms timeout,
>>>that's 7.0000000001 seconds total. Why are we going all the way up to
>>>18, or even 15 seconds then?
>>>
>>
>>It's just the maximum. check_ping doesn't have a backoff factor (which 
>>is what you describe), since it forks system ping to do its dirty work.
> 
> 
> How does check_ping not have that?

Last time I checked, check_ping used the -i flag, which overrides GNU 
ping's backoff factor.

>  I ran some tests.  If you tell
> check_ping to send 3 packets, it calls ping once and tells ping to send
> 3 packets.  Therefore, check_ping sends the packets out at one-second
> intervals because ping sends them out at one-second intervals.  So, if
> you have a 5 second critical timeout, and are sending 3 packets, the
> maximum theoretical timeout is just over 7 seconds (as Robert pointed
> out).  Waiting for 15+ seconds is just wasted time.
> 
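
The arithmetic in the quoted paragraph can be written out as a small
sketch. It assumes packets go out at a fixed interval and that the run
ends once the last packet's reply timeout expires. The function name
and parameters are hypothetical and are not taken from check_ping.

```python
def worst_case_seconds(packets: int, interval: float, reply_timeout: float) -> float:
    """Upper bound on the wall-clock time of one ping run.

    Assumption: packets are sent at a fixed interval, and the run ends
    when the reply timeout of the last packet expires.
    """
    if packets < 1:
        raise ValueError("packets must be at least 1")
    # The last packet leaves after (packets - 1) intervals.
    last_send = (packets - 1) * interval
    return last_send + reply_timeout


# 3 packets, 1 second apart, 5 second timeout
print(worst_case_seconds(3, 1.0, 5.0))  # 7.0
```

Under those assumptions, anything beyond roughly 7 seconds is headroom
rather than time ping needs.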

My guess is that the -i flag argument is taken from the critical rta 
threshold. Also, ping doesn't thread (why should it?), so that might 
present a problem. It's just a guess, though, so I can't say for sure. 
What I do know is that once the ping binary has finished, check_ping 
just parses the output and prints it, so the check_ping timeout value is 
more of a safeguard against ping crashing, hanging, or the like.
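
As an illustration of that safeguard role, here is a minimal sketch of
a wrapper that runs a child process under a hard deadline and gives up
if the child hangs. It is not check_ping's code (check_ping is written
in C); run_with_deadline is a hypothetical name, and a sleeping Python
child stands in for a hung ping.

```python
import subprocess
import sys


def run_with_deadline(cmd, deadline):
    """Run cmd and return (finished, stdout).

    finished is False when the child was still running at the deadline;
    subprocess.run kills the child before raising TimeoutExpired.
    """
    try:
        done = subprocess.run(cmd, capture_output=True, text=True, timeout=deadline)
    except subprocess.TimeoutExpired:
        return False, ""
    return True, done.stdout


# Stand-in for a hung ping: a child that sleeps far past the deadline.
hung = [sys.executable, "-c", "import time; time.sleep(30)"]
print(run_with_deadline(hung, 0.5))
```

When the child exits on its own, the deadline never triggers and the
wrapper only has to parse what the child printed.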

> Having seen what happens to a nagios system that undergoes a massive
> network outage (with few network switches defined), I think it's very
> important that the check-host-alive plugins not waste more time than
> they can.

If by "can" you mean "have to", I'm with you there. Hence the work I'm 
putting into the check_host_alive plugin. It will make Nagios scale 
nicely (or at least more nicely) even on very broken networks.

>  (This is due in large part to the fact that nagios serializes them.)
> 

Yes. That part needs to be rewritten, but it's a fairly major design 
issue, so other contributors and I are unwilling to take it on since 
Ethan might reject it. It's also pretty difficult, so it's not exactly 
the kind of thing you do over lunch, which means rejection would hurt a 
little bit. ;)

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Lead Developer



