[Nagiosplug-help] check_icmp problems

Israel Brewster israel at frontierflying.com
Mon Aug 25 23:04:51 CEST 2008


-----------------------------------------------
Israel Brewster
Computer Support Technician
Frontier Flying Service Inc.
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7250 x293
-----------------------------------------------


On Aug 25, 2008, at 11:45 AM, Andreas Ericsson wrote:

> Israel Brewster wrote:
>> I think I may have mentioned this before, and if so I apologize, but
>> it remains an issue so I thought I'd try again. I am having a problem
>> using check_icmp where it consistently shows a number of my hosts (all
>> of which are on the same hardware) as having 60% packet loss, even
>> though a straight ping against these hosts returns no packet loss,
>> even when doing a ping flood (thus implying that the issue is not rate
>> limiting). More specifically, it would appear that all but the first
>> two packets are being dropped: if I increase the number of pings to
>> 10, I get 80% loss, if I decrease the number of pings to 4 I only get
>> 50% loss, and if I drop to two pings, I get no loss. Increasing the
>> delay between packets has no noticeable effect until the delay numbers
>> get ridiculously high, also indicating that rate limiting is not the
>> problem here.
>
> Are you pinging the nodes one by one, or all at once?

Both. To be more specific, I have been trying to use it as a general
ping, rather than check_ping, for most of my hosts simply because of
the speed factor: check_ping, as you are certainly aware, takes much
longer to check a host. For those, I am pinging a single node. However,
I do have a couple of hosts with dual WAN ports, and on those I am
pinging both at once, so Nagios will only report critical on the host
as a whole if both ports are down. The devices with this issue show
the problem regardless of whether they are the only node being pinged
or not.
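
The loss figures quoted above are what one would expect if only the
first two probes in a run are ever answered. A minimal sketch of that
arithmetic follows; the function name is hypothetical, and the count of
5 probes for the 60% case is an assumption, since the original report
does not state it.

```python
def expected_loss_percent(sent: int, answered: int = 2) -> int:
    """Loss percentage if only the first `answered` probes ever get a reply."""
    received = min(sent, answered)
    return round(100 * (sent - received) / sent)

# Probe counts from the report; 5 is an assumed count for the 60% case.
for sent in (2, 4, 5, 10):
    print(f"{sent} pings -> {expected_loss_percent(sent)}% loss")
```

The 0%, 50% and 80% results match the reported figures for 2, 4 and 10
pings, which is consistent with a fixed number of replies per run rather
than a rate limit.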

>
>
>> FWIW, I was having the same problem with the fping program from
>> smokeping, and it turned out that this was caused by the fping binary
>> using the ICMP sequence number to indicate which host the packet was
>> for, rather than incrementing the sequence number with each packet
>> sent to a given host. After patching that, fping worked fine. Perhaps
>> this is the same problem with check_icmp? I seem to recall someone
>> giving a patch for that a while back, but I could never get the patch
>> to apply properly, so I don't know if it would have worked. Thanks for
>> any help/function patches that can be provided!
>
> check_icmp is an fping derivative (although rewritten so much that I
> can't say which lines, if any, remain from the original binary).
>
> check_icmp does indeed maintain the host id number in the icmp->seq
> field. It's impossible to do otherwise when scanning multiple nodes
> if one wants to determine which of the hosts generated a particular
> error code, since error codes do not echo the data payload of the
> original packet.

So maybe the patch that fixed fping also broke something else? I haven't
noticed any problems yet, but maybe that's just because of how it is
being used in smokeping.
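
To illustrate the point about error codes: an ICMP error message carries
the original IP header plus the first 8 bytes of the original datagram,
which for an echo request is exactly the ICMP header (including the
sequence field) and none of the payload. The sketch below is not
check_icmp's code. The helper names, the host list, and the zeroed
checksums are assumptions made purely for illustration.

```python
import struct

ICMP_ECHO_REQUEST = 8
ICMP_DEST_UNREACH = 3

def embedded_seq(icmp_error: bytes) -> int:
    """Recover the sequence field of the echo request quoted in an ICMP error."""
    inner_ip = icmp_error[8:]                # skip the 8-byte ICMP error header
    ihl = (inner_ip[0] & 0x0F) * 4           # length of the quoted IP header
    quoted = inner_ip[ihl:ihl + 8]           # first 8 bytes of the original datagram
    _type, _code, _csum, _ident, seq = struct.unpack("!BBHHH", quoted)
    return seq

def fake_unreachable(host_index: int) -> bytes:
    """Build a simplified 'host unreachable' message (checksums left at zero)."""
    error_header = struct.pack("!BBHI", ICMP_DEST_UNREACH, 1, 0, 0)
    ip_header = bytes([0x45]) + bytes(19)    # IPv4, 20-byte header, other fields zero
    quoted_echo = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, 0, 0x1234, host_index)
    return error_header + ip_header + quoted_echo

hosts = ["192.0.2.10", "192.0.2.20"]         # hypothetical target list
print(hosts[embedded_seq(fake_unreachable(1))])
```

With the host index stored in the sequence field, the sender can map an
error back to a target without relying on the payload.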

>
>
> According to the ICMP RFC though (792, iirc), the sequence number
> of the header really shouldn't matter. It's for the sending host to
> determine and for the responding node to echo back.

Interesting. So apparently it is the remote device that is at fault,  
although unfortunately there is nothing we can do about that.
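
For reference, the echo behaviour described above can be sketched as
follows: a conforming responder copies the identifier and sequence
number from the request into the reply, whatever values the sender
chose. This is a simplified model with hypothetical helper names, not
code from check_icmp or from the router firmware.

```python
import struct

ICMP_ECHO_REQUEST = 8
ICMP_ECHO_REPLY = 0

def checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum of the message."""
    if len(data) % 2:
        data += b"\x00"                      # pad to a whole number of 16-bit words
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_echo(msg_type: int, ident: int, seq: int, payload: bytes = b"") -> bytes:
    """Build an ICMP echo request or reply with a valid checksum."""
    header = struct.pack("!BBHHH", msg_type, 0, 0, ident, seq)
    csum = checksum(header + payload)
    return struct.pack("!BBHHH", msg_type, 0, csum, ident, seq) + payload

def reply_to(request: bytes) -> bytes:
    """Answer an echo request, echoing identifier, sequence and payload unchanged."""
    _type, _code, _csum, ident, seq = struct.unpack("!BBHHH", request[:8])
    return build_echo(ICMP_ECHO_REPLY, ident, seq, request[8:])

# Five probes to one host, all carrying the same sequence value (host index 3).
probes = [build_echo(ICMP_ECHO_REQUEST, 0x1234, 3, b"probe") for _ in range(5)]
replies = [reply_to(p) for p in probes]
print(len(replies), "replies, seq values:",
      {struct.unpack("!BBHHH", r[:8])[4] for r in replies})
```

In this model all five probes are answered even though the sequence
value never changes, which is the behaviour the affected routers do not
appear to show.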

>
>
> May I ask what kind of equipment you're working on? It could be that
> it's worth more to have accurate error responses on most hardware
> than it is to get accurate multi-node pings for some rather special
> hardware. Otoh, if you're running one check_icmp process per host,
> then the issue can be worked around while maintaining accuracy in
> error messages.

One per host, although as I mentioned a couple of the hosts have dual  
interfaces, so check_icmp is pinging both. The devices are Linksys  
RV082 routers running firmware newer than 1.3.2 (1.3.2 and older  
firmware works fine, but isn't available), so nothing terribly  
special. These are dual-WAN 8-port routers with VPN capabilities built  
in, although we are not using the dual-WAN functionality on all of
them. The common factor among the ones that don't work is that they all
have firmware newer than 1.3.2.

>
>
> Btw, I wrote check_icmp once upon a time, and I'd like to keep it
> working as well as possible. The arse it one day bites might, after
> all, be my own ;-)
>
> -- 
> Andreas Ericsson                   andreas.ericsson at op5.se
> OP5 AB                             www.op5.se
> Tel: +46 8-230225                  Fax: +46 8-230231




