[Nagiosplug-help] check_icmp problems

Lee Scott Lee.Scott at ihealthtechnologies.com
Mon Aug 25 23:11:53 CEST 2008


What is the load on the routers?

I have seen something similar with Extreme Networks equipment.

ICMP packets are not handled by the equipment's ASIC and require
CPU intervention.  Under high load on my Extreme gear I would
see high ping times and packet loss.  All other traffic
on the equipment worked fine; it was a design flaw of the equipment
rather than a real problem, and Nagios was reporting ICMP correctly.

Just a thought.




                                                                           
From:    Israel Brewster <israel at frontierflying.com>
Sent by: nagiosplug-help-bounces at lists.sourceforge.net
Date:    08/25/2008 05:05 PM
To:      Andreas Ericsson <ae at op5.se>
cc:      nagiosplug-help at lists.sourceforge.net
Subject: Re: [Nagiosplug-help] check_icmp problems





-----------------------------------------------
Israel Brewster
Computer Support Technician
Frontier Flying Service Inc.
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7250 x293
-----------------------------------------------


On Aug 25, 2008, at 11:45 AM, Andreas Ericsson wrote:

> Israel Brewster wrote:
>> I think I may have mentioned this before, and if so I apologize,
>> but it remains an issue so I thought I'd try again. I am having a
>> problem using check_icmp where it consistently shows a number of
>> my hosts (all of which are on the same hardware) as having 60%
>> packet loss, even though a straight ping against these hosts
>> returns no packet loss, even when doing a ping flood (thus
>> implying that the issue is not rate limiting). More specifically,
>> it would appear that all but the first two packets are being
>> dropped: if I increase the number of pings to 10, I get 80% loss,
>> if I decrease the number of pings to 4 I only get 50% loss, and if
>> I drop to two pings, I get no loss. Increasing the delay between
>> packets has no noticeable effect until the delay numbers get
>> ridiculously high, also indicating that rate limiting is not the
>> problem here.
>
> Are you pinging the nodes one by one, or all at once?

Both. To be more specific, I have been trying to use it as a general
ping check, rather than check_ping, for most of my hosts simply because of
the speed factor: check_ping, as you are certainly aware, takes much
longer to check a host. For those, I am pinging a single node. However,
I do have a couple of hosts with dual WAN ports, and on those I am
pinging both at once, so Nagios will only report critical on the host
as a whole if both ports are down. The devices with this issue show
the problem regardless of whether they are the only node being pinged
or not.
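
The loss figures quoted above fit a pattern where exactly two replies
come back no matter how many packets are sent. A minimal Python sketch
of that arithmetic follows; the function name expected_loss_percent is
made up for illustration, and the 5-packet case is an assumption
inferred from the 60% figure rather than something stated in the thread.

```python
def expected_loss_percent(packets_sent, replies_received=2):
    """Percent packet loss if at most `replies_received` echo replies come back."""
    received = min(packets_sent, replies_received)
    return 100.0 * (packets_sent - received) / packets_sent


# Packet counts mentioned in the thread, plus 5 (assumed from the 60% report).
for n in (2, 4, 5, 10):
    print(n, "packets ->", expected_loss_percent(n), "% loss")
```

With two replies received, 4, 5 and 10 packets give 50%, 60% and 80%
loss, and 2 packets give no loss, which matches the numbers reported.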

>
>
>> FWIW, I was having the same problem with the fping program from
>> smokeping, and it turned out that this was caused by the fping
>> binary using the ICMP sequence number to indicate which host the
>> packet was for, rather than incrementing the sequence number with
>> each packet sent to a given host. After patching that, fping
>> worked fine. Perhaps this is the same problem with check_icmp? I
>> seem to recall someone giving a patch for that a while back, but I
>> could never get the patch to apply properly, so I don't know if it
>> would have worked. Thanks for any help/function patches that can
>> be provided!
>
> check_icmp is an fping derivative (although rewritten so much that I
> can't say which lines, if any, remain from the original binary).
>
> check_icmp does indeed maintain the host id number in the icmp->seq
> field. It's impossible to do otherwise when scanning multiple nodes
> if one wants to determine which of the hosts generated a particular
> error code, since error codes do not echo the data payload of the
> original packet.

So maybe the patch that fixed fping also broke something else? I haven't
noticed any problems yet, but maybe that's just because of how it is
being used in smokeping.
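
To make the two numbering schemes discussed above concrete, here is a
rough Python sketch that builds an ICMP echo request and shows which
sequence values each scheme would put on the wire. This is not code from
check_icmp or fping; the helper names (build_echo_request,
seq_as_host_id, seq_per_packet) are hypothetical, and the thread does not
establish whether the repeated sequence number is what the routers object to.

```python
import struct

ICMP_ECHO_REQUEST = 8  # ICMP type 8, code 0 (RFC 792)


def inet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum as used by ICMP."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(ident: int, seq: int, payload: bytes = b"") -> bytes:
    """Build an echo request with the given identifier and sequence number."""
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, 0, ident, seq)
    csum = inet_checksum(header + payload)
    return struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, csum, ident, seq) + payload


def seq_as_host_id(host_index: int, packet_index: int) -> int:
    # Scheme described for check_icmp in this thread: the sequence field
    # identifies the host, so every packet to one host carries the same value.
    return host_index


def seq_per_packet(host_index: int, packet_index: int) -> int:
    # Scheme described for the patched fping: the sequence number
    # increments with each packet sent to a given host.
    return packet_index


# Sequence numbers seen by the third host (index 2) over five packets.
print([seq_as_host_id(2, i) for i in range(5)])
print([seq_per_packet(2, i) for i in range(5)])
packet = build_echo_request(ident=0x1234, seq=seq_per_packet(2, 3), payload=b"ping")
print(packet.hex())
```

Under the first scheme the target sees the same sequence number on every
packet; under the second it sees a fresh one each time.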

>
>
> According to the ICMP RFC though (792, iirc), the sequence number
> of the header really shouldn't matter. It's for the sending host to
> determine and for the responding node to echo back.

Interesting. So apparently it is the remote device that is at fault,
although unfortunately there is nothing we can do about that.
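
For reference, the echo behaviour described above (the sender picks the
identifier and sequence number, the responder returns them along with the
data) can be checked on the sending side roughly like this. It is only a
sketch: parse_echo and reply_matches are hypothetical names, the example
packets are hand-built, and checksums are deliberately not validated.

```python
import struct

ECHO_REQUEST, ECHO_REPLY = 8, 0  # ICMP types from RFC 792


def parse_echo(packet: bytes):
    """Return (type, identifier, sequence, payload) from an ICMP echo message."""
    icmp_type, _code, _checksum, ident, seq = struct.unpack("!BBHHH", packet[:8])
    return icmp_type, ident, seq, packet[8:]


def reply_matches(request: bytes, reply: bytes) -> bool:
    """True if `reply` echoes the identifier, sequence and data of `request`."""
    req_type, req_id, req_seq, req_data = parse_echo(request)
    rep_type, rep_id, rep_seq, rep_data = parse_echo(reply)
    return (
        req_type == ECHO_REQUEST
        and rep_type == ECHO_REPLY
        and (req_id, req_seq, req_data) == (rep_id, rep_seq, rep_data)
    )


# Hand-built example packets; the checksum field is left at 0 because this
# sketch does not validate checksums.
request = struct.pack("!BBHHH", ECHO_REQUEST, 0, 0, 0x1234, 7) + b"data"
good_reply = struct.pack("!BBHHH", ECHO_REPLY, 0, 0, 0x1234, 7) + b"data"
bad_reply = struct.pack("!BBHHH", ECHO_REPLY, 0, 0, 0x1234, 8) + b"data"

print(reply_matches(request, good_reply))
print(reply_matches(request, bad_reply))
```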

>
>
> May I ask what kind of equipment you're working on? It could be that
> it's worth more to have accurate error responses on most hardware
> than it is to get accurate multi-node pings for some rather special
> hardware. OTOH, if you're running one check_icmp process per host,
> then the issue can be worked around while maintaining accuracy in
> error messages.

One per host, although as I mentioned a couple of the hosts have dual
interfaces, so check_icmp is pinging both. The devices are Linksys
RV082 routers running firmware newer than 1.3.2 (1.3.2 and older
firmware works fine, but isn't available), so nothing terribly
special. These are dual-WAN 8-port routers with VPN capabilities built
in, although we are not using the dual-WAN functionality on all of
them. The common factor among the ones that don't work is that they all
have firmware newer than 1.3.2.

>
>
> Btw, I wrote check_icmp once upon a time, and I'd like to keep it
> working as well as possible. The arse it one day bites might, after
> all, be my own ;-)
>
> --
> Andreas Ericsson                   andreas.ericsson at op5.se
> OP5 AB                             www.op5.se
> Tel: +46 8-230225                  Fax: +46 8-230231

