[Nagiosplug-help] check_icmp problems

Chris Adams cmadams at hiwaay.net
Tue Aug 26 02:32:48 CEST 2008


Once upon a time, Andreas Ericsson <ae at op5.se> said:
> check_icmp does indeed maintain the host id number in the icmp->seq
> field. It's impossible to do otherwise when scanning multiple nodes
> if one wants to determine which of the hosts generated a particular
> error code, since error codes do not echo the data payload of the
> original packet.
> 
> According to the ICMP RFC though (792, iirc), the sequence number
> of the header really shouldn't matter. It's for the sending host to
> determine and for the responding node to echo back.
> 
> May I ask what kind of equipment you're working on? It could be that
> it's worth more to have accurate error responses on most hardware
> than it is to get accurate multi-node pings for some rather special
> hardware. Otoh, if you're running one check_icmp process per host,
> then the issue can be worked around while maintaining accuracy in
> error messages.
> 
> Btw, I wrote check_icmp once upon a time, and I'd like to keep it
> working as well as possible. The arse it one day bites might, after
> all, be my own ;-)

I saw the same problem with, IIRC, some Linksys firewalls and some other
firewall-type gear.  Basically, they were rate-limiting ICMP echo
requests with the same sequence number within a certain time frame.

My solution was to use a few bits of the sequence number as, well, a
sequence number. :-)  This cuts down on the number of hosts you can
simultaneously monitor; I used 4 bits for a counter (a max of 16 unique
sequence numbers per host, which should probably be enough) which leaves
12 bits for hosts (so you could still ping 4096 hosts).  The number of
bits used for the sequence portion of the field is a #define, so it is
easily changed (although there is no range checking, so raising it too
high will give undefined results).
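
To make the bit layout concrete, here is a minimal standalone sketch of
the packing and unpacking, assuming a 16-bit sequence field.  It is not
part of the patch: only SEQ_BITS comes from it, while SEQ_MASK,
SEQ_MAX_HOSTS, seq_pack(), seq_host() and seq_counter() are names made
up for this illustration.  The assert() is an added range check that the
patch itself does not have.

```c
#include <assert.h>
#include <stdint.h>

#define SEQ_BITS 4                             /* counter width, as in the patch */
#define SEQ_MASK ((1u << SEQ_BITS) - 1)        /* low bits: per-host counter */
#define SEQ_MAX_HOSTS (1u << (16 - SEQ_BITS))  /* host ids that fit in the rest */

/* Pack a host table index and a per-host counter into one 16-bit value.
 * The host id goes in the high bits; the counter wraps within SEQ_BITS. */
static uint16_t seq_pack(unsigned int host_id, unsigned int counter)
{
	assert(host_id < SEQ_MAX_HOSTS);  /* hypothetical check, not in the patch */
	return (uint16_t)((host_id << SEQ_BITS) | (counter & SEQ_MASK));
}

/* Recover the host table index from a received sequence value. */
static unsigned int seq_host(uint16_t seq)
{
	return seq >> SEQ_BITS;
}

/* Recover the per-host counter from a received sequence value. */
static unsigned int seq_counter(uint16_t seq)
{
	return seq & SEQ_MASK;
}
```

With SEQ_BITS at 4, host 5 sending its fourth packet (counter 3) gets
sequence 0x0053, and shifting that right by 4 on receipt gives back 5.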

This patch has been running for me for a while now with no further
problems.  I posted it here before; I've included it again below.
-- 
Chris Adams <cmadams at hiwaay.net>
Systems and Network Administrator - HiWAAY Internet Services
I don't speak for anybody but myself - that's enough trouble.


diff -urN nagios-plugins-1.4.8-dist/plugins-root/check_icmp.c nagios-plugins-1.4.8/plugins-root/check_icmp.c
--- nagios-plugins-1.4.8-dist/plugins-root/check_icmp.c	2007-03-27 01:53:57.000000000 -0500
+++ nagios-plugins-1.4.8/plugins-root/check_icmp.c	2008-02-13 14:16:17.000000000 -0600
@@ -117,6 +117,7 @@
 	unsigned int icmp_sent, icmp_recv, icmp_lost; /* counters */
 	unsigned char icmp_type, icmp_code; /* type and code from errors */
 	unsigned short flags;        /* control/status flags */
+	unsigned short icmp_cnt;     /* ICMP sequence number */
 	double rta;                  /* measured RTA */
 	unsigned char pl;            /* measured packet loss */
 	struct rta_host *next;       /* linked list */
@@ -173,6 +174,9 @@
 #define TSTATE_ALIVE 0x04       /* target is alive (has answered something) */
 #define TSTATE_UNREACH 0x08
 
+/* How many bits of the sequence to use for a counter */
+#define SEQ_BITS 4
+
 /** prototypes **/
 void print_help (void);
 void print_usage (void);
@@ -326,14 +330,14 @@
 	 * to RFC 792). If it isn't, just ignore it */
 	sent_icmp = (struct icmp *)(ptr + 28);
 	if(sent_icmp->icmp_type != ICMP_ECHO || sent_icmp->icmp_id != pid ||
-	   sent_icmp->icmp_seq >= targets)
+	   (sent_icmp->icmp_seq >> SEQ_BITS) >= targets)
 	{
 		if(debug) printf("Packet is no response to a packet we sent\n");
 		return 0;
 	}
 
 	/* it is indeed a response for us */
-	host = table[sent_icmp->icmp_seq];
+	host = table[(sent_icmp->icmp_seq >> SEQ_BITS)];
 	if(debug) {
 		printf("Received \"%s\" from %s for ICMP ECHO sent to %s.\n",
 			   get_icmp_error_msg(p->icmp_type, p->icmp_code),
@@ -752,7 +756,7 @@
 			continue;
 		}
 
-		if(icp->icmp_type != ICMP_ECHOREPLY || icp->icmp_seq >= targets) {
+		if(icp->icmp_type != ICMP_ECHOREPLY || (icp->icmp_seq >> SEQ_BITS) >= targets) {
 			if(debug > 2) printf("not a proper ICMP_ECHOREPLY\n");
 			handle_random_icmp(icp, &resp_addr);
 			continue;
@@ -761,7 +765,7 @@
 		/* this is indeed a valid response */
 		data = (struct icmp_ping_data *)(icp->icmp_data);
 
-		host = table[icp->icmp_seq];
+		host = table[(icp->icmp_seq >> SEQ_BITS)];
 		gettimeofday(&now, &tz);
 		tdiff = get_timevaldiff(&data->stime, &now);
 
@@ -825,7 +829,7 @@
 	icp->icmp_code = 0;
 	icp->icmp_cksum = 0;
 	icp->icmp_id = pid;
-	icp->icmp_seq = host->id;
+	icp->icmp_seq = (host->id << SEQ_BITS) + ((host->icmp_cnt++) & ((1 << SEQ_BITS) - 1));
 	data = (struct icmp_ping_data *)icp->icmp_data;
 	data->ping_id = 10; /* host->icmp.icmp_sent; */
 	memcpy(&data->stime, &tv, sizeof(struct timeval));



