[Nagiosplug-devel] check_disk enhancements

Ton Voon ton.voon at altinity.com
Fri Jul 14 01:53:12 CEST 2006


I've spent a lot of time on check_disk and I think it is much better  
now. However, there are a few things I wanted to get some opinions on.

Firstly, I've fixed a lot of major bugs in check_disk. Turns out that
absolute values were incorrectly measured. The docs say that
./check_disk -w 100 -p / should check for 100MB free on /, but in fact
it was checking 100 blocks (whatever the filesystem blocksize was),
which is not the same thing at all.
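
To make the blocks-versus-megabytes difference concrete, here is a rough
Python sketch (not the check_disk C code). The function names are made
up for illustration, and it assumes 1 MB means 1024*1024 bytes:

```python
import os

MIB = 1024 * 1024  # assumption: "MB" here means 1024*1024 bytes


def free_mib(free_blocks: int, block_size: int) -> float:
    """Convert a count of free blocks into MiB using the block size."""
    return free_blocks * block_size / MIB


def below_threshold(free_blocks: int, block_size: int, threshold_mib: float) -> bool:
    """True if free space, measured in MiB, is under the threshold."""
    return free_mib(free_blocks, block_size) < threshold_mib


if __name__ == "__main__":
    # os.statvfs is only available on Unix-like systems.
    st = os.statvfs("/")
    print(free_mib(st.f_bavail, st.f_frsize))
```

With a 4096-byte block size, 100 free blocks is well under 1 MB, so
comparing the raw block count against "100" gives a very different
answer from comparing megabytes.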

Secondly, the values for space used were incorrectly calculated,
because values were converted between types and lost accuracy by going
through floats rather than doubles. I've copied the same techniques used
in coreutils' df command, so the results should be exactly the same as
df would output.
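
As an illustration of the float-versus-double problem (this is not the
df or check_disk code, just a sketch), the following Python snippet
pushes a block count through a 32-bit float and back. The helper name
as_float32 is made up:

```python
import struct


def as_float32(x: float) -> float:
    """Round-trip a value through a 32-bit C float."""
    return struct.unpack("f", struct.pack("f", float(x)))[0]


blocks = 16_777_217  # 2**24 + 1, too large for a 32-bit float to hold exactly

print(as_float32(blocks))  # 16777216.0 (off by one block)
print(float(blocks))       # 16777217.0 (a double keeps it exact)
```

Block counts on large filesystems easily exceed 2**24, so any
calculation that passes through a float can drift from what df reports.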

Thirdly, the parsing of "best match" filesystems and excluding
filesystems was incorrect. These functions have been moved off into a
library, where they are now tested using libtap (Haven't heard
of it? I've only been harping on about how great it is for a year
now! http://jc.ngo.org.uk/trac-bin/trac.cgi/wiki/LibTap and
http://www.onlamp.com/pub/a/onlamp/2006/01/19/libtap.html). I've also added
in an "exact-match" option, due to public demand.
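
For anyone unsure what "best match" means here, a minimal Python sketch
of the idea follows. It assumes best match means "the longest mount
point that is a path prefix of the requested path" and that exact match
means "only an identical mount point counts". The function name and
signature are hypothetical, and the real library routines may behave
differently:

```python
from typing import Iterable, Optional


def best_match(path: str, mount_points: Iterable[str], exact: bool = False) -> Optional[str]:
    """Pick the mount point that covers path, or None if nothing matches."""
    best = None
    for mp in mount_points:
        if exact:
            # Exact mode: only an identical mount point is accepted.
            if mp == path:
                return mp
            continue
        # Match on whole path components, so /var does not match /variable.
        prefix = mp if mp.endswith("/") else mp + "/"
        if path == mp or path.startswith(prefix):
            if best is None or len(mp) > len(best):
                best = mp
    return best


mounts = ["/", "/var", "/var/log"]
print(best_match("/var/log/messages", mounts))   # /var/log
print(best_match("/var/log", ["/", "/var"], exact=True))  # None
```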

Lastly, we can now compare against multiple threshold values. Not  
just the current freespace_units, freespace_percent and  
usedinodes_percent, but also usedspace_units and usedspace_percent.  
Others can be easily added. However, there are problems with how to  
specify these thresholds (see below).

The t/check_disk.t tests have been updated as well, so some long  
standing bugs have been fixed. The only test failures at the moment  
are for range checking. Is this something that should be done  
generally? For instance, should we raise errors re: ranges where  
warning will never occur? Eg, warn if inside 0:10, critical if inside  
0:15? Or eg, percent must be between 0 and 100? I tend to think that  
it should be left to the user.

One regression that I have left in is the trimming of perf data. The  
warn/crit/max/min values were not being generated correctly, and  
there are no library routines for it yet (though there are in the  
Nagios::Plugin module). I plan on putting that back in at some stage.  
Anyone desperate for it?

The biggest problem that I've discovered is that the range
specification for -w and -c is inverted from the norm. This was
noticed when using the library range checking routines. check_disk -w
10% means alert if freespace is below 10%, but we normally mean to
alert if it is outside the range. So, for instance, check_procs -w
1:1 means alert if there is anything other than exactly 1 process.
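
For reference, the normal range semantics from the plugin development
guidelines can be sketched in Python as below. This is only a sketch,
not the library routines; parse_range and should_alert are made-up
names, and percent signs and error handling are left out:

```python
import math
from typing import Tuple


def parse_range(spec: str) -> Tuple[float, float, bool]:
    """Return (start, end, alert_inside) for specs like '10', '10:', '~:10', '@1:5'."""
    inside = spec.startswith("@")  # a leading @ means alert when inside the range
    if inside:
        spec = spec[1:]
    if ":" in spec:
        lo, hi = spec.split(":", 1)
    else:
        lo, hi = "0", spec  # a bare number N is shorthand for 0:N
    start = -math.inf if lo == "~" else float(lo or 0)
    end = math.inf if hi == "" else float(hi)
    return start, end, inside


def should_alert(value: float, spec: str) -> bool:
    """Alert when value is outside the range (or inside it, for @ ranges)."""
    start, end, inside = parse_range(spec)
    in_range = start <= value <= end
    return in_range if inside else not in_range


print(should_alert(2, "1:1"))   # True: 2 processes is outside 1:1
print(should_alert(5, "10"))    # False: 5 is inside 0:10, so no alert
print(should_alert(5, "@10"))   # True: with @, being inside 0:10 alerts
```

The second and third calls show the inversion: under the normal rules
a bare 10 stays quiet when free space is at 5, which is the opposite of
what check_disk users expect.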

I've got a hack for check_disk (forcing a @ at the beginning of the  
range, which means to alert inside), but I was wondering if we should  
introduce a new way of defining thresholds. I'm thinking something like:

   --freespace="0:5;0:2" (warn if outside 0 to 5, crit if outside 0 to 2)
   --usedspace_percent=";90:100" (no warn, crit if outside 90 to 100)
   --usedinode="100:;200:" (warn if outside 100 to infinity, crit if outside 200 to infinity)

This also matches with perfdata output.
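
As a sketch of how such a value could be split up (Python, purely
illustrative; this syntax is only a proposal and split_thresholds is a
made-up name), the warn and crit ranges are separated by a semicolon,
with an empty field meaning "no threshold":

```python
from typing import Optional, Tuple


def split_thresholds(value: str) -> Tuple[Optional[str], Optional[str]]:
    """Split 'warn;crit' into two range strings; empty fields become None."""
    warn, _, crit = value.partition(";")
    return (warn or None, crit or None)


print(split_thresholds("0:5;0:2"))    # ('0:5', '0:2')
print(split_thresholds(";90:100"))    # (None, '90:100')
print(split_thresholds("100:;200:"))  # ('100:', '200:')
```

Each half would then be handed to the usual range checking routine.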

Any opinions?


T: +44 (0)870 787 9243
F: +44 (0)845 280 1725
Skype: tonvoon
