[Nagiosplug-devel] check_disk enhancements

John P. Rouillard rouilj at cs.umb.edu
Sun Jul 16 05:07:34 CEST 2006


In message <20060715235551.GA25772 at openfusion.com.au>,
Gavin Carr writes:
>Your changes look great. The missing feature I'd most like check_disk is 
>a flag to change reporting (and perhaps also thresholds, but that's less 
>important) from being in terms of free space to being in terms of 
>utilisation, so the percentages would be like 'df'.
>
>For some reason, my brain finds scanning utilisation percentages much
>easier than free space ones - contrast:
[...]
>Anyone else wired this way?

Basically, yes.

In message <C35C4EDC-2D0E-47B1-85AF-A5311CEAEF92 at altinity.com>,
Ton Voon writes:
>The biggest problem that I've discovered is that the range  
>specification for -w and -c are inverted from the norm. This was  
>noticed when using the library range checking routines. check_disk -w  
>10% means alert if freespace is below 10%, but we normally mean to  
>alert if it is outside the range. So, for instance, check_procs -w  
>1:1 means alert if greater than 1 process.

Specifying the free space always seemed weird to me. If we specified the
used space instead, it would work better with the -w and -c settings:

  -w 80 (-w 0:80) - warn if more than 80% of the disk is used.
  -c 90 (-c 0:90) - critical if more than 90% of the disk is used.
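A minimal Python sketch of how those used-space settings would be read,
assuming the usual plugin convention that a range start:end alerts when
the value falls outside it and that a bare number N means 0:N. The
function names are made up for illustration and are not the plugin
library's API; the "~" and "@" prefixes are not handled.

```python
import math


def parse_simple_range(spec: str) -> tuple[float, float]:
    """Parse 'N' (shorthand for 0:N) or 'start:end'; an empty end means infinity."""
    if ":" not in spec:
        return 0.0, float(spec)
    start, end = spec.split(":", 1)
    lo = float(start) if start else 0.0  # assumption: empty start treated as 0
    hi = float(end) if end else math.inf
    return lo, hi


def used_space_status(used_pct: float, warn: str, crit: str) -> str:
    """Alert when the used-space percentage falls outside the given range."""
    for label, spec in (("CRITICAL", crit), ("WARNING", warn)):
        lo, hi = parse_simple_range(spec)
        if not lo <= used_pct <= hi:
            return label
    return "OK"


# -w 80 -c 90 behaves like -w 0:80 -c 0:90
for pct in (50, 85, 95):
    print(pct, used_space_status(pct, "80", "90"))
```

With these thresholds, 50% used is OK, 85% is WARNING and 95% is
CRITICAL, with no inversion needed.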

However, this would be an incompatible change in meaning that looks
identical to the pre-existing calling format, so it's out unless we add
a flag to request it, as Gavin suggested above.

>I've got a hack for check_disk (forcing a @ at the beginning of the  
>range, which means to alert inside), but I was wondering if we should  
>introduce a new way of defining thresholds. I'm thinking something like:
>
>   --freespace="0:5;0:2" (warn if outside 0 to 5, crit if outside 0  
>to 2)
>   --usedspace_percent=";90:100" (no warn, crit if outside 90 to 100)
>   --usedinode="100:;200:" (warn if outside 100 to infinity, crit if  
>               outside 200 to infinity)
>
>This also matches with perfdata output.

Just a nit first: would the new way be in addition to the old way (-w,
-c), or would it replace the old way entirely and report an error if
somebody tries to use the old flags? I think "in addition to" is best
for backwards compatibility.
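A rough sketch of how a value in Ton's proposed format, such as
--freespace="0:5;0:2", could be split into optional warning and
critical ranges. It assumes an empty side means "no threshold" and that
a leading "@" (as in his check_disk hack) flips the test to alert inside
the range. The option names come from the quoted proposal; the helper
functions are hypothetical.

```python
import math
from typing import Optional


def parse_range(spec: str) -> tuple[float, float, bool]:
    """Return (lo, hi, inside); inside=True means alert when lo <= value <= hi."""
    inside = spec.startswith("@")
    if inside:
        spec = spec[1:]
    if ":" not in spec:
        return 0.0, float(spec), inside  # bare N is shorthand for 0:N
    start, end = spec.split(":", 1)
    lo = float(start) if start else 0.0
    hi = float(end) if end else math.inf  # '100:' means 100 to infinity
    return lo, hi, inside


def split_pair(value: str) -> tuple[Optional[tuple], Optional[tuple]]:
    """Split 'warn;crit' into parsed ranges; an empty side yields None."""
    warn, _, crit = value.partition(";")
    return (parse_range(warn) if warn else None,
            parse_range(crit) if crit else None)


def alerts(value: float, rng: Optional[tuple]) -> bool:
    """True if value should raise this level's alert; None never alerts."""
    if rng is None:
        return False
    lo, hi, inside = rng
    within = lo <= value <= hi
    return within if inside else not within


warn, crit = split_pair(";90:100")     # no warning threshold
print(warn, crit)                       # None (90.0, 100.0, False)

warn, crit = split_pair("@0:10;@0:5")   # alert inside the ranges
print(alerts(7, warn), alerts(7, crit))  # True False
```

Keeping this as a separate option string leaves the old -w and -c
parsing untouched, which fits the "in addition to" approach.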

The -w and -c flags work well if the plugin is only testing one
parameter. However, a lot of plugins test multiple parameters. I have a
couple of home-grown plugins that test 10 different parameters, because
the overhead of getting the data is so large that calling the program
10 times to extract a single data item each time is nuts.
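As a hypothetical illustration of that pattern (the metric names, the
fetch_metrics stub and the threshold table are all invented here), a
plugin can collect every value in one expensive call, test each value
against its own limits, and report the worst state using the standard
plugin exit codes 0 = OK, 1 = WARNING, 2 = CRITICAL.

```python
# Hypothetical metrics; a real plugin would gather them in one costly command.
def fetch_metrics() -> dict[str, float]:
    return {"queue_len": 42.0, "latency_ms": 250.0, "errors": 0.0}


# Per-metric (warn, crit) upper bounds, i.e. ranges 0:warn and 0:crit.
THRESHOLDS = {
    "queue_len": (50.0, 100.0),
    "latency_ms": (200.0, 500.0),
    "errors": (1.0, 5.0),
}

OK, WARNING, CRITICAL = 0, 1, 2
NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


def evaluate(metrics, thresholds):
    """Test every metric once and return (worst state, per-metric details)."""
    worst = OK
    details = []
    for name, value in metrics.items():
        warn, crit = thresholds[name]
        if not 0 <= value <= crit:
            state = CRITICAL
        elif not 0 <= value <= warn:
            state = WARNING
        else:
            state = OK
        worst = max(worst, state)  # higher code = more severe
        details.append(f"{name}={value:g} {NAMES[state]}")
    return worst, details


status, lines = evaluate(fetch_metrics(), THRESHOLDS)
print(NAMES[status], "; ".join(lines))
```

The data is fetched once, so all tests see the same snapshot, which
also covers the synchronization concern raised below.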

In other cases there can be multiple tests to perform against the data
from the command, and they must all be done at once because the data
needs to be synchronized for the tests to be meaningful. Using
tkwatcher <http://www.cs.umb.edu/~rouilj/tkwatcher/>, I had some
instances where there were 30 tests on the output of a single command
stream. I agree that the current -w -c -W -C threshold-setting
mechanisms don't cut it, so I think something like what you propose
is needed. I would extend it just a bit, however, to allow each
threshold to specify:

    warn_list;crit_list

where warn_list and crit_list each have the form:

  range|single[,range|single]...
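A minimal parser for the proposed warn_list;crit_list grammar, assuming
comma-separated items, a bare number meaning 0:number as described
below, and an empty upper bound (as in "5:") meaning unbounded. It only
handles plain numeric bounds, not the "~" or "@" prefixes, and the
function names are illustrative.

```python
import math


def parse_item(item: str) -> tuple[float, float]:
    """One list element: 'single' (= 0:single) or 'start:end'."""
    if ":" not in item:
        return (0.0, float(item))
    start, end = item.split(":", 1)
    lo = float(start) if start else 0.0
    hi = float(end) if end else math.inf  # '5:' means 5 and up
    return (lo, hi)


def parse_list(text: str) -> list[tuple[float, float]]:
    """'range|single[,range|single]...' -> list of (lo, hi); empty -> []."""
    return [parse_item(part) for part in text.split(",") if part]


def parse_threshold(spec: str):
    """'warn_list;crit_list' -> (warning ranges, critical ranges)."""
    warn, _, crit = spec.partition(";")
    return parse_list(warn), parse_list(crit)


print(parse_threshold("0:10,20:80,90:100;10:90"))
# ([(0.0, 10.0), (20.0, 80.0), (90.0, 100.0)], [(10.0, 90.0)])
```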

where single is a degenerate form of range implying 0:single, just as
with the current plugins. This way we can support both upper and lower
warning limits. E.g., warn if in the range 10-20 or 80-90, and go
critical if in the range 0-10 or 90-100:

  --freespace "0:10,20:80,90:100;10:90"
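To check that the example behaves as described, assume each list means
"OK if the value lies inside any listed range, alert otherwise" (the
usual outside-the-range rule extended to a union), and that the
critical test is applied before the warning test. Under those
assumptions, a small evaluator over the already-parsed ranges gives:

```python
# Parsed form of --freespace "0:10,20:80,90:100;10:90" (free space in percent).
WARN_OK = [(0, 10), (20, 80), (90, 100)]
CRIT_OK = [(10, 90)]


def inside_any(value, ranges):
    """True if value lies in at least one inclusive (lo, hi) range."""
    return any(lo <= value <= hi for lo, hi in ranges)


def status(value, warn_ok, crit_ok):
    # Assumption: an empty list means "no threshold" for that level.
    if crit_ok and not inside_any(value, crit_ok):
        return "CRITICAL"
    if warn_ok and not inside_any(value, warn_ok):
        return "WARNING"
    return "OK"


for free in (5, 15, 50, 85, 95):
    print(free, status(free, WARN_OK, CRIT_OK))
```

Because the ranges are inclusive, the boundary values 10, 20, 80 and 90
themselves come out OK.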

This would also work for those cases where we need to exclude the
middle of a range, e.g. when checking discrete values from SNMP.
E.g., 1, 2 and 4 are warning but 3 and 5 are critical:

  --thresh "3:3,5:;1:2,4:4"
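The same union rule, sketched again for the discrete case under the
same assumptions (inside any listed range is OK, critical checked
first, "5:" meaning 5 or more). Running the values 1 through 5 through
it reproduces the split described above.

```python
import math

# Parsed form of --thresh "3:3,5:;1:2,4:4".
WARN_OK = [(3, 3), (5, math.inf)]
CRIT_OK = [(1, 2), (4, 4)]


def inside_any(value, ranges):
    """True if value lies in at least one inclusive (lo, hi) range."""
    return any(lo <= value <= hi for lo, hi in ranges)


def status(value):
    if not inside_any(value, CRIT_OK):
        return "CRITICAL"
    if not inside_any(value, WARN_OK):
        return "WARNING"
    return "OK"


print({v: status(v) for v in range(1, 6)})
# {1: 'WARNING', 2: 'WARNING', 3: 'CRITICAL', 4: 'WARNING', 5: 'CRITICAL'}
```

Under this reading no value is OK: anything outside 1, 2 and 4 (such as
0 or 6) also comes out CRITICAL.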

Quips, comments, evasions, questions, answers or suggestions
welcome. I have to say, though, that coding my standard shell-script
parser to deal with the current threshold processing was a bear, and
this enhanced form may be worse.

				-- rouilj
John Rouillard
===========================================================================
My employers don't acknowledge my existence much less my opinions.