[Nagiosplug-devel] RFC: New threshold syntax
ton.voon at altinity.com
Wed Mar 19 10:41:09 CET 2008
On 18 Mar 2008, at 13:41, Andreas Ericsson wrote:
> Ton Voon wrote:
>> I absolutely loathe it, as it looks far too arcane. I have no
> better suggestion though, but imagine the command line going
> something like
> --load1=^10:15/^15:19 --load10=5:8/^12:16 --load15=^10:15/19:25
> You have 5 seconds to spot the error. Anything more and the user
> debugging this line will have moved on already, thinking there are
> bugs in
> a) The plugins
> b) Nagios
> c) Both
Nope, I can't see the error, and I stared at it for 15 mins.
I agree it looks arcane. But I bet that, with any other plausible way of
defining thresholds on the command line, what you intend to do above will
not be any clearer.
The driver for this is not to invent Yet-Another-Custom-Way-Of-
Defining-Thresholds. The driver is to get a standard/consistent/
dependable way of stating what your thresholds are, with some library
functions to make it simple to add into plugins.
I can imagine some helper functions (cmdline, web pages, google
calculator) that, with more fields and example values, will tell you
if your threshold is defined "as you expect".
Or maybe do the reverse - enter in a specification like above, put in
a few load1, load10 and load15 values - and tell you which metrics the
plugin would alert on or not.
You can only do this with a standard way of defining the threshold.
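As a rough sketch of that second kind of helper: the Python below takes one
threshold per metric plus a few sample values and reports which metrics
would alert. It assumes the range format from the existing plugin
development guidelines (N, N:, ~:N, N:M, @N:M), not the new syntax under
discussion, whose rules are not settled in this thread. The names
parse_range, Range and explain are hypothetical, not part of any plugin
library.

```python
import math
from dataclasses import dataclass


@dataclass
class Range:
    start: float = 0.0      # lower bound; -inf when written as "~"
    end: float = math.inf   # upper bound; +inf when omitted
    inside: bool = False    # True for an "@" prefix: alert when inside the range

    def alerts(self, value):
        in_range = self.start <= value <= self.end
        return in_range if self.inside else not in_range


def parse_range(spec):
    """Parse an existing-guidelines range string (assumed format, see above)."""
    if not spec:
        raise ValueError("empty range")
    inside = spec.startswith("@")
    if inside:
        spec = spec[1:]
    if ":" in spec:
        lo, hi = spec.split(":", 1)
    else:
        lo, hi = "0", spec  # a bare number N is shorthand for 0:N
    if lo == "~":
        start = -math.inf
    elif lo == "":
        start = 0.0
    else:
        start = float(lo)
    end = float(hi) if hi else math.inf
    if start > end:
        raise ValueError("range start is greater than range end")
    return Range(start, end, inside)


def explain(thresholds, samples):
    """Map each sampled metric to True if its threshold would alert."""
    return {name: parse_range(thresholds[name]).alerts(value)
            for name, value in samples.items()}


if __name__ == "__main__":
    thresholds = {"load1": "10", "load10": "~:8", "load15": "@5:6"}
    samples = {"load1": 12.0, "load10": 3.0, "load15": 5.5}
    print(explain(thresholds, samples))
    # {'load1': True, 'load10': False, 'load15': True}
```

The point is only that such a tool needs a single parser to agree with
every plugin, which is what a standard syntax would give.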
> Besides, sometimes it's sensible to have the range specify the
> invalid states, and sometimes it's sensible to have them specify the
> OK range. It shouldn't be up to the arg parser to decide which one
> should be the default, and when a user sends a single number to the
> machinery, it needs to handle it properly and let the plugin decide
> for itself if it's the upper or lower boundary.
I think that's an interesting possibility - that a single digit is
contextually defined by the plugin. This breaks the conventions above
(unless the plugin gave data to say what its default behaviour is).
I've noted it for future consideration.
>> Agreed. For one customer, they want to check that there is a single
>> process and alert if vsz > 100MB OR cpu > 30%. They currently have to
>> run this as 3 checks:
>> ./check_procs -u user -C command -c 1:1
>> ./check_procs -u user -C command --metric=VSZ -c 100000
>> ./check_procs -u user -C command --metric=CPU -c 30
> Ordered arguments could solve this quite easily, with a command-line
> looking like this:
> ./check_procs -u user -C command -c 1:1 --metric=VSZ -c 100000 --metric=CPU -c 30
I hadn't considered ordered command line options. check_disk had such
a painful way of specifying the thresholds, that I probably
subconsciously blocked that out.
Would you like to flesh out what the rules are, how backward
compatibility can be maintained, what defaults are (looks like the
default metric is "number of processes"), how the range values are
processed (1:1 looks like it is treated differently to 100000 and 30 -
contextually based on metric?).
This needs more consideration, but I think there's merit here.
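To make the ordering rule concrete, here is a minimal Python sketch of one
possible reading: walk the arguments left to right and attach each -c or -w
to the most recently seen --metric, falling back to a default metric before
any --metric appears. This is an assumption about how it could work, not
how check_procs behaves today. bind_thresholds and the "PROCS" default name
are hypothetical, and the range strings are deliberately left uninterpreted
because how they should be processed is still an open question.

```python
DEFAULT_METRIC = "PROCS"  # assumed name for the default "number of processes" metric


def bind_thresholds(argv):
    """Attach each -c/-w value to the metric most recently named on the line."""
    metric = DEFAULT_METRIC
    bound = {}
    args = iter(argv)
    for arg in args:
        if arg.startswith("--metric="):
            # Every later threshold applies to this metric until the next --metric.
            metric = arg.split("=", 1)[1]
        elif arg in ("-c", "-w"):
            value = next(args, None)
            if value is None:
                raise ValueError(arg + " needs a value")
            level = "critical" if arg == "-c" else "warning"
            bound.setdefault(metric, {})[level] = value
        # Other options (-u, -C and their values) are skipped in this sketch.
    return bound


if __name__ == "__main__":
    line = "-u user -C command -c 1:1 --metric=VSZ -c 100000 --metric=CPU -c 30"
    print(bind_thresholds(line.split()))
    # {'PROCS': {'critical': '1:1'}, 'VSZ': {'critical': '100000'}, 'CPU': {'critical': '30'}}
```

Under this reading an existing command line with a single --metric keeps
the same threshold binding only if the --metric comes before its -c and -w,
which is one of the backward compatibility questions that would need an
answer.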
> Whoever invented the --metric thing should be
[hands up] My only excuse is that the aim was to merge check_rss,
check_vsz and check_cpu into check_procs, which it has done :)
>> What ideas do you have? I see the plugins team mission is to create a
>> great set of re-usable frameworks (the C library functions and
>> Nagios::Plugin for perl) and some of the most commonly used plugins
>> that showcase the frameworks.
> Frameworks? Umm... A framework is what you create when you have a
> task to create several similar pieces of software that for some
> reason look a bit different. The plugins are by their very definition
> unique in the tasks they have to complete. Sure, they do a couple of
> things in common, such as parsing arguments and sending some data over
> a network. For the argument parsing, you have getopt() (which is
> with the plugins), for the network communication stuff you have the
> socket layer, which is so portable that even VMS has it.
OK, "framework" is a bit marketing-speak. But in your list of "common
tasks" I would add "calculation of thresholds", which is precisely
what I'm trying to do here.
> Anyways, I have no strong opinion either way, except that I feel the
> current way of specifying arguments need to be retained, since quite
> a lot of people's configurations depend on it.
Absolutely. Which is why I spent 2 hours writing tests to make sure I
get exactly the expected results using a fixed ps output and a set of
command line options: http://nagiosplug.svn.sourceforge.net/viewvc/nagiosplug/nagiosplug/trunk/plugins/tests/check_procs.t?view=log
If you'd like to add in your favourite options, patches welcome.