[Nagiosplug-devel] RFC: New threshold syntax

Ton Voon ton.voon at altinity.com
Wed Mar 19 10:41:09 CET 2008


On 18 Mar 2008, at 13:41, Andreas Ericsson wrote:

> Ton Voon wrote:
>>
> I absolutely loathe it, as it looks far too arcane. I have no
> better suggestion though, but imagine the command line going
> something like
>
> --load1=^10:15/^15:19 --load10=5:8/^12:16 --load15=^10:15/19:25
>
> You have 5 seconds to spot the error. Anything more and the user
> debugging this line will have moved on already, thinking there are
> bugs in
> a) The plugins
> b) Nagios
> c) Both

Nope, I can't see the error, and I stared at it for 15 mins.

I agree it looks arcane. But I bet that, with any other plausible way of
defining it on the command line, what you intend to do above will not be
any clearer.

The driver for this is not to invent Yet-Another-Custom-Way-Of-Defining-Thresholds.
The driver is to get a standard/consistent/dependable way of stating what
your thresholds are, with some library functions to make it simple to add
to plugins.

I can imagine some helpers (cmdline, web pages, something like the Google
calculator) that, given more fields and example values, will tell you
whether your threshold is defined "as you expect".

Or maybe do the reverse: enter a specification like the one above, put in
a few load1, load10 and load15 values, and have it tell you which metrics
the plugin would or would not alert on.

You can only do this with a standard way of defining the threshold.
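A checker like the one described above only works once the range format is pinned down. The `^` and `/` notation in the RFC example is not spelled out in this message, so this sketch assumes the existing plugin range format from the Nagios plugin development guidelines instead: a bare `N` means `0:N`, `N:` has no upper bound, `~` stands for negative infinity, endpoints are inclusive, a plain range alerts when the value falls outside it, and an `@` prefix makes it alert when the value is inside. The names `Range`, `parse_range`, `should_alert` and `which_alert` are hypothetical, not part of the C library or Nagios::Plugin.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass
class Range:
    start: float
    end: float
    alert_inside: bool  # True for "@" ranges: alert when the value is inside


def parse_range(spec: str) -> Range:
    """Parse a range such as '10', '10:', '~:10', '10:20' or '@10:20'."""
    alert_inside = spec.startswith("@")
    body = spec[1:] if alert_inside else spec
    if ":" in body:
        start_s, end_s = body.split(":", 1)
    else:
        start_s, end_s = "0", body  # a bare number N means 0:N
    start = -math.inf if start_s == "~" else float(start_s or 0)
    end = math.inf if end_s == "" else float(end_s)  # "N:" has no upper bound
    if start > end:
        raise ValueError(f"start is greater than end in {spec!r}")
    return Range(start, end, alert_inside)


def should_alert(r: Range, value: float) -> bool:
    """Endpoints are inclusive; plain ranges alert outside, '@' ranges inside."""
    inside = r.start <= value <= r.end
    return inside if r.alert_inside else not inside


def which_alert(thresholds: Dict[str, str], samples: Dict[str, float]) -> Dict[str, bool]:
    """For each metric with a threshold, report whether its sample value alerts."""
    return {name: should_alert(parse_range(spec), samples[name])
            for name, spec in thresholds.items()}


if __name__ == "__main__":
    thresholds = {"load1": "15", "load10": "10:", "load15": "@10:20"}
    samples = {"load1": 16.0, "load10": 12.0, "load15": 12.0}
    print(which_alert(thresholds, samples))
    # {'load1': True, 'load10': False, 'load15': True}
```

The same parser could back both directions: a web page that checks a specification against example values, or a command-line tool that lists which metrics would alert.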

> Besides, sometimes it's sensible to have the range specify the
> invalid states, and sometimes it's sensible to have them specify the
> OK range. It shouldn't be up to the arg parser to decide which one
> should be the default, and when a user sends a single number to the
> machinery, it needs to handle it properly and let the plugin decide
> for itself if it's the upper or lower boundary.

I think that's an interesting possibility: that a single number is
contextually defined by the plugin. This breaks the conventions above
(unless the plugin gave data to say what its default behaviour is).
I've noted it for future consideration.
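One way a plugin could "decide for itself" is to pass its preferred reading of a bare number into the range parser, while explicit `a:b` ranges keep their usual meaning. This is a minimal sketch of that idea only; the `ok_range` function and its `bare_number_is` parameter are hypothetical, and `@` ranges are left out for brevity.

```python
import math
from typing import Tuple


def ok_range(spec: str, bare_number_is: str = "upper") -> Tuple[float, float]:
    """Return the (start, end) range that counts as OK for spec.

    Explicit ranges ('a:b', 'a:', '~:b') are parsed as usual. Only a bare
    number is interpreted using the default the plugin passes in.
    """
    if ":" not in spec:
        n = float(spec)
        if bare_number_is == "upper":
            return (0.0, n)  # existing convention: N means 0:N
        if bare_number_is == "lower":
            return (n, math.inf)  # alternative reading: N means N:, alert below N
        raise ValueError(f"unknown default {bare_number_is!r}")
    start_s, end_s = spec.split(":", 1)
    start = -math.inf if start_s == "~" else float(start_s or 0)
    end = math.inf if end_s == "" else float(end_s)
    return (start, end)
```

A load-style plugin could keep the default `"upper"`, while a plugin measuring something like free space might pass `"lower"` so that `-w 10` means "warn below 10". The catch is the one noted above: the same `-w 10` then means different things in different plugins unless each plugin publishes its default.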


>> Agreed. For one customer, they want to check that there is a single
>> process and alert if vsz > 100MB OR cpu > 30%. They currently have to
>> run this as 3 checks:
>>
>>   ./check_procs -u user -C command -c 1:1
>>   ./check_procs -u user -C command --metric=VSZ -c 100000
>>   ./check_procs -u user -C command --metric=CPU -c 30
>>
>
> Ordered arguments could solve this quite easily, with a command-line
> looking like this:
>
> ./check_procs -u user -C command -c 1:1 --metric=VSZ -c 100000 --metric=CPU -c 30

I hadn't considered ordered command line options. check_disk had such
a painful way of specifying the thresholds that I probably blocked it
out subconsciously.

Would you like to flesh out what the rules are, how backward  
compatibility can be maintained, what defaults are (looks like the  
default metric is "number of processes"), how the range values are  
processed (1:1 looks like it is treated differently to 100000 and 30 -  
contextually based on metric?).

This needs more consideration, but I think there's merit here.
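To make those questions concrete, here is a sketch of one possible rule set for ordered arguments: thresholds seen before any `--metric` apply to the process count, and each later `-w`/`-c` binds to the most recent `--metric=NAME`. These rules, the function name `bind_thresholds` and the `PROCS` label for the default metric are assumptions for illustration, not current check_procs behaviour.

```python
from typing import Dict, List, Tuple


def bind_thresholds(argv: List[str], default_metric: str = "PROCS"
                    ) -> Tuple[Dict[str, Dict[str, str]], List[str]]:
    """Bind each -w/-c to the most recent --metric=NAME seen so far.

    Thresholds before any --metric apply to default_metric. Everything
    else is returned untouched for the normal option parser.
    """
    current = default_metric
    bound: Dict[str, Dict[str, str]] = {}
    rest: List[str] = []
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg.startswith("--metric="):
            current = arg.split("=", 1)[1]
        elif arg in ("-w", "-c"):
            if i + 1 >= len(argv):
                raise ValueError(f"{arg} requires a value")
            kind = "warning" if arg == "-w" else "critical"
            bound.setdefault(current, {})[kind] = argv[i + 1]
            i += 1  # skip the value just consumed
        else:
            rest.append(arg)  # unrelated options such as -u user -C command
        i += 1
    return bound, rest


if __name__ == "__main__":
    argv = ("-u user -C command -c 1:1 "
            "--metric=VSZ -c 100000 --metric=CPU -c 30").split()
    print(bind_thresholds(argv))
    # ({'PROCS': {'critical': '1:1'}, 'VSZ': {'critical': '100000'},
    #   'CPU': {'critical': '30'}}, ['-u', 'user', '-C', 'command'])
```

Under these assumed rules, a command line that puts `--metric=VSZ` after `-c 100000` would bind the threshold to the process count instead, which is exactly the kind of backward compatibility question that would need an answer. The bound strings would still have to be parsed as ranges for each metric.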

> Whoever invented the --metric thing should be
> shot.

[hands up] My only excuse is that the aim was to merge check_rss,  
check_vsz and check_cpu into check_procs, which it has done :)


>
>> What ideas do you have? I see the plugins team mission is to create a
>> great set of re-usable frameworks (the C library functions and
>> Nagios::Plugin for perl) and some of the most commonly used plugins
>> that showcase the frameworks.
>>
>
> Frameworks? Umm... A framework is what you create when you have a
> task to create several similar pieces of software that for some
> reason look a bit different. The plugins are by their very definition
> unique in the tasks they have to complete. Sure, they do a couple of
> things in common, such as parsing arguments and sending some data over
> a network. For the argument parsing, you have getopt() (which is shipped
> with the plugins), for the network communication stuff you have the bsd
> socket layer, which is so portable that even VMS has it.

OK, "framework" is a bit marketing-speak. But in your list of "common  
tasks" I would add "calculation of thresholds", which is precisely  
what I'm trying to do here.

> Anyways, I have no strong opinion either way, except that I feel the
> current way of specifying arguments need to be retained, since quite
> a lot of people's configurations depend on it.

Absolutely. That's why I spent 2 hours writing tests to make sure I
get exactly the expected results from a fixed ps output and a set of
command line options: http://nagiosplug.svn.sourceforge.net/viewvc/nagiosplug/nagiosplug/trunk/plugins/tests/check_procs.t?view=log

If you'd like to add in your favourite options, patches welcome.

Ton

http://www.altinity.com
UK: +44 (0)870 787 9243
US: +1 866 879 9184
Fax: +44 (0)845 280 1725
Skype: tonvoon




