[Nagiosplug-devel] RFC: New threshold syntax

Andreas Ericsson ae at op5.se
Tue Mar 18 14:41:21 CET 2008


Ton Voon wrote:
> On 18 Mar 2008, at 04:36, Max wrote:
> 
>> I think your idea to extend the syntax is a good one :), I personally
>> have found myself more and more using a syntax in this format
>>
>> -w 'metric<op>number:metric2<op>number'
>>
>> e.g.
>>
>> -w '1min>15:5min>5' -c '15min>15:5min>10'
>>
>> "Warn if 1 minute load is greater than 15 or 5 minute is greater than
>> 5, critical if 15 minute is greater than 15 or 5 minute is greater
>> than 10"
>>
>> this lets a user specify a fairly complex or'd syntax for complex
>> thresholds .. the separator could determine or vs. and.
> 
> So I guess in the proposed format, this would be:
> 
> --load1=/15: --load5=10:/5: --load15=15:
> 
> Is that easier to read? I guess for me it is :)
> 

I absolutely loathe it, as it looks far too arcane. I have no
better suggestion though, but imagine the command line going
something like

--load1=^10:15/^15:19 --load10=5:8/^12:16 --load15=^10:15/19:25

You have 5 seconds to spot the error. Anything more and the user
debugging this line will have moved on already, thinking there are
bugs in
a) The plugins
b) Nagios
c) Both

(no, it's not in the totally insane but perfectly legal thresholds).

Besides, sometimes it's sensible to have the range specify the
invalid states, and sometimes it's sensible to have it specify the
OK range. It shouldn't be up to the arg parser to decide which one
should be the default, and when a user sends a single number to the
machinery, it needs to handle it properly and let the plugin decide
for itself if it's the upper or lower boundary.
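
To make that concrete, here is a rough sketch in Python (not the plugins'
C code, and not the syntax proposed in this thread). It only handles a
simplified 'lo:hi' form. The names parse_range, is_alert, single_is and
range_means are made up for illustration. The point is that the plugin,
not the parser, says what a bare number means and whether the range is
the OK range or the alert range.

```python
import math


def parse_range(spec, single_is="upper"):
    """Parse 'lo:hi', 'lo:', ':hi' or a bare number into a (lo, hi) pair.

    single_is is supplied by the plugin (hypothetical parameter): 'upper'
    treats a bare number as the upper boundary, 'lower' as the lower one.
    """
    if ":" in spec:
        lo_text, hi_text = spec.split(":", 1)
        lo = float(lo_text) if lo_text else -math.inf
        hi = float(hi_text) if hi_text else math.inf
    elif single_is == "upper":
        lo, hi = -math.inf, float(spec)
    elif single_is == "lower":
        lo, hi = float(spec), math.inf
    else:
        raise ValueError("single_is must be 'upper' or 'lower'")
    if lo > hi:
        raise ValueError("range start exceeds range end: %r" % spec)
    return lo, hi


def is_alert(value, bounds, range_means="ok"):
    """Return True if value should raise an alert.

    range_means is also the plugin's choice: 'ok' means the range lists
    the healthy values, 'alert' means it lists the bad ones.
    """
    lo, hi = bounds
    inside = lo <= value <= hi
    if range_means == "ok":
        return not inside
    if range_means == "alert":
        return inside
    raise ValueError("range_means must be 'ok' or 'alert'")
```

A plugin checking free disk space would pass single_is="lower", one
checking load would pass single_is="upper", and the user types the same
single number in both cases.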

> I like the > symbol, but it is overloading an existing shell meaning,  
> which is why I avoided it.
> 

Sensible. With '>' in arguments every argument needs to be escaped,
which is just plain stupid.

> Currently, my proposal only supports an OR of the threshold checks,  
> but I guess we could easily add a flag to change to AND instead.
> 

Why? I imagine OR would be more useful, or perhaps 'OR' and 'AND' at
the same time. Perhaps you want any of 5 thresholds to match, but
only if this other one doesn't match too, since in that case you want
only *this* or *that* to trigger an alert. Perhaps it'd be easier to
just rip an SQL-parsing implementation directly, with subquery support
tucked right in.

Or perhaps 5 triggered warnings from a single plugin should escalate
to a critical? I'd imagine quite a lot of users would want that.
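
As a sketch of what such combining rules might look like (Python, purely
illustrative, nothing here exists in the plugins): combine(), its mode
argument and escalate_after are invented names. The state numbers match
the usual plugin exit codes for OK, WARNING and CRITICAL.

```python
OK, WARNING, CRITICAL = 0, 1, 2


def combine(states, mode="or", escalate_after=None):
    """Fold per-metric states into a single plugin state.

    mode='or':  the worst state wins (any metric may trigger the alert).
    mode='and': the least severe state wins (all metrics must agree).
    escalate_after: if at least this many metrics are WARNING or worse,
    the result is forced to CRITICAL regardless of mode.
    """
    states = list(states)
    if not states:
        return OK
    if mode == "or":
        result = max(states)
    elif mode == "and":
        result = min(states)
    else:
        raise ValueError("mode must be 'or' or 'and'")
    if escalate_after is not None:
        alerting = sum(1 for state in states if state >= WARNING)
        if alerting >= escalate_after:
            result = CRITICAL
    return result
```

Anything beyond this (nested AND/OR, "this but not that") quickly turns
into an expression parser, which is the complexity being warned about.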

>> I too find that just using simple single numbers does not do well for
>> 'check all' types of plugins .. plugins that check multiple metrics at
>> once, it is a limitation that forces inefficiencies when adding
>> service checks for Nagios .. if an element type, for example, CPU
>> utilization, has 3 metrics associated with it (1 min, 5 min, 15 min),
>> I want to check those all at once with one plugin, not have Nagios
>> make 3 calls to the same plugin just to check all 3 ...
> 
> Agreed. For one customer, they want to check that there is a single  
> process and alert if vsz > 100MB OR cpu > 30%. They currently have to  
> run this as 3 checks:
> 
>    ./check_procs -u user -C command -c 1:1
>    ./check_procs -u user -C command --metric=VSZ -c 100000
>    ./check_procs -u user -C command --metric=CPU -c 30
> 

Ordered arguments could solve this quite easily, with a command-line
looking like this:

./check_procs -u user -C command -c 1:1 --metric=VSZ -c 100000 --metric=CPU -c 30

although that's 13 chars longer than what you have below.
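
A minimal sketch of ordered-argument handling, using Python's getopt
module as a stand-in for the C getopt() loop. This is not how check_procs
is actually written. parse_ordered() is a made-up helper, and treating
PROCS as the metric in effect before any --metric is an assumption. Each
--metric simply changes which metric the following -w/-c apply to.

```python
import getopt


def parse_ordered(argv):
    """Walk options in command-line order; --metric scopes later -w/-c."""
    opts, _rest = getopt.getopt(argv, "u:C:c:w:", ["metric="])
    common = {}        # options that are not thresholds (-u, -C)
    thresholds = {}    # metric name -> {"warning": ..., "critical": ...}
    metric = "PROCS"   # assumed default until a --metric is seen
    for opt, value in opts:
        if opt == "--metric":
            metric = value
        elif opt in ("-w", "-c"):
            level = "warning" if opt == "-w" else "critical"
            thresholds.setdefault(metric, {})[level] = value
        else:
            common[opt] = value
    return common, thresholds
```

Fed the arguments from the command line above (everything after
./check_procs), this yields one critical threshold each for PROCS, VSZ
and CPU, so the three separate invocations collapse into one.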

> So I'm trying to get check_procs to allow the following:
> 
> ./check_procs -u user -C command --number=^1:1 --vsz=100000 --cpu=30
> 

You could achieve the exact same thing by just adding support for the
new long arguments instead. Whoever invented the --metric thing should be
shot.

Wrappers are good things for accomplishing complex tasks with simple
tools. One of the things that has given Nagios such a great spread is
that it's really, really simple to write plugins for it. Creating a
new check is as easy as hacking up 5 lines of shell-script, or 40
lines of C, or whatever.
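
For anyone who hasn't written one, a whole check can be about this small.
The sketch is in Python rather than shell or C, and check_load and main
are illustrative names, not an existing plugin. The only contract with
Nagios is one line of output plus the exit code (0 OK, 1 WARNING,
2 CRITICAL, 3 UNKNOWN). It assumes a Unix-like system where
os.getloadavg() is available.

```python
import os

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_load(load1, warn, crit):
    """Compare the 1 minute load against two upper boundaries."""
    if load1 >= crit:
        return CRITICAL, "LOAD CRITICAL - load1=%.2f" % load1
    if load1 >= warn:
        return WARNING, "LOAD WARNING - load1=%.2f" % load1
    return OK, "LOAD OK - load1=%.2f" % load1


def main(argv):
    """argv[1] is the warning boundary, argv[2] the critical one."""
    try:
        warn, crit = float(argv[1]), float(argv[2])
        load1 = os.getloadavg()[0]
    except (IndexError, ValueError, OSError):
        print("LOAD UNKNOWN - usage: check_load_sketch WARN CRIT")
        return UNKNOWN
    code, line = check_load(load1, warn, crit)
    print(line)
    return code
```

To use it as a real script you would import sys and finish with
sys.exit(main(sys.argv)), so that the return value becomes the exit code
Nagios reads.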

Taking the simple tools and making them more complex excludes a lot
of nice usage one could get out of them.

>> I really like that plugins have guidelines and that Nagios takes a
>> keep things simple approach but I also think we need as a community to
>> have guidelines that allow developers to create efficient checks, more
>> and more we see 'check all' types of checks that could really benefit
>> from specifying multiple metric thresholds per warning and critical
>> ranges ..
> 
> What ideas do you have? I see the plugins team mission is to create a  
> great set of re-usable frameworks (the C library functions and  
> Nagios::Plugin for perl) and some of the most commonly used plugins  
> that showcase the frameworks.
> 

Frameworks? Umm... A framework is what you create when you have a
task to create several similar pieces of software that for some
reason look a bit different. The plugins are by their very definition
unique in the tasks they have to complete. Sure, they do a couple of
things in common, such as parsing arguments and sending some data over
a network. For the argument parsing, you have getopt() (which is shipped
with the plugins); for the network communication stuff you have the BSD
socket layer, which is so portable that even VMS has it.

Anyways, I have no strong opinion either way, except that I feel the
current way of specifying arguments needs to be retained, since quite
a lot of people's configurations depend on it.

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231



