[Nagiosplug-devel] RFC: New threshold syntax

Thomas Guyot-Sionnest thomas at zango.com
Fri Mar 28 19:26:26 CET 2008


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hey Matthias!

First of all thanks for the summary! It's awesome and will definitely
help to see where we should head.

Matthias Eble wrote:
|> Ton and Thomas agree that Perfdata should be in a fixed UOM and
|> not the one specified in the thresholds (at least for now).
|>    - changing the threshold UOM will destroy old graphs
|>    - Defining a base unit should be up to the respective plugins and be
|> as small as possible (sec,bytes,...)
|>    - Thus uom is optional even when no thresholds are defined (like
|> --load1 to just graph load1)
|
| When the perfdata UOM (or its prefix) changes, rrd will show something
| like 1k MB. Bytes are the most precise value we can pass to the
| graphers. They can then define how to handle/display them.

I somewhat agree, though when you set up your graph you should
divide/multiply the values appropriately. If your data is in E or p it's
probably not an option to print:
label=183000000000000000000;200000000000000000000;250000000000000000000;0;300000000000000000000
or
label=0.000000000018;0.000000000100;0.000000000200;;;
... Unless you use scientific notation.

I do agree, though, that in most cases perfdata should have no prefix.
For that reason I'd leave the perfdata unit prefix up to each plugin and
have prefixes defined in the thresholds only apply to the thresholds
themselves and the status line.

|> Ton could imagine some helper functions (cmdline, web pages, google
|> calculator) to verify complex thresholds
|
| That could also be part of the library so every plugin could have a
| dryrun option to print which values would cause what. Based on the
| defined thresholds, (for example x:y) one could test/print what rc the
| values x,y,x+1,x-1,y+1,y-1 would cause.

+1
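
To make the dry-run idea concrete, here is a minimal Python sketch. It
assumes the current convention that a range x:y alerts when the value
falls outside it, and that critical wins over warning. The names status
and dry_run are hypothetical and not part of any existing library.

```python
OK, WARNING, CRITICAL = 0, 1, 2


def outside(value, low, high):
    """True when value lies outside the inclusive range low..high."""
    return value < low or value > high


def status(value, warn, crit):
    """Return code for value, given (low, high) warn and crit ranges."""
    if outside(value, *crit):
        return CRITICAL
    if outside(value, *warn):
        return WARNING
    return OK


def dry_run(warn, crit):
    """List (value, return code) for each range edge and its neighbours."""
    probes = set()
    for low, high in (warn, crit):
        for edge in (low, high):
            probes.update((edge - 1, edge, edge + 1))
    return [(value, status(value, warn, crit)) for value in sorted(probes)]


if __name__ == "__main__":
    for value, rc in dry_run(warn=(10, 20), crit=(5, 30)):
        print(value, rc)
```

A plugin could print such a table behind a dry-run option so users can
check the boundaries before deploying a threshold.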

|> and Andreas likes to see a
|> possibility to shorten --freespace warn=inf:300KB
|> to --freespace w=inf:300KB
|
| Me too.

+1

For the rest I'd leave that to getsubopt which, I believe, will allow
you to shorten any non-conflicting name (like -- options).
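
Whether getsubopt itself accepts abbreviations is not confirmed here, so
the following Python sketch only illustrates the "non-conflicting" rule:
an abbreviation is accepted when exactly one known key starts with it.
The helper name expand_key and the key list are assumptions.

```python
def expand_key(abbrev, known_keys):
    """Return the one key matching abbrev, or raise ValueError."""
    if abbrev in known_keys:
        # An exact match always wins, even if it prefixes another key.
        return abbrev
    matches = [key for key in known_keys if key.startswith(abbrev)]
    if len(matches) != 1:
        raise ValueError(f"unknown or ambiguous key: {abbrev!r}")
    return matches[0]


if __name__ == "__main__":
    keys = ["warn", "crit", "uom", "uom_prefix"]
    print(expand_key("w", keys))
```

With these keys, "w" and "c" are unambiguous, while "u" would be
rejected because it matches both uom and uom_prefix.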

|
|> Thomas thinks about something like
|>     4) --threshold name=cpu,type=warn,min=0,max=80,inside
|
| I'd prefer to see some kind of range since it's shorter than min=,max=

I totally agree...

|> Nathan pointed out that it is more intuitive to specify only ok and
|> warning ranges.
|> Everything outside them is critical, which Ton thinks is "brilliant".
|> ...
|> Nathan added that ':' could be replaced by '..' and using '/' as a range
|> separator:
|>    --time=ok/0..3/seconds
|>    --freespace=ok/300..inf/KB,warn/100..300/KB
|>    --load=ok/0..2,0..1.5,0..1.2/
|
|    --freespace ok/300..inf/KB,warn/100..300/KB
| or
|    --freespace ok=300..inf/KB,warn=100..300/KB
|
| looks good to me but should we separate prefix and uom?

First, you shouldn't mix ranges and UOM; that's the idea of subopts. As
I said, gluing the UOM and prefix together is totally hopeless. So:

--freespace ok=300..inf,warn=100..300,uom_prefix=Ki,uom=B

or (using a single argument for all thresholds):

--threshold=metric=freespace,ok=300..inf,warn=100..300,uom_prefix=Ki,uom=B

Or (if you can abbreviate to --th):

--th metric=freespace,ok=300..inf,warn=100..300,uom_prefix=Ki,uom=B
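
As a rough illustration of how such a subopt string splits apart, here
is a Python sketch. It is not the getsubopt implementation; the
functions parse_threshold and parse_range are hypothetical, and the
".." range form is taken from the examples above.

```python
def parse_threshold(arg):
    """Split 'key=value,key=value,...' into a dict of strings."""
    fields = {}
    for item in arg.split(","):
        key, sep, value = item.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {item!r}")
        fields[key] = value
    return fields


def parse_range(text):
    """Turn 'low..high' into a (low, high) tuple of floats."""
    low, sep, high = text.partition("..")
    if not sep:
        raise ValueError(f"expected low..high, got {text!r}")
    # float() accepts "inf", so an explicit upper limit needs no special case.
    return float(low), float(high)


if __name__ == "__main__":
    spec = "metric=freespace,ok=300..inf,warn=100..300,uom_prefix=Ki,uom=B"
    fields = parse_threshold(spec)
    print(fields["metric"], parse_range(fields["ok"]), fields["uom_prefix"])
```

Keeping uom_prefix and uom as their own keys means the range parser
never has to guess where a number ends and a unit begins.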

On a side note, I'm totally confused with the above thresholds and for
me the OK-suggestion just lost its purpose. Keep in mind that people
always used warn/crit, and it's the same in any other monitoring product
I came across.

|> --End of summary
|>
|> So to me there are multiple open questions
|>
|> Key questions:
|> - Must the threshold specification argument be valid without quoting?
|
| To me: yes (for numeric values/ranges). Required quoting opens a
| broader range of syntax though.

+1

|> - Is it necessary to allow multiple ranges per thresh warn=10:20,50:60?
|
| The Performance data definition doesn't permit this up to now but I
| could imagine some people would like to see this.

Indeed. If we ever allow that it can be done simply by allowing multiple
warn/crit ranges...

|> - Should thresholds be defined ok/warn rather than warn/crit?
|
| I like the approach but this means not only the syntax is changed.
| People need to start thinking when converting.

Yes. See my comment above. People with huge setups will fear upgrading!

|> - Should plugins only print perfdata for explicitly selected metrics
|>    or should there be a base set?
|
| I'd vote for a base set, to get some values (beside the alert ones) for
| free. Having to look what all the plugins offer is exhausting.
| I'd thus say printing as much perfdata as possible would be best.

It's probably best to leave that to the plugins... In many cases base
set + defined special ones will probably be the best. You can always
define them with no thresholds ("print it but don't alert on it").

| Also most rrd based perfdata tools will run into severe problems with
| new/changing metric labels after creation.

We should always leave a "compatible" mode... At the very least for one
minor release (by minor I mean the 2nd number changing).

|> - Should there be an explicit range limit (10:inf over 10:)
|
| 10:inf or 10::inf looks cleaner to me.

Either way is fine for me as long as there's only one way to explicitly
define infinite ;)

|> - Is it favorable to have multiple range styles like
|>    1<x<10 *and* 1:10 *and* ... in parallel?
|
| Not if you ask me.

Noooooooo! Please save us from such a mess!!!!

Seriously, if we go that way I'm 99.99999% sure one day we'll want to
drop it for some new reason, and we'll be stuck with all the users who
decided to use it (or who just use it because they found it in some 3rd
party doc).

If there's no *additional* functionality there's *absolutely no reason*
to allow something else.

However, you (or anyone else) are free to write whatever parser you want
that takes even the weirdest definitions out there and converts them to
NP threshold formats...

|> - Which component is responsible for sanity checking of thresholds?
|> - Should base8 UOM-prefixes be allowed?

If the prefix is separated from the uom, we can easily support any SI
prefixes. I'd allow this for correctness and because this is a standard.

http://physics.nist.gov/cuu/Units/prefixes.html
http://physics.nist.gov/cuu/Units/binary.html
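
With the prefix kept separate from the uom, scaling a threshold to the
plugin's base unit is a table lookup. The Python sketch below is only an
illustration: the table is a subset of the SI and binary prefixes listed
at the links above, "u" is assumed as an ASCII stand-in for micro, and
to_base_units is a hypothetical name.

```python
from fractions import Fraction

# Subset of SI (decimal) and binary prefixes, as exact multipliers.
MULTIPLIERS = {
    "": 1,
    "k": 10**3, "M": 10**6, "G": 10**9,
    "T": 10**12, "P": 10**15, "E": 10**18,
    "m": Fraction(1, 10**3), "u": Fraction(1, 10**6),
    "n": Fraction(1, 10**9), "p": Fraction(1, 10**12),
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30,
    "Ti": 2**40, "Pi": 2**50, "Ei": 2**60,
}


def to_base_units(value, prefix=""):
    """Scale value by the given prefix; no prefix leaves it unchanged."""
    try:
        factor = MULTIPLIERS[prefix]
    except KeyError:
        raise ValueError(f"unknown prefix: {prefix!r}") from None
    return value * factor


if __name__ == "__main__":
    print(to_base_units(300, "Ki"))  # 300 KiB expressed in bytes
    print(to_base_units(300, "k"))   # 300 kB expressed in bytes
```

Exact integers and fractions are used so that very large (E) and very
small (p) values do not pick up floating-point rounding.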

|> I'll post my thoughts later on.
|
| I've some hints, too:
|
| Since it looks like the default alerting mechanism will be "inside",
| default range behaviour for plain numbers (X gets 0:X) should be
| reversed, too. So X will result in X:inf instead of 0:X
| Or should we drop those plain thresholds completely?

I don't really have an opinion there... Although with the ".." in ranges
you gave me an idea...

You could use ":" or ".." to differentiate between inside and outside
ranges. ":" would be outside, just as it is right now, and ".." would be
inside like in Perl.
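
A small Python sketch of that idea, under the assumptions that both
forms take numeric (or inf) endpoints and that the endpoints themselves
count as inside. The function name matches is hypothetical.

```python
def matches(value, spec):
    """True when the threshold given by spec applies to value.

    'low:high'  applies when value is outside the range (current style).
    'low..high' applies when value is inside the range.
    """
    if ".." in spec:
        low, high = spec.split("..")
        inside_style = True
    else:
        low, high = spec.split(":")
        inside_style = False
    within = float(low) <= value <= float(high)
    return within if inside_style else not within


if __name__ == "__main__":
    print(matches(11, "0:10"))   # outside-style range
    print(matches(11, "0..10"))  # inside-style range
```

The separator alone selects the behaviour, so no extra keyword such as
"inside" is needed on the command line.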

| What about mixing uom-prefix in one range? Might this be needed in the
| future?

IMHO it's much clearer to have it separated, and it's also the original
idea of using getsubopt.

| One more thing which has been in my head for a couple of weeks now is
| that we need to strengthen percent support in our library. This could be
| done by adding an optional(?) base value to get_status so that it can
| calculate a percentage.
|
| At the moment, my favourite threshold/range definition is the following:
|    --throughput ok=1..5/M,warn=1..300/M/B
|
| Where ok takes the default UOM (here bit) and warn uses an own UOM
| (byte). But this is also invalid with our perfdata specs.

Why different prefixes for ok and warn? Can you easily tell if 18430KiB
is more or less than 18874368MiB ???? What about critical UOM? I believe
it should be consistent for easy comparison.

--
Thomas
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFH7ThS6dZ+Kt5BchYRArysAJ9oFhJqgINhzh/lVW7uKOVn9E6ZkgCfaRyL
zCKffcnG0h/LB7dfHBZYdho=
=game
-----END PGP SIGNATURE-----



