[Nagiosplug-devel] RFC: New threshold syntax

Thomas Guyot-Sionnest dermoth at aei.ca
Sun Apr 6 05:27:37 CEST 2008


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 05/04/08 10:23 AM, William Leibzon wrote:
>> On Mon, Mar 17, 2008 at 2:15 PM, Ton Voon <ton.voon at altinity.com> wrote:
>> Hi!
>>
>> I tried to get this through last year, but I don't think a conclusion
>> was reached so I'm going to try again now!
>>
>> This is a proposal for a new threshold syntax. My motivation is that I
>> have to update check_procs based on a customer's requested use for it.
>> I'd like a generally applicable syntax, so that there is maximum code
>> reuse and consistency.
>>
>> The proposal is here: http://nagiosplugins.org/rfc/new_threshold_syntax
>>
>> I've decided to use the website as this can be the master document. I
>> plan on updating it based on people's comments. Hopefully it will not
>> require too many alterations!
>>
>> Comments?
> 
> I'm going to comment on it even if its a bit late... And quickly my
> personal "vote" would be NO, but I have mostly outsider perspective and
> NO does not mean I'm like existing threshold either and have not used it
> when developing my plugins until very recently. I probably will not
> participate in this thread as I have only limited time right now but
> will answer specific questions to me if there are any.
> 
> First of I'd like to note that above proposal moves to the idea that
> each threshold is a separate parameter checked by plugin for which
> critical/warning/ok are to be specified. For such a case the syntax
> could work if the parameter to the plugin is stable and so can have its
> own command line parameter. But this is not all so in every case. It is
> especially not so for plugins that I write where parameters are called
> attributes and are specified on command line to the plugin and their
> numbers change depending on use. In such and similar cases it is a lot
> better to have --warning and --critical command-line parameter and spec
> on how to specify it. This is also what people are used to who used

I have to disagree with this, although it's an interesting point that
could be taken into account when writing the new lib. There's nothing
that prevents us from letting the plugin receive unparsed thresholds and
deal with them, or from providing a way to define on-demand thresholds.
So you could have "metric=var:VAR1,..." and then the plugin would use
what's after the first colon as the custom variable you're checking
against.
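
As a rough illustration of that idea, here is a minimal Python sketch of
how a plugin might pull the custom variable out of such a string. The
layout (a leading "metric=" key, a kind such as "var" before the first
colon, and the rest of the definition after the first comma) is only
assumed from the example string above; the function name and the
"warn=10" tail in the usage line are hypothetical, not part of the RFC.

```python
def split_on_demand_threshold(spec):
    """Split 'metric=<kind>:<variable>,<rest>' into (kind, variable, rest).

    Assumed layout, based only on the 'metric=var:VAR1,...' example;
    this is not the final threshold syntax.
    """
    head, _, rest = spec.partition(",")       # first field vs. remaining fields
    key, sep, value = head.partition("=")
    if key != "metric" or not sep:
        raise ValueError("expected the spec to start with 'metric='")
    kind, colon, variable = value.partition(":")  # split on the first colon only
    if not colon or not variable:
        raise ValueError("expected '<kind>:<variable>' after 'metric='")
    return kind, variable, rest


# Hypothetical usage: the plugin decides what 'Threads_connected' means.
kind, variable, rest = split_on_demand_threshold(
    "metric=var:Threads_connected,warn=10")
```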

> nagios for many years. Additionally in new syntax you now have "ok",
> "warn", "crit" and that I think would end up being confusing (and make
> code more difficult) especially if people start specifying all 3

You don't have to specify all of them, so it's up to you to decide what
you want to do. The ok-range has been thought of as a special range that
takes precedence over the others, so it makes an easy way to specify only
the accepted range. IMHO it makes things easier for users.
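
To make the precedence concrete, here is a small Python sketch of one
way the evaluation order could work. It is only a sketch under stated
assumptions: ranges are plain inclusive (low, high) tuples, a value
inside a range matches that state, and a value that falls outside an
explicit ok range without matching anything else is treated as CRITICAL.
That last fallback is an assumption about what "specify only the
accepted range" implies, and the function names are hypothetical.

```python
OK, WARNING, CRITICAL = 0, 1, 2  # standard Nagios plugin return codes


def in_range(value, bounds):
    """True if value lies inside the inclusive (low, high) tuple."""
    low, high = bounds
    return low <= value <= high


def evaluate(value, ok=None, warn=None, crit=None):
    """Return a state for value; each range is optional."""
    # The ok range is checked first, so it wins over warn and crit.
    if ok is not None and in_range(value, ok):
        return OK
    if crit is not None and in_range(value, crit):
        return CRITICAL
    if warn is not None and in_range(value, warn):
        return WARNING
    # Assumption: outside an explicit ok range with no other match
    # is an alert.
    if ok is not None:
        return CRITICAL
    return OK


# Only the accepted range is given: 0..10 is fine, anything else alerts.
state = evaluate(15, ok=(0, 10))
```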

Regarding the code, it should also be just as simple as before. We're
also thinking about providing a C library (and will obviously include the
new thresholds in N::P).

> together. I also think that people do not like it when you change plugin
> spec code too much/too often and with nagios plugins you want to be as
> backward-compatible from one release of the plugin to another (admins
> expect to be able to just drop in your new version of code as
> replacement for what they currently have and then they may look at use
> of new features and further changes). You maybe able to support both as
> you'd specific new syntax as "crit=.." where as old one did not have but
> supporting multiple syntax versions in same code makes it larger and
> more difficult to maintain.

It will be backward compatible. We want to make this transition as
seamless as possible, and therefore we'll make sure of that in the next
minor release (which doesn't happen that often, btw: we're at our first
major, and the last minor change was a few years ago).

The new thresholds will be a huge enhancement and will allow some
plugins that still don't use the official threshold formats, due to
their limitations, to finally provide them. It will also be much easier
to add all plugin metrics as real thresholds (many only allow criticals
or warnings) without having many new and confusing switches. Any new
metric added will not clutter the current option section of --help which
is, in some plugins, very long and disorganized. Finally, it will open
the door for multi-range thresholds.

> Now I'm not saying you should not do it and extend your current syntax
> if there is a legitimate need and problems reported, just be warned
> about consequences and difficulties... I also generally found you
> "range" based syntax to be somewhat counter-intuitive and as far as a:b
> specifying alert when OUTSIDE of the range. Perhaps this can be fixed
> fairly easily by adapting new separator

We're fully warned, and we'll do our best to use the community feedback
to make them as intuitive and simple as possible while getting rid of
previous limitations and opening new possibilities. This task is as hard
as my last sentence was hard to read, but this is something that must
get done.

> (a..b is alert when >a and <b as proposed instead of @a:b) and having
> single value (no 'a:b" just a) be understood as single upper threshold
> to be compatible with how existing people have used nagios for years.

Compatibility will be kept with the original switches. There is no
reason to make the specs harder for backward compatibility, as the
command line will have to be changed to use the new thresholds anyway.
On the other hand, we will definitely look into helper tools, a switch
or something else to get new thresholds out of legacy ones.

> Now I'll give personal perspective of what I have done in my plugins and
> custom spec I use (which I do not recommend you adapt but seeing it may
> help in some way). In most of my plugins there are many "attributes"
> that are checked as for example in:
>  http://william.leibzon.org/nagios/plugins/check_mysqld.pl
> where attributes are variables from "SHOW STATUS" and you specify which
> ones to check (similar also for number of my other plugins at
> http://william.leibzon.org/nagios such as check_snmp_temperature,
> check_jboss and others; although code is older as far as thresholds
> parsing in them but that will slowly get updated). So I adapted system
> where list of Attributes are specified with one command line parameter
> (usually -a and then alphanumeric list separated by ','), warning
> threshold values with another (usually --warn with threshold spec values
> separated with ',' - empty/no value if there is no threshold) and
> similarly critical. This turns to work pretty well although you can
> argue that there are better ways to do it... I also wrote code that
> handles threshold spec in my plugins (which is largely same for every
> one) before I even knew there is "official threshold spec". So doing it
> independently I created threshold syntax of [prefix]value where prefix
> is one of "<",">",'=","!" specifying if alert (critical or warning
> depending on where that is in) is to be issued when plugin data is
> below, above or equal or not equal to value specified. Of the people I
> know that use my plugins they find it all fairly straight forward but
> the symbols are a bit of an issue because > and < are special symbols in
> bash, etc. so result is often enough that for proper call to plugin you
> have to quote it all in "" to work. At some point I came upon page about
> nagios plugins threshold syntax (can't immediately remember url) and
> while I found that syntax somewhat odd, when it came down to extending
> my syntax to support ranges (which are really very really needed) I
> decided to use that to be compatible with what others do and to support
> users who are familiar with that spec. So currently this is a full spec
> documented in plugin code:
> 
> # Warning and critical thresholds are specified with '-w' and '-c' and each
> # one must have exact same number of values to be checked (separated by ',')
> # as number of variables specified with '-a'. Any values you dont want to
> # compare you specify as ~ (or just not specify a value, i.e. ',,').
> #
> # There are also number of other one-letter modifiers that can be used
> # as prefix before actual data value to direct how data is to be checked.
> # These prefixes are as follows:
> # > : issue alert if data is above this value (default for numeric value)
> # < : issue alert if data is below this value (must be followed by number)
> # = : issue alert if data is equal to this value (default for non-numeric)
> # ! : issue alert if data is NOT equal to this value
> #
> # Additionally supported are two specifications of range formats:
> # number1:number2 issue alert if data is OUTSIDE of range [number1..number2]
> # i.e. alert if data<$number1 or data>$number2
> # @number1:number2 issue alert if data is WITHIN range [number1..number2]
> # i.e. alert if data>=$number1 and data<=$number2
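
For readers following along, the spec quoted above can be sketched in a
few lines of Python. This is an illustrative reading of the quoted
comment block only, not the code from check_mysqld.pl; the function
names are hypothetical, and open-ended ranges such as "10:" are not
handled because the quoted text does not describe them.

```python
def is_number(text):
    """True if text can be read as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def same(data, value):
    """Compare numerically when both sides are numbers, else as strings."""
    if is_number(data) and is_number(value):
        return float(data) == float(value)
    return str(data) == value


def make_check(spec):
    """Return a function data -> bool (True means alert), or None to skip."""
    if spec in ("", "~"):
        return None                      # no threshold for this attribute
    inside = spec.startswith("@")
    body = spec[1:] if inside else spec
    if ":" in body:
        low_text, high_text = body.split(":", 1)
        low, high = float(low_text), float(high_text)
        if inside:                       # @a:b alerts WITHIN [a..b]
            return lambda data: low <= float(data) <= high
        return lambda data: float(data) < low or float(data) > high
    if inside:
        raise ValueError("'@' must be followed by number1:number2")
    prefix, value = spec[0], spec[1:]
    if prefix not in "<>=!":             # no prefix: pick the default
        prefix, value = (">" if is_number(spec) else "="), spec
    if prefix == ">":
        limit = float(value)
        return lambda data: float(data) > limit
    if prefix == "<":
        limit = float(value)
        return lambda data: float(data) < limit
    if prefix == "=":
        return lambda data: same(data, value)
    return lambda data: not same(data, value)


def parse_thresholds(option):
    """Split a comma-separated -w or -c value into one check per attribute."""
    return [make_check(part) for part in option.split(",")]


checks = parse_thresholds(">100,,<5,@10:20,=ON,!0,30:40,50")
```

Note that a call such as -w '>100,,<5' still needs quoting in the shell,
which is exactly the escaping problem discussed in this thread.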

Care was taken to provide as many capabilities as possible without
having to escape the threshold definition. User questions regarding
improperly escaped characters are something that can be seen on a regular
basis on all the public Nagios/nagios-plugins mailing lists. This
therefore was a big priority in defining the format.

> Well ok, going back to your new spec question and again taking personal
> perspective, I can tell that if plugin syntax seriously changes, I
> probably would not be willing to rewrite my code (although adding
> support for ".." to work same as @a:b is easy enough and that I might do
> that) and I also have no near-term plans to start using Nagios::Plugins
> library either (I think its bloated and difficult but that is really
> issue for another discussion which I do not want to start here).

The idea is that you shouldn't have to do much in your plugin. You
definitely won't have to deal with threshold definition characters, but
rather with threshold objects.

- -

On a side note, I had a quick look at your plugin and there was
something in the TODO about retaining data... There's a much better way
to reuse last check data than using a file. Just write the last unix
timestamp and your data in the perfdata line, and then retrieve it on
the next check by passing it on the command line.
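
A minimal Python sketch of that approach follows. It assumes the
previous perfdata string is handed back to the plugin as an argument
(in Nagios this can be done with the $SERVICEPERFDATA$ macro in the
command definition). The labels "last_time" and "counter" and the
function names are made up for the example, and quoted labels and unit
suffixes are not handled.

```python
def parse_perfdata(text):
    """Parse space-separated 'label=value[;...]' items into {label: float}."""
    values = {}
    for item in text.split():
        label, sep, rest = item.partition("=")
        if not sep:
            continue
        number = rest.split(";", 1)[0]    # drop warn/crit/min/max fields
        try:
            values[label] = float(number)
        except ValueError:
            continue                      # skip values with unit suffixes
    return values


def format_perfdata(counter_now, now):
    """Emit the state the next check will need."""
    return "last_time=%d counter=%d" % (now, counter_now)


def rate_since_last(previous_perfdata, counter_now, now):
    """Per-second rate since the previous check, or None on the first run."""
    prev = parse_perfdata(previous_perfdata)
    if "last_time" not in prev or "counter" not in prev:
        return None
    elapsed = now - prev["last_time"]
    if elapsed <= 0:
        return None
    return (counter_now - prev["counter"]) / elapsed


# Hypothetical run: the first check stores state, the second computes a rate.
previous = format_perfdata(1000, 1207452000)
rate = rate_since_last(previous, 1600, 1207452300)
```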

I wrote a plugin that does it (not released yet). I just pasted the code
in a pastebin... It reads CSV performance logs written by Windows. It
might look a bit complex for what it does, but it's based on a cacti
script (there is no embedded Perl in cacti) and therefore was coded for
optimal performance no matter how large the log file is.

http://nagios.pastebin.com/m352a3dd8

getlog.pl (I can send it too if you want) doesn't use a single Perl
module, although I activate warnings and strict for development.

Thomas
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFH+EMp6dZ+Kt5BchYRAmW5AJ9L+qXrjWqO9saP4H4dMw7F6ImFlgCdExuo
rlNA9miWqyf5qKllya6GFio=
=oQQl
-----END PGP SIGNATURE-----



