[Nagiosplug-devel] Re: Suggested alterations to the Performance Protocol (Re: Nagiosplug-devel digest, Vol 1 #653 - 5 msgs)

Yves Mettier ymettier at libertysurf.fr
Wed Sep 8 01:28:11 CEST 2004


I only read the digest, so I may have missed some messages, and I am probably breaking
the thread. Sorry about that.

> Today's Topics:
[...]
>    5. Re: Suggested alterations to the Performance Protocol (Ton Voon)

> Message: 5
> Cc: nagiosplug-devel at lists.sourceforge.net
> From: Ton Voon <tonvoon at mac.com>
> Subject: Re: [Nagiosplug-devel] Suggested alterations to the Performance Protocol
> Date: Tue, 7 Sep 2004 21:23:19 +0100
> To: Ben Clewett <Ben at clewett.org.uk>

>> 2.
>>
>> Suggested by Yves Mettier:  The addition of a special reserved
>> variable, 'check_time' which records the time at which the plugin
>> completed the check.
>>
>> I can't remember if units were suggested, but in line with Nagios, the
>> time as seconds from 01-01-1970 00:00:00 UTC, or standard UNIX time,
>> may make sense.  If Yves is reading this, he may be able to comment
>> further.
>
> Firstly, why is this performance data?
>
> Secondly, I think something like this should be done by Nagios, not the
> plugins. Seems a bit of a waste to code in start/stop times in each
> plugin when the core execution engine would hold all this information.
> There needs to be a change to Nagios to pass this data through somehow,
> but then this would work for every plugin.
>
> (I think "time", which is really "elapsed time", is slightly different
> as this will remove timings from things that are outside of the core
> check, so for example check_dns gives the time for the dns lookup, but
> removes plugin startup, variable parsing, host resolution checks, etc)
>
> However, I like the idea of "special reserved variables" - I think it
> is worthwhile to add a table with a list of common labels, such as
> "time". Any comments?

Some explanation may help.
We have a plugin here (don't ask me what's inside: I have no idea :) that reads a log
and converts it to perf data whenever that is possible.
Because of this, when the plugin runs we get the date from nagios, not the date when
the events occurred. So the original need was to attach a timestamp to every perf data
value.
Now we are thinking about something else. The log is nothing more than another
serviceperf.log file in a different format. We just need to parse that log file and
write a new log file in a nagios-compatible format, then use the usual tools to parse
that file.

Is this still needed? I think that those who have the same need should ask themselves,
as I did, whether nagios is the right tool to output perf data.

I have no opinion about a table of common labels. Maybe we should "register" labels
such as "time" when needed, by sending a good explanation to the spec maintainer?

>
>> 3.
>>
>> The addition of macro's to define special numbers.  Some mentioned are
>> NULL to indicate no value or an invalid value.  INF and -INF to
>> indicate an infinite value.  Possibly NAN to represent Not a Number,
>> as with division by zero.  Not often used, but do have a place.
>
> This is already covered in
> http://nagiosplug.sourceforge.net/developer-guidelines.html#THRESHOLDFORMAT
> but is not specifically mentioned for the perf data output. This should
> be clearer.
>
> I like the idea of macros. I had proposed using some arcane characters
> (such as ~ for negative infinity), but I think your macro idea is far
> clearer. Any comments?

~ already exists, but only for -INF.
I'm working on a parser that understands ~ as +INF when it reads a lone ~ or something
like value:~ (which is not currently allowed).

I like the current spec, but the doc should be made clearer. I don't think we need -INF
and +INF macros (they are harder to parse than ~).

Here is what I suggest:
[n][start:][end]
The default is:
n = ' '
start = '0.'
end = '~'

Authorized values:
n : ' ' or '@'
start : ~ or any float value
end : ~ or any float value

Notice that we don't need a sign for infinity. It is negative for start and positive
for end. I cannot imagine a situation where start is +infinity or end is -infinity :)

Examples:
1 -> range is inside; start = 0., end = 1.
@1 -> range is outside; start = 0., end = 1.
1: -> range is inside; start = 1., end = +infinity
1:~ -> range is inside; start = 1., end = +infinity
@~ -> range is outside; start = 0., end = +infinity
@~: -> range is outside; start = -infinity, end = +infinity
~:~ -> range is inside; start = -infinity, end = +infinity

So the only addition to the current specification is that ~ becomes an allowed value
for end.
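
A minimal sketch of a parser for the [n][start:][end] syntax suggested above may make
the defaults easier to check. It is only an illustration: the names Range and
parse_range are hypothetical, a missing n is treated the same as ' ', and a ~ is read
as -infinity in the start position and +infinity in the end position.

```python
import math
from typing import NamedTuple


class Range(NamedTuple):
    """Parsed range; 'inside' is False when the '@' prefix was given."""
    inside: bool
    start: float
    end: float


def parse_range(text: str) -> Range:
    """Parse the suggested [n][start:][end] syntax (hypothetical helper)."""
    inside = True
    if text.startswith("@"):
        inside = False
        text = text[1:]

    # Without a ':' the whole string is the end; start keeps its default.
    if ":" in text:
        start_text, end_text = text.split(":", 1)
    else:
        start_text, end_text = "", text

    if start_text == "":
        start = 0.0            # default start
    elif start_text == "~":
        start = -math.inf      # ~ in the start position
    else:
        start = float(start_text)

    if end_text in ("", "~"):
        end = math.inf         # default end, or ~ in the end position
    else:
        end = float(end_text)

    return Range(inside, start, end)
```

With this sketch, parse_range("@~:") gives an outside range from -infinity to
+infinity, and parse_range("1") gives an inside range from 0 to 1, as in the examples.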

>> 4.
>>
>> To allow any UOM unit.  For instance, 'degc' for temperature, 'users'
>> for a user count etc.
>
> I think degc makes sense (is there a formal SI unit for degrees
> centigrade?), but users doesn't - users is already covered in point 10a
> at http://nagiosplug.sourceforge.net/developer-guidelines.html#AEN185.
> For example, "active_users=10" would be sufficient without a UOM, but
> "cabinet_temperature=20" could be in degrees centigrade or degrees
> Fahrenheit.
>
> The idea was that the label was free text to describe the thing being
> measured, while the UOM gives the graphing program enough data on how
> to graph (eg, RRD has a concept of graphing the difference between two
> values for counters type data). Thus having an exhaustive list of UOM
> units would make it extra coding. But there does seem to be confusion
> as things like B (bytes) and s (seconds) are UOMs whereas it wouldn't
> matter to the graphing program. Maybe we should be more like SI units?

I suggest that we copy SI units. If there is any doubt, the SI units are the reference.
For units that don't exist yet, we can invent some.
If a new SI unit appears and we have already invented one with the same name, the SI
unit wins.
I think it is better to make SI the reference: users will have no surprises, or only
where the SI system was not followed :)

>> 5.
>>
>> There is no way of representing a date.  There may be some plugins,
>> eg, recording user information, which do want to record a date.
>>
>> I have suggested UNIX time above.  However another suggestion is to
>> use the popular SQL syntax: '%Y-%m-%d %H:%M:%S.ms', eg, '2004-09-07
>> 16:10:15.123'.  Or a component of 'date', 'data time', 'time.ms'.  It
>> works for SQL :)
>
> I would prefer to use Unix time, only because of brevity. As long as it
> gets translated later (and there are lots of common functions for it),
> then the graphing would be okay.

I also prefer time_t values: fast to use, fast to parse... Use strftime to translate
them into something more human :)
For more human-readable output, please read the ISO-8601 spec. The syntax is not
%Y-%m-%d %H:%M:%S but %Y-%m-%dT%H:%M:%S. Don't ask me why the separator between the
date and the time is T: I don't know :)

I prefer the time_t format.
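
As an illustration of translating a time_t value with strftime, here is a small
sketch. In strftime notation the ISO-8601 date and time layout is %Y-%m-%dT%H:%M:%S.
The function names are hypothetical, and the sketch assumes the timestamp is
interpreted as UTC.

```python
import calendar
import time

# ISO-8601 date and time, with 'T' between the date and the time of day.
ISO_8601 = "%Y-%m-%dT%H:%M:%S"


def to_iso8601(timestamp: int) -> str:
    """Format a Unix timestamp (seconds since 1970-01-01 UTC) as ISO-8601."""
    return time.strftime(ISO_8601, time.gmtime(timestamp))


def from_iso8601(text: str) -> int:
    """Parse the same layout back into a Unix timestamp, assuming UTC."""
    return calendar.timegm(time.strptime(text, ISO_8601))
```

Storing the time_t value and converting only for display keeps the perf data short
and leaves the choice of human-readable format to the graphing tool.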

> Would Unix time with a .ms make sense for more granularity? This would
> presumably need a UOM defined too.

For use with nagios, the answer is no. The moment the plugin obtains its value and the
moment nagios sets its TIMET macro are not the same, so the nagios timestamp cannot be
trusted at that granularity. Run this and you will understand what I mean:

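#!/usr/bin/env python3
import random, sys, time

def get_value():
    return 42  # placeholder for the real measurement

time.sleep(random.random())  # unknown delay before the measurement
v = get_value()
time.sleep(random.random())  # unknown delay after the measurement
print(f"well | value={v}")
sys.exit(0)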

How can nagios know when get_value was executed?

However...
-> .ms can be used by tools other than nagios that need that granularity
-> .ms can be used if the plugin puts the timestamp itself in the perf data (see
suggestion 2 above, at the beginning of my message)

If you add .ms, I suggest the following format: seconds.ms
For example: 1091435023.123
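
A sketch of how a plugin could put its own seconds.ms timestamp in the perf data,
using the 'check_time' label from suggestion 2. Nothing here is part of the current
spec: the label, the function names and the output text are assumptions for
illustration only.

```python
from typing import Tuple


def format_check_time(timestamp: float) -> str:
    """Render a Unix timestamp as seconds.ms with exactly three decimals."""
    total_ms = round(timestamp * 1000)
    return f"{total_ms // 1000}.{total_ms % 1000:03d}"


def parse_check_time(text: str) -> Tuple[int, int]:
    """Split a seconds.ms string into (seconds, milliseconds)."""
    seconds, _, fraction = text.partition(".")
    # Pad or truncate the fraction to three digits; no fraction means 0 ms.
    milliseconds = int(fraction.ljust(3, "0")[:3]) if fraction else 0
    return int(seconds), milliseconds


def plugin_output(value: int, measured_at: float) -> str:
    """Build a plugin output line carrying the time the value was measured."""
    return f"OK | value={value} check_time={format_check_time(measured_at)}"
```

The plugin would record time.time() right after reading the value and pass it as
measured_at, so the timestamp no longer depends on when nagios collects the result.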


Yves

-- 
- Homepage    - http://ymettier.free.fr - http://www.logicacmg.com -
- GPG key     - http://ymettier.free.fr/gpg.txt                    -
- Maitretarot - http://www.nongnu.org/maitretarot/                 -
- GTKtalog    - http://www.nongnu.org/gtktalog/                    -






