[Nagiosplug-devel] RFC: Performance data guidelines

Voon, Ton Ton.Voon at egg.com
Thu Jul 10 07:28:16 CEST 2003


Peter,

Thanks for your reply. 

I like the idea of quoting the attributes/values, but I don't think quoting
will be necessary if we get the standard attributes and their values right. 

I think perfdata should be space separated data (just to save processing),
but I'm happy to take a consensus. Comma separated may make it a bit easier
to parse visually. Any other opinions?

Based on my guidelines, an example output of check_ping would be:

PING OK - Packet loss = 0%, RTA = 1.96 ms|pct=0 time=1.96
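
To show what "saving processing" could look like on the consumer side, here
is a minimal Python sketch that splits the line above at the first "|" and
reads the space-separated pairs. The function name parse_perfdata is made up
for illustration, and treating every value as a float is an assumption, not
part of any agreed guideline.

```python
def parse_perfdata(plugin_output):
    """Split plugin output at the first '|' and parse 'label=value' pairs.

    Assumes space-separated pairs and purely numeric values (hypothetical
    helper, not an existing Nagios API).
    """
    text, sep, perf = plugin_output.partition("|")
    pairs = {}
    if sep:
        for token in perf.split():
            label, _, value = token.partition("=")
            pairs[label] = float(value)
    return text.strip(), pairs


text, perf = parse_perfdata(
    "PING OK - Packet loss = 0%, RTA = 1.96 ms|pct=0 time=1.96"
)
print(text)  # PING OK - Packet loss = 0%, RTA = 1.96 ms
print(perf)  # {'pct': 0.0, 'time': 1.96}
```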

Three things that spring to mind:
- it's a bit shorter!
- time means something different here than in check_http, check_tcp, etc.
There it means "time taken to do a check". For check_ping, it would mean the
average round-trip time for a packet
- pct is at 0, which is a "good" result (0% packet loss). However -
according to my proposal - check_disk would return pct=5 for 5% free on
total disk, which, as it gets closer to 0%, would be "bad". Maybe it should
be reversed, so pct=100% to mean no packet loss - should 0% always be
considered the worst case? This may not be easy for "number" attributes.

As you can see, it is hard to standardise on what the values actually tell
you. This is what I meant by "Why the returned values are bad is then up to
interpretation (and that is the key to any performance analysis!)". However,
what the guidelines will do is make the RRD generation easier.

Ton

> -----Original Message-----
> From: Hoogendijk, Peter [mailto:Peter.Hoogendijk at atosorigin.com] 
> Sent: Tuesday, July 08, 2003 2:36 PM
> To: Voon, Ton
> Cc: nagiosplug-devel at lists.sourceforge.net
> Subject: RE: [Nagiosplug-devel] RFC: Performance data guidelines
> 
> 
> Ton,
> 
> We are in the process of developing a plugin to check information
> collected by another data collection system. Based on the 'Performance
> Data' chapter in the Nagios documentation, we decided on comma-separated
> 'name=value' pairs. As we want to be able to transparently support the
> names and values used by the other system, both the name and the value
> part can optionally be quoted (with either single or double quotes). The
> result is:
> 
> 	Plugin Output|name1=value1, 'name 2'=value2, name3='11"', name4="Peter's PC"
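
For comparison, a rough Python sketch of what a consumer of this quoted,
comma-separated form might need. It is only an illustration: the names
parse_quoted_pairs and unquote are hypothetical, and it assumes that a quoted
token never contains its own quote character and that pairs are always
separated by commas.

```python
import re

SINGLE = r"'[^']*'"          # 'single quoted', may contain "
DOUBLE = r'"[^"]*"'          # "double quoted", may contain '
BARE = r"""[^\s,='"]+"""     # unquoted run without space, comma, = or quotes
TOKEN = "(?:%s|%s|%s)" % (SINGLE, DOUBLE, BARE)
PAIR = re.compile(r"\s*(%s)=(%s)\s*(?:,|$)" % (TOKEN, TOKEN))


def unquote(tok):
    """Strip one pair of matching surrounding quotes, if present."""
    if len(tok) >= 2 and tok[0] == tok[-1] and tok[0] in "'\"":
        return tok[1:-1]
    return tok


def parse_quoted_pairs(perf):
    """Return a list of (name, value) tuples from the text after the '|'."""
    pairs = []
    pos = 0
    while pos < len(perf):
        m = PAIR.match(perf, pos)
        if not m:
            raise ValueError("cannot parse perfdata at offset %d" % pos)
        pairs.append((unquote(m.group(1)), unquote(m.group(2))))
        pos = m.end()
    return pairs


sample = '''name1=value1, 'name 2'=value2, name3='11"', name4="Peter's PC"'''
for name, value in parse_quoted_pairs(sample):
    print(name, "->", value)
```

Even in this small form it needs noticeably more machinery than splitting on
whitespace, which is the processing cost of allowing quotes.
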
> 
> To check our procedures for processing the performance data, I also
> modified the check_ping plugin. It now reports:
> 
> 	PING OK - Packet loss = 0%, RTA = 1.96 ms|"Packet loss"=0% RTA="1.96 ms"
> 
> The problem we are facing with this format is indeed the interpretation
> by RRD (or in our case the script that's feeding RRD), so we are open
> to suggestions. Your proposed guideline at least seems to help us find
> the right direction.
> 
> Peter.
> 
> 
> -----Original Message-----
> From: Voon, Ton [mailto:Ton.Voon at egg.com] 
> Sent: Tuesday, 8 July 2003 12:58
> To: 'nagiosplug-devel at lists.sourceforge.net'
> Subject: [Nagiosplug-devel] RFC: Performance data guidelines
> 
> 
> Hi!
> 
> One of the features required for 1.4 is performance data. I would like
> to write up the guidelines for this, but wanted confirmation if this is
> the right way to go, so any comments would be appreciated.
> 
> I think perf data should have/be:
> 
> - short labels
> - generic and common labels across plugins if possible
> - comma separated, no spaces. Regex format: [a-z0-9]+=[0-9]?\.?[0-9]+
> - redundant data removed (eg, if check_disk returns pct and number
> (free), can calculate used bytes)
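
A quick way to try the regex from the list above against sample strings is
sketched below in Python. Read literally, the pattern allows at most one
digit before a decimal point, so a value such as 12.5 does not match as a
whole. The LOOSENED pattern is an assumption added here for comparison and is
not part of the proposal; the helper name check is also made up.

```python
import re

# Pattern exactly as written in the proposal.
PROPOSED = re.compile(r"[a-z0-9]+=[0-9]?\.?[0-9]+")
# Assumed variant that accepts any number of digits before the point.
LOOSENED = re.compile(r"[a-z0-9]+=[0-9]*\.?[0-9]+")


def check(perf, pattern):
    """Return one bool per comma-separated item: does it match in full?"""
    return [bool(pattern.fullmatch(item)) for item in perf.split(",")]


print(check("pct=0,time=1.96", PROPOSED))  # [True, True]
print(check("time=12.5", PROPOSED))        # [False]
print(check("time=12.5", LOOSENED))        # [True]
```
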
> 
> My suggestions for labels are:
> 
> Name ; Units ; printf format ; Details
> time ; seconds ; %.3f ; time taken to do a specific check (eg DNS query,
>   HTTP request, ping RTA)
> pct ; percent ; %.3f ; percentage (free rather than used if applicable)
>   (eg total disk, total swap, ping percent loss)
> number ; must be bytes if applicable ; %d ; a given number of things
>   (free rather than used if applicable) (eg processes, users, bytes used
>   such as total disk or total swap)
> numberf ; float ; %.3f ; a given number of things that may be fractional
>   (eg, load average, average bytes transmitted)
> counter ; a continuous counter (must be bytes if applicable) ; %d ; a
>   continuous counter (eg bytes transmitted on an interface)
> load1 ; load ; %.2f ; load average over 1 min
> load5 ; load ; %.2f ; load average over 5 min
> load15 ; load ; %.2f ; load average over 15 min
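
As an illustration of how a plugin might apply the printf formats listed
above, here is a small Python sketch. The FORMATS dictionary copies the label
and format columns from the list; the function name format_perfdata and the
default comma separator are assumptions made for this example.

```python
# Label -> printf format, taken from the suggested label list.
FORMATS = {
    "time": "%.3f",
    "pct": "%.3f",
    "number": "%d",
    "numberf": "%.3f",
    "counter": "%d",
    "load1": "%.2f",
    "load5": "%.2f",
    "load15": "%.2f",
}


def format_perfdata(pairs, sep=","):
    """Render (label, value) pairs as 'label=value' joined by sep.

    Raises KeyError for a label that is not in FORMATS.
    """
    return sep.join(
        "%s=%s" % (label, FORMATS[label] % value) for label, value in pairs
    )


# 1.96 ms expressed in seconds, and 0 percent.
print(format_perfdata([("time", 0.00196), ("pct", 0)]))  # time=0.002,pct=0.000
```
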
> 
> Contentious points:
> - loadx. Not really keen on these, but they don't seem to fit into any
> other labels, unless we only return load5 and use numberf
> - taking free values rather than used. This is consistent with the
> output for check_disk and check_swap. Looking at graphs, I guess you
> want to see it nearer zero which is your definite limit, rather than
> continuously increasing
> - maybe numberf is not required, but we say that number could be
> fractional. I think this may be better as RRD doesn't care whether values
> are integers or not
> - too reductionist? Would you prefer labels that describe the measure?
> I think the labels should be generic and the plugin describes the
> context
> 
> As an example, the patches submitted on SF for check_ping had perf
> labels of rta and loss, but I think these should be time and pct
> respectively. I think this makes it easier for something like RRD to
> work out what type of value it is to draw the graphs. Why the returned
> values are bad is then up to interpretation (and that is the key to any
> performance analysis!).
> 
> Ton
> 

