[Nagiosplug-devel] RFC: Performance data guidelines

Voon, Ton Ton.Voon at egg.com
Tue Jul 15 06:31:03 CEST 2003


Kjell,

Firstly, I just want to say thank you for your contribution. This is a
fascinating thread. I would much rather have this discussion now than have
it raised as design problems afterwards!

Yeah, I thought afterwards that check_disk has to be different, as a
summation does not really tell you anything useful. My preference is that
the output reflects the filesystem, not the device, but we can use a
switch for that.

I think the colon-separated fields are better than crit, warn, critp and
warnp too. The new check_disk allows different thresholds per disk, so this
fits in well. However, some questions pop up:

1) I don't like the min and max values. I think that information is implied
by the UOM (% is 0-100, seconds is 0-infinity). If there is no UOM, then
assume any value.
2) What about check_disk -w 5% -w 10000? If there is no min/max, then it
could be: 'label=value[UOM][:critical:warning[:critical:warning]]'
3) What about "critical at 10%, but no warning levels"? We can just use a
null, I guess.
4) check_procs allows you to say -c 5:5 to mean alert if not exactly 5
processes. Is this doable at all? If so, would we need to change the
separators?
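
To make questions 2 to 4 concrete, here is a rough Python sketch of a parser
for the 'label=value[UOM][:critical:warning[:critical:warning]]' layout. It
is only an illustration, not plugin code: the name parse_item, the returned
dictionary and the label restrictions (borrowed from the earlier proposal)
are all assumptions. Empty threshold fields are read as null.

```python
import re

# Hypothetical pattern for label=value[UOM][:critical:warning[...]].
# The label class [a-zA-Z0-9_] and the 1-19 length are assumed from the
# earlier proposal in this thread.
ITEM_RE = re.compile(
    r"(?P<label>[A-Za-z0-9_]{1,19})="
    r"(?P<value>-?[0-9]+(?:\.[0-9]+)?)"
    r"(?P<uom>[A-Za-z%]*)"
    r"(?P<rest>(?::[^:]*)*)"
)


def parse_item(item):
    """Parse one performance data item into a dictionary."""
    m = ITEM_RE.fullmatch(item)
    if m is None:
        raise ValueError("not a valid performance data item: %r" % item)
    # Drop the empty string in front of the first colon.
    fields = m.group("rest").split(":")[1:]
    if len(fields) % 2 != 0:
        raise ValueError("thresholds must come in critical:warning pairs")
    # Thresholds stay as strings because each pair may carry its own unit
    # (for example 5% and 10000); an empty field becomes None.
    thresholds = [
        (fields[i] or None, fields[i + 1] or None)
        for i in range(0, len(fields), 2)
    ]
    return {
        "label": m.group("label"),
        "value": float(m.group("value")),
        "uom": m.group("uom"),
        "thresholds": thresholds,
    }
```

With this sketch, 'free=57%:10%:' gives one pair with a critical level and a
null warning. A range such as 5:5 cannot be told apart from a
critical:warning pair, which is exactly the worry in question 4.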

Ton

> -----Original Message-----
> From: kjell.sundtjonn at elkem.no [mailto:kjell.sundtjonn at elkem.no] 
> Sent: Saturday, July 12, 2003 5:41 PM
> To: NagiosPlug Devel
> Subject: RE: [Nagiosplug-devel] RFC: Performance data guidelines
> 
> 
> 
> I really like the idea of including the critical and warning levels together
> with max and min values in the performance data, but let me propose an
> alternative layout based on colon (:) separated fields:
> 
> - output of format 'label=value[UOM]:[critical]:[warning]:[max]:[min]'
> comma separated
> - labels 1-19 characters long in class [a-zA-Z0-9_] (spaces allowed, but
> not recommended)
> - values, critical, warning, max, min in class [-0-9.]. No spaces.
> - critical and warning are the thresholds for this measurement
> - max and min are the maximum/minimum values for the measurement
> 
> I think this is easier to parse than the proposal from Ton based on
> 'magical' words.
> 
> Example
> 
> Disk space
> DISK OK [22118452 kB (84%) free on /dev/hda3] [81574 kB (85%) free on
> /dev/hda2] [252600 kB (100%) free on 
> /dev/shm]|_dev_hda3=84%:10:25:100:0,
> _dev_hda2=85%:10:25:100:0,_dev_shm=100%:10:25:100:0
> 
> For disk space and other plugins where the UOM is defined when the plugin
> is called, use the active UOM as the value for the performance data. Notice
> how the / is replaced with _ to ensure a valid RRD datasource name. It is
> necessary to show the performance data for each disk in a disk set, not
> only for the total as Ton proposes.
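
The label rewriting described above can be sketched in a few lines of
Python. The function name to_label is hypothetical, and truncating to 19
characters is an assumption taken from the proposed label length limit; the
proposal itself does not say how over-long names should be handled.

```python
import re


def to_label(name, max_len=19):
    """Turn a device or mount name into a label in class [a-zA-Z0-9_]."""
    # Replace every character outside the allowed class with an underscore.
    label = re.sub(r"[^A-Za-z0-9_]", "_", name)
    # Assumption: over-long labels are simply cut at max_len characters.
    return label[:max_len]
```

For example, to_label("/dev/hda3") gives "_dev_hda3". Two long names that
differ only after the cut would collide, so a real plugin would need some
way to keep labels unique.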
> 
> PING
> 
> PING OK - Packet loss = 0%, RTA =
> 1.00ms|packet_loss=0%:20:10:100:0,RTA=1ms:20:30::0
> 
> The empty max value for RTA is understood as undefined.
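
As a rough check that the colon-separated layout is easy to parse, here is a
minimal Python sketch that splits plugin output at the | character and reads
each 'label=value[UOM]:[critical]:[warning]:[max]:[min]' item. The names
parse_perfdata and VALUE_RE and the dictionary keys are assumptions made for
this illustration; empty fields are returned as None, matching the
"undefined" reading of the empty max value.

```python
import re

# A number followed by an optional unit of measurement, e.g. "84%" or "1ms".
VALUE_RE = re.compile(r"(-?[0-9]+(?:\.[0-9]+)?)([^0-9:,]*)")


def parse_perfdata(plugin_output):
    """Return a list of dictionaries, one per performance data item."""
    _, sep, perf = plugin_output.partition("|")
    if not sep:
        return []  # no performance data in this output
    items = []
    for raw in perf.split(","):
        raw = raw.strip()
        if not raw:
            continue
        label, eq, rest = raw.partition("=")
        fields = rest.split(":")
        if not eq or len(fields) != 5:
            raise ValueError("expected label=value:crit:warn:max:min: %r" % raw)
        m = VALUE_RE.fullmatch(fields[0])
        if m is None:
            raise ValueError("bad value in item: %r" % raw)
        # Empty fields are treated as undefined (None).
        crit, warn, maximum, minimum = [
            float(f) if f else None for f in fields[1:]
        ]
        items.append({
            "label": label,
            "value": float(m.group(1)),
            "uom": m.group(2),
            "critical": crit,
            "warning": warn,
            "max": maximum,
            "min": minimum,
        })
    return items
```

Fed the PING line above, the sketch returns two items, and the RTA item has
max set to None and min set to 0.0.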
> 
> 
> 
> Kjell Sundtjønn
> 
> 
> 
> From: "Voon, Ton" <Ton.Voon at egg.com>
> Sent by: nagiosplug-devel-admin at lists.sourceforge.net
> Sent: 11.07.2003 16:10
> To: NagiosPlug Devel <nagiosplug-devel at lists.sourceforge.net>
> cc:
> Subject: RE: [Nagiosplug-devel] RFC: Performance data guidelines
> 
> 
> 
> 
> I'm starting to side with Kjell's and Karl's idea of labels being separate
> from the units. I think that was the flaw in my original proposal - if we
> can standardise on the units, then RRD generation should be fairly easy and
> then you can keep labels descriptive and whatever you think is suitable for
> a particular plugin.
> 
> So my amended proposal is:
> 
> - output of format 'label=value[UOM]' comma separated
> - labels 1-19 characters long in class [a-zA-Z0-9_] (should spaces be
> allowed?)
> - special labels of warn, warnp, crit and critp (or just warn and crit with
> different units?). These pass the threshold levels specified on the command
> line. My idea on this is that you can then use RRD to draw yellow/red lines
> to show where the warning levels are.
> - values in class [-0-9.]. No spaces. Karl has a worry about returned values
> from SNMP OIDs, but I think values should always be a number, so it can be
> parsed to remove extraneous characters
> - units one of:
> 
> no unit specified - assume a number (int or float) of things (users,
> processes, load averages)
> s - seconds (also us, ms)
> % - percentage
> b - bytes (also kb, Mb, Tb)
> c - a continuous counter (such as bytes transmitted on an interface) (Does
> this interfere with a standard unit?)
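
One way to see how a fixed unit list could simplify RRD generation is to map
every value to a base unit before storing it. The Python sketch below is
only an illustration: the names SCALE and normalise are hypothetical, units
are matched case-insensitively, and the 1024-based byte multipliers are an
assumption, since the list above does not say whether kb means 1000 or 1024
bytes.

```python
# Unit (lower case) -> (kind of measurement, factor to the base unit).
# Only the units named in the list above are included.
SCALE = {
    "": ("number", 1.0),
    "s": ("seconds", 1.0),
    "ms": ("seconds", 1e-3),
    "us": ("seconds", 1e-6),
    "%": ("percent", 1.0),
    "b": ("bytes", 1.0),
    "kb": ("bytes", 1024.0),        # assumption: binary multiples
    "mb": ("bytes", 1024.0 ** 2),
    "tb": ("bytes", 1024.0 ** 4),
    "c": ("counter", 1.0),
}


def normalise(value, uom):
    """Return (kind, value in base units) for a value and its unit."""
    try:
        kind, factor = SCALE[uom.lower()]
    except KeyError:
        raise ValueError("unknown unit of measurement: %r" % uom)
    return kind, value * factor
```

With this mapping, rta=1ms and a value given in plain seconds would end up
on the same scale.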
> 
> So some examples:
> 
> check_ping:
> PING OK - Packet loss = 0%, RTA = 1.00
> ms|packet_loss=0%,rta=1ms,warnp=10%,critp=20%
> 
> check_disk:
> DISK OK [1150211 kB (57%) free on
> /dev/dsk/c0t0d0s0]|free_percent=57%,free=1150Mb,warn=100Mb,warnp=10%
> I still think that you do not need the total, used and used_percent because
> these are calculable from free and free_percent. I would also use free
> rather than used because the lowest limit is 0 and the output shows free. I
> think if you specify a set of disks, then data is returned for the total of
> the disks.
> 
> check_swap:
> CRITICAL - Swap used: 18% (778368 out of
> 4194272)|free_percent=82%,free=778Mb,warnp=5%
> 
> check_load:
> OK - load average: 0.03, 0.04, 0.05|load1=0.03,warn=1,crit=2
> I think we should only return performance data for 1 set of timings,
> otherwise it gets very complicated (on a side issue, is it possible to have
> a plugin return % values instead of load levels?)
> 
> check_procs:
> OK - 5 processes running with command name
> /usr/local/apache/bin/httpd|processes=5,warn=10
> Hmmm, this goes against my check_disk example of using 0 as a lower bound,
> as check_procs can only be reported "upwards"
> 
> check_users:
> USERS OK - 2 users currently logged in|users=2,warn=10,crit=20
> 
> Are we getting closer?
> 
> Ton
> 
> -------------------------------------------------------
> This SF.Net email sponsored by: Parasoft
> Error proof Web apps, automate testing & more.
> Download & eval WebKing and get a free book.
> www.parasoft.com/bulletproofapps1
> _______________________________________________
> Nagiosplug-devel mailing list
> Nagiosplug-devel at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagiosplug-devel
> ::: Please include plugins version (-v) and OS when reporting 
> any issue.
> ::: Messages without supporting info will risk being sent to /dev/null



