1 files changed, 256 insertions, 0 deletions
diff --git a/web/input/doc/new-threshold-syntax.md b/web/input/doc/new-threshold-syntax.md
new file mode 100644
index 0000000..c3eb8b7
--- /dev/null
+++ b/web/input/doc/new-threshold-syntax.md
@@ -0,0 +1,256 @@
+title: New Threshold Syntax
+parent: Documentation
+---
+<!--% # Auto-imported from: http://nagiosplugins.org/rfc/new_threshold_syntax # %-->
+# New Specification Method for Thresholds
+_Ton Voon, March 17, 2008_
+## Overview
+The method for defining thresholds via the command line is inconsistent and
+difficult to interpret. This proposal suggests a different way of specifying
+thresholds, which will also change the metrics of performance data returned.
+## Problem
+The current method of specifying thresholds is confusing when there are
+different checks required. For instance, in check\_http, to check page size
+and time, you can specify -w {warn time}, -c {crit time}, -m
+{minpagesize}[:maxpagesize], -M {maxage of document}.
+Also, note the ways of defining the range are inconsistent. Some alert above
+the value (time, maxage), some alert below the value (pagesize). This is
+inconsistent for the same plugin!
+So, to check that a web page is returned within 5 seconds, the minimum page
+size is 10K and the maximum age is 1 day, you would invoke:
+    check_http -H $HOSTADDRESS$ -c 5 -m 10000 -M 1d
+Furthermore, the current specification for ranges in the developer guidelines
+fails the “obviousness” test: a range of 3:5 will alert if the value is
+outside that range, rather than inside as you would expect.
+Also, the performance data returned by check\_http is always time and size.
+Perhaps you want only time, or you want age as well.
+## Proposal
+### Thresholds
+This document proposes that threshold arguments are specified like:
+    --threshold={threshold definition}
+    --th={threshold definition}
+The threshold definition is a subgetopt format of the form:
+    metric={metric},ok={range},warn={range},crit={range},unit={unit},prefix={SI prefix}
+Where:
+-   ok, warn, crit are called “levels”
+-   ok, warn, crit, unit and prefix are all optional
+-   if none of ok, warning and critical is specified, then no alert is raised,
+    but the performance data will be returned
+-   the unit can be specified with plugins that do not know about the type of
+    value returned (SNMP, Windows performance counters, etc.)
+-   the prefix is used to multiply the input range and possibly for display
+    data. The prefixes allowed are defined by NIST:  
+    <http://physics.nist.gov/cuu/Units/prefixes.html>  
+    <http://physics.nist.gov/cuu/Units/binary.html>
+-   ok, warning or critical can be repeated to define an additional range.
+    This allows non-continuous ranges to be defined
+-   warning can be abbreviated to warn or w
+-   critical can be abbreviated to crit or c
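A minimal Python sketch of how a threshold definition could be split into its parts. The function name `parse_threshold`, the returned dictionary layout, and the decision to treat `metric` as mandatory are assumptions made for illustration; the proposal itself only fixes the key names and abbreviations.

```python
# Hypothetical helper: maps every accepted spelling to one canonical level name.
LEVEL_ALIASES = {
    "ok": "ok",
    "warning": "warn", "warn": "warn", "w": "warn",
    "critical": "crit", "crit": "crit", "c": "crit",
}


def parse_threshold(definition):
    """Split 'metric=...,ok=...,warn=...' into a dictionary of its parts."""
    result = {
        "metric": None,
        "unit": None,
        "prefix": None,
        "levels": {"ok": [], "warn": [], "crit": []},
    }
    for item in definition.split(","):
        key, sep, value = item.partition("=")
        if not sep or not value:
            raise ValueError(f"expected key=value, got {item!r}")
        if key in LEVEL_ALIASES:
            # A level may be repeated; each occurrence adds another range.
            result["levels"][LEVEL_ALIASES[key]].append(value)
        elif key in ("metric", "unit", "prefix"):
            result[key] = value
        else:
            raise ValueError(f"unknown key {key!r}")
    if result["metric"] is None:
        # Assumption: metric is the only key not listed as optional.
        raise ValueError("metric is required")
    return result
```

For example, `parse_threshold("metric=size,ok=10..inf,prefix=Ki")` keeps the range as the raw string `"10..inf"`; turning it into numbers is left to a separate range parser.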
+### Simple Range
+The range values have two specifications: simple and complex. Simple ranges
+are of the format:
+    start..end
+Where:
+-   start and end must be defined
+-   start and end match the regular expression
+    /^[+-]?[0-9]+\\.?[0-9]\*$|^-?inf$/ (i.e., a numeric, “inf” or “-inf”)
+-   start ≤ end
+-   if start = “inf”, this is negative infinity. This can also be written as
+    “-inf”
+-   if end = “inf”, this is positive infinity
+-   endpoints are included in the range
+-   an alert is raised if the value lies between start and end
+(Note: this may be extended in future for adding multiple ranges using a
+separator - I think this is catered for by repeating ok=,warn=,crit=.)
+This simple range does not require quoting at the shell.
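The simple range rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not the planned C routines: the names `parse_endpoint`, `parse_simple_range` and `in_range` are hypothetical, and "-inf" is accepted only as a start value.

```python
import math
import re

# Numeric endpoint: optional sign, digits, optional fractional part.
NUMBER = re.compile(r"[+-]?[0-9]+\.?[0-9]*")


def parse_endpoint(text, is_start):
    """Convert one endpoint; a bare "inf" takes its sign from its position."""
    if text == "inf":
        return -math.inf if is_start else math.inf
    if text == "-inf" and is_start:
        return -math.inf
    if NUMBER.fullmatch(text):
        return float(text)
    raise ValueError(f"bad endpoint: {text!r}")


def parse_simple_range(text):
    """Parse "start..end" into a (start, end) tuple of floats."""
    start_text, sep, end_text = text.partition("..")
    if not sep:
        raise ValueError(f"missing '..' in {text!r}")
    start = parse_endpoint(start_text, is_start=True)
    end = parse_endpoint(end_text, is_start=False)
    if start > end:
        raise ValueError(f"start must not exceed end in {text!r}")
    return start, end


def in_range(value, bounds):
    """True if value lies in the range; both endpoints are included."""
    start, end = bounds
    return start <= value <= end
```

With these helpers, `in_range(5, parse_simple_range("0..5"))` is true because the endpoints are inclusive, and `parse_simple_range("5..0")` raises `ValueError` because start exceeds end.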
+### Complex Range
+Complex ranges are defined as:
+    [^](start..end)
+or
+    [^]start..end
+Where:
+-   start and end must be defined
+-   start and end match the regular expression
+    /\^[+-]?[0-9]+\\.?[0-9]\*\$|\^-?inf\$/ (i.e., a numeric, “inf” or “-inf”)
+-   start ≤ end
+-   if start = “inf”, this is negative infinity. This can also be written as
+    “-inf”
+-   if end = “inf”, this is positive infinity
+-   endpoints are excluded from the range if () are used, otherwise endpoints
+    are included in the range
+-   alert is raised if value is within start and end range, unless \^ is used,
+    in which case alert is raised if outside the range
+Note that due to shell characters, quoting may be required.
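A short Python sketch of the complex range rules, given here only as an illustration. The function names and the tuple layout `(start, end, exclusive, inverted)` are assumptions; the proposal defines the syntax, not a data structure.

```python
import math
import re

NUMBER = re.compile(r"[+-]?[0-9]+\.?[0-9]*")


def parse_endpoint(text, is_start):
    """Convert one endpoint; a bare "inf" takes its sign from its position."""
    if text == "inf":
        return -math.inf if is_start else math.inf
    if text == "-inf" and is_start:
        return -math.inf
    if NUMBER.fullmatch(text):
        return float(text)
    raise ValueError(f"bad endpoint: {text!r}")


def parse_complex_range(text):
    """Return (start, end, exclusive, inverted) for a complex range."""
    inverted = text.startswith("^")
    body = text[1:] if inverted else text
    # Parentheses must enclose the whole range to count as exclusive.
    exclusive = body.startswith("(") and body.endswith(")")
    if exclusive:
        body = body[1:-1]
    start_text, sep, end_text = body.partition("..")
    if not sep:
        raise ValueError(f"missing '..' in {text!r}")
    start = parse_endpoint(start_text, is_start=True)
    end = parse_endpoint(end_text, is_start=False)
    if start > end:
        raise ValueError(f"start must not exceed end in {text!r}")
    return start, end, exclusive, inverted


def alerts(value, complex_range):
    """True if the value triggers the range, honouring () and ^."""
    start, end, exclusive, inverted = complex_range
    if exclusive:
        inside = start < value < end
    else:
        inside = start <= value <= end
    return inside != inverted
```

On the command line such a range would typically be quoted, for example `'^(0..5)'`, since both `^` and parentheses can be special to the shell.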
+### Rules for Determining State
+Given a numeric value, the state of the threshold is calculated from the
+following ordered rules:
+1.  If no levels are specified, return OK
+2.  If an ok level is specified and value is within range, return OK
+3.  If a critical level is specified and value is within range, return
+    CRITICAL
+4.  If a warning level is specified and value is within range, return WARNING
+5.  If an ok level is specified, return CRITICAL
+6.  Otherwise return OK
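The ordered rules above translate almost line for line into code. The following Python sketch assumes each level is given as a list of inclusive `(start, end)` tuples; `get_state` and `in_any` are hypothetical names. The numeric values 0, 1 and 2 are the usual plugin exit codes for OK, WARNING and CRITICAL.

```python
OK, WARNING, CRITICAL = 0, 1, 2


def in_any(value, ranges):
    """True if the value falls inside at least one inclusive range."""
    return any(start <= value <= end for start, end in ranges)


def get_state(value, ok=(), warn=(), crit=()):
    """Apply the six ordered rules to a single numeric value."""
    if not ok and not warn and not crit:
        return OK        # rule 1: no levels specified
    if ok and in_any(value, ok):
        return OK        # rule 2: inside an ok range
    if crit and in_any(value, crit):
        return CRITICAL  # rule 3: inside a critical range
    if warn and in_any(value, warn):
        return WARNING   # rule 4: inside a warning range
    if ok:
        return CRITICAL  # rule 5: ok was given but nothing matched
    return OK            # rule 6: fallback
```

For instance, with `ok=[(0, 8096)]` and `warn=[(8097, 16182)]`, a value of 20000 matches no range and becomes CRITICAL through rule 5, even though no critical range was given.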
+### Looking Back …
+So the check\_http example becomes:
+    check_http -H $HOSTADDRESS$ \
+               --th metric=time,ok=0..5 \
+               --th metric=size,ok=10..inf,prefix=Ki \
+               --th metric=age,ok=0..1,unit=d
+I believe this is more readable (I’m interested in the time, the size and the
+age) and more consistent (I’m alerting above 5, less than 10 and above 1,
+respectively).
+In addition, performance data will only be output if the metric has been
+specified. So only show time performance data if “--th metric=time” has been
+specified on the command line. Both warning\_range and critical\_range can be
+unspecified - this effectively means “I am not going to alert on this value,
+but I’d like to be informed about it in the performance data”.
+Because the specification for a range has changed, the warning and critical
+parts of the performance data can no longer be guaranteed. There is an
+additional piece of work required to fix a new format for performance data.
+However, the basic
+    label=value[uom]
+will still be valid.
+### Examples
+Other examples.
+httpd processes are OK if their virtual size is at most 8096 bytes, WARNING
+up to 16182, and CRITICAL beyond that.
+    # old
+    check_procs -w 8096 -c 16182 -C httpd --metric VSZ
+    # new
+    check_procs -C httpd --th metric=vsize,ok=0..8096,warn=8097..16182
+There should always be one and only one ‘tnslsnr’ process. Otherwise
+critical.
+    # old
+    check_procs -w 1:1 -c 1:1 -C tnslsnr
+    # new
+    check_procs -C tnslsnr --th metric=count,ok=1..1
+Load averages (1,5,15 minute) should be within reasonable ranges.
+    # old
+    check_load -w 1.0,0.8,0.7 -c 1.5,1.3,1.0
+    # new
+    check_load --th metric=1min,ok=0..1.0,warn=1.0..1.5 \
+               --th metric=5min,ok=0..0.8,warn=0.8..1.3 \
+               --th metric=15min,ok=0..0.7,warn=0.7..1.0
+## Plan
+I personally plan on updating check\_procs.
+The basic syntax is:
+    check_procs [filter options] [threshold options]
+Where filter options are the current -u {username}, -C {command}, etc. This
+reduces the set of processes that are to be calculated.
+The new threshold metrics will be:
+-   number - alert on number of matching processes. Performance data returns
+    number of processes
+-   rss-threshold - alert on rss size if any matching process is in range.
+    Perf data returns average rss
+-   rss-max - Same as --rss, but perf data returns max rss
+-   rss-sum - alert on the total rss of all matching processes. Perf data
+    returns rss\_sum
+-   vsz-threshold - alert on vsz size if any matching process is in range.
+    Perf data returns average vsz
+-   vsz-max - Same as --vsz, but perf data returns max vsz
+-   vsz-sum - alert on the total vsz of all matching processes. Perf data
+    returns vsz\_sum
+-   cpu-threshold - alert on cpu % of all matching processes. Perf data
+    returns average cpu
+-   cpu-max - Same as --cpu, but perf data returns max cpu
+-   cpu-sum - alert on total cpu. Perf data returns cpu\_sum
+There will be C library routines for parsing the threshold values.
+There will be C library routines for the collection and output of the
+performance data.
+## Terminology
+**metric**
+:   Something that a check is going to be measured against. For example, for
+    disk checks, it could be used or free or inodes\_free; for http checks, it
+    could be time [taken] or size; for process checks, it could be cpu or
+    number [of processes] or vsz
+**range**
+:   This defines a continuous range of values for which an alert would be raised
+**level**
+:   This is an alert level within Nagios - OK, WARNING or CRITICAL
+**threshold**
+:   This consists of a level with a range
+## Limitations
+This assumes that you are always comparing numbers as the metric values.
+There may be some limitations in the precision of values. All internal logic
+should use double precision.
+If there are multiple metrics, the alert will be on an OR basis, that is, any
+single metric which passes its threshold will cause the plugin to return a
+failed state.
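One way to read the OR rule is that the plugin reports the most severe state found among its metrics. The Python sketch below makes that assumption explicit; the text only says that any failing metric fails the plugin, so the "worst state wins" ordering and the name `overall_state` are illustrative.

```python
OK, WARNING, CRITICAL = 0, 1, 2


def overall_state(metric_states):
    """Return the most severe state among the per-metric states."""
    # With no metrics at all there is nothing to alert on, so report OK.
    return max(metric_states, default=OK)
```

So a check whose time metric is OK but whose size metric is WARNING would return WARNING overall.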
+<!--% # vim:set filetype=markdown textwidth=78 joinspaces: # %-->
