| author | Holger Weiss <holger@zedat.fu-berlin.de> | 2013-10-28 22:25:31 +0100 |
|---|---|---|
| committer | Holger Weiss <holger@zedat.fu-berlin.de> | 2013-10-28 22:25:31 +0100 |
| commit | c6d1c24d8c06da644931851ba8b0ffbd50c02d16 | |
| tree | 2f24eea6bbee0b817196ac914de405102457ee0a | |
| parent | 8fce7063e9054ae3ce59023dc931874c61ddde4b | |
new-threshold-syntax.md: Various cosmetic changes
Fix a few small typos and apply various cosmetic changes.
| -rw-r--r-- | web/input/doc/new-threshold-syntax.md | 171 |
1 file changed, 86 insertions, 85 deletions
diff --git a/web/input/doc/new-threshold-syntax.md b/web/input/doc/new-threshold-syntax.md
index c3eb8b7..4cf8cf6 100644
--- a/web/input/doc/new-threshold-syntax.md
+++ b/web/input/doc/new-threshold-syntax.md
| @@ -11,18 +11,18 @@ _Ton Voon, March 17, 2008_ | |||
| 11 | ## Overview | 11 | ## Overview |
| 12 | 12 | ||
| 13 | The method for defining thresholds via the command line is inconsistent and | 13 | The method for defining thresholds via the command line is inconsistent and |
| 14 | difficult to interpret. This proposal suggests a different way of specifying | 14 | difficult to interpret. This proposal suggests a different way of specifying |
| 15 | thresholds, which will also changes the metrics of performance data returned. | 15 | thresholds, which will also change the metrics of performance data returned. |
| 16 | 16 | ||
| 17 | ## Problem | 17 | ## Problem |
| 18 | 18 | ||
| 19 | The current method of specifying thresholds is confusing when there are | 19 | The current method of specifying thresholds is confusing when there are |
| 20 | different checks required. For instance, in check\_http, to check page size | 20 | different checks required. For instance, in `check_http`, to check page size |
| 21 | and time, you can specify -w {warn time}, -c {crit time}, -m | 21 | and time, you can specify `-w {warn time}`, `-c {crit time}`, |
| 22 | {minpagesize}[:maxpagesize], -M {maxage of document}. | 22 | `-m {minpagesize}[:maxpagesize]`, `-M {maxage of document}`. |
| 23 | 23 | ||
| 24 | Also, note the ways of defining the range are inconsistent. Some alert above | 24 | Also, note the ways of defining the range are inconsistent. Some alert above |
| 25 | the value (time, maxage), some alert below the value (pagesize). This is | 25 | the value (time, maxage), some alert below the value (pagesize). This is |
| 26 | inconsistent for the same plugin! | 26 | inconsistent for the same plugin! |
| 27 | 27 | ||
| 28 | So, to check that a web page is returned within 5 seconds, the minimum page | 28 | So, to check that a web page is returned within 5 seconds, the minimum page |
| @@ -34,7 +34,7 @@ Furthermore, the current specification for ranges in the developer guidelines | |||
| 34 | fails the “obviousness” test: a range of 3:5 will alert if the value is | 34 | fails the “obviousness” test: a range of 3:5 will alert if the value is |
| 35 | outside that range, rather than inside as you would expect. | 35 | outside that range, rather than inside as you would expect. |
| 36 | 36 | ||
| 37 | Also, the performance data returned by check\_http is always time and size. | 37 | Also, the performance data returned by `check_http` is always time and size. |
| 38 | Perhaps you want only time, or you want age as well. | 38 | Perhaps you want only time, or you want age as well. |
| 39 | 39 | ||
| 40 | ## Proposal | 40 | ## Proposal |
| @@ -52,42 +52,42 @@ The threshold definition is a subgetopt format of the form: | |||
| 52 | 52 | ||
| 53 | Where: | 53 | Where: |
| 54 | 54 | ||
| 55 | - ok, warn, crit are called “levels” | 55 | - `ok`, `warn`, `crit` are called “levels” |
| 56 | - any of ok, warn, crit, unit or prefix are optional | 56 | - any of `ok`, `warn`, `crit`, `unit` or `prefix` are optional |
| 57 | - if ok, warning and critical are not specified, then no alert is raised, | 57 | - if `ok`, `warning` and `critical` are not specified, then no alert is |
| 58 | but the performance data will be returned | 58 | raised, but the performance data will be returned |
| 59 | - the unit can be specified with plugins that do not know about the type of | 59 | - the `unit` can be specified with plugins that do not know about the type of |
| 60 | value returned (SNMP, Windows performance counters, etc.) | 60 | value returned (SNMP, Windows performance counters, etc.) |
| 61 | - the prefix is used to multiply the input range and possibly for display | 61 | - the `prefix` is used to multiply the input range and possibly for display |
| 62 | data. The prefixes allowed are defined by NIST: | 62 | data. The prefixes allowed are defined by NIST: |
| 63 | <http://physics.nist.gov/cuu/Units/prefixes.html> | 63 | <http://physics.nist.gov/cuu/Units/prefixes.html> |
| 64 | <http://physics.nist.gov/cuu/Units/binary.html> | 64 | <http://physics.nist.gov/cuu/Units/binary.html> |
| 65 | - ok, warning or critical can be repeated to define an additional range. | 65 | - `ok`, `warning` or `critical` can be repeated to define an additional range. |
| 66 | This allows non-continuous ranges to be defined | 66 | This allows non-continuous ranges to be defined |
| 67 | - warning can be abbreviated to warn or w | 67 | - `warning` can be abbreviated to `warn` or `w` |
| 68 | - critical can be abbreviated to crit or c | 68 | - `critical` can be abbreviated to `crit` or `c` |
| 69 | 69 | ||
| 70 | ### Simple Range | 70 | ### Simple Range |
| 71 | 71 | ||
| 72 | The range values have two specifications: simple and complex. Simple ranges | 72 | The range values have two specifications: simple and complex. Simple ranges |
| 73 | are of the format: | 73 | are of the format: |
| 74 | 74 | ||
| 75 | start..end | 75 | start..end |
| 76 | 76 | ||
| 77 | Where: | 77 | Where: |
| 78 | 78 | ||
| 79 | - start and end must be defined | 79 | - `start` and `end` must be defined |
| 80 | - start and end match the regular expression | 80 | - `start` and `end` match the regular expression |
| 81 | /^[+-]?[0-9]+\\.?[0-9]\*$|^inf$/ (ie, a numeric or “inf”) | 81 | `/^[+-]?[0-9]+\.?[0-9]*$|^inf$/` (ie, a numeric or “inf”) |
| 82 | - start ≤ end | 82 | - `start ≤ end` |
| 83 | - if start = “inf”, this is negative infinity. This can also be written as | 83 | - if `start` = `inf`, this is negative infinity. This can also be written as |
| 84 | “-inf” | 84 | `-inf` |
| 85 | - if end = “inf”, this is positive infinity | 85 | - if `end` = `inf`, this is positive infinity |
| 86 | - endpoints are inclusive of the range | 86 | - endpoints are inclusive of the range |
| 87 | - alert is raised if value is inside start and end range | 87 | - alert is raised if value is inside `start` and `end` range |
| 88 | 88 | ||
| 89 | (Note: this may be extended in future for adding multiple ranges using a | 89 | (Note: this may be extended in future for adding multiple ranges using a |
| 90 | separator - I think this is catered for by repeating ok=,warn=,crit=.) | 90 | separator - I think this is catered for by repeating `ok=,warn=,crit=`.) |
| 91 | 91 | ||
| 92 | This simple range does not require quoting at the shell. | 92 | This simple range does not require quoting at the shell. |
| 93 | 93 | ||
| @@ -103,17 +103,17 @@ or | |||
| 103 | 103 | ||
| 104 | Where: | 104 | Where: |
| 105 | 105 | ||
| 106 | - start and end must be defined | 106 | - `start` and `end` must be defined |
| 107 | - start and end match the regular expression | 107 | - `start` and `end` match the regular expression |
| 108 | /\^[+-]?[0-9]+\\.?[0-9]\*\$|\^inf\$/ (ie, a numeric or “inf”) | 108 | `/\^[+-]?[0-9]+\.?[0-9]*$|^inf$/` (ie, a numeric or “inf”) |
| 109 | - start ≤ end | 109 | - `start` ≤ `end` |
| 110 | - if start = “inf”, this is negative infinity. This can also be written as | 110 | - if `start` = `inf`, this is negative infinity. This can also be written as |
| 111 | “-inf” | 111 | `-inf` |
| 112 | - if end = “inf”, this is positive infinity | 112 | - if `end` = `inf`, this is positive infinity |
| 113 | - endpoints are excluded from the range if () are used, otherwise endpoints | 113 | - endpoints are excluded from the range if () are used, otherwise endpoints |
| 114 | are included in the range | 114 | are included in the range |
| 115 | - alert is raised if value is within start and end range, unless \^ is used, | 115 | - alert is raised if value is within `start` and `end` range, unless `^` is |
| 116 | in which case alert is raised if outside the range | 116 | used, in which case alert is raised if outside the range |
| 117 | 117 | ||
| 118 | Note that due to shell characters, quoting may be required. | 118 | Note that due to shell characters, quoting may be required. |
| 119 | 119 | ||
| @@ -122,17 +122,18 @@ Note that due to shell characters, quoting may be required. | |||
| 122 | Given a numeric value, the state of the threshold is calculated from the | 122 | Given a numeric value, the state of the threshold is calculated from the |
| 123 | following ordered rules: | 123 | following ordered rules: |
| 124 | 124 | ||
| 125 | 1. If no levels are specified, return OK | 125 | 1. If no levels are specified, return `OK` |
| 126 | 2. If an ok level is specified and value is within range, return OK | 126 | 2. If an `ok` level is specified and value is within range, return `OK` |
| 127 | 3. If a critical level is specified and value is within range, return | 127 | 3. If a `critical` level is specified and value is within range, return |
| 128 | CRITICAL | 128 | `CRITICAL` |
| 129 | 4. If a warning level is specified and value is within range, return WARNING | 129 | 4. If a `warning` level is specified and value is within range, return |
| 130 | 5. If an ok level is specified, return CRITICAL | 130 | `WARNING` |
| 131 | 6. Otherwise return OK | 131 | 5. If an `ok` level is specified, return `CRITICAL` |
| 132 | 6. Otherwise return `OK` | ||
| 132 | 133 | ||
| 133 | ### Looking Back … | 134 | ### Looking Back … |
| 134 | 135 | ||
| 135 | So the check\_http example becomes: | 136 | So the `check_http` example becomes: |
| 136 | 137 | ||
| 137 | check_http -H $HOSTADDRESS$ \ | 138 | check_http -H $HOSTADDRESS$ \ |
| 138 | --th metric=time,ok=0..5 \ | 139 | --th metric=time,ok=0..5 \ |
| @@ -144,26 +145,26 @@ age) and more consistent (I’m alerting above 5, less than 10 and above 1, | |||
| 144 | respectively). | 145 | respectively). |
| 145 | 146 | ||
| 146 | In addition, performance data will only be output if the metric has been | 147 | In addition, performance data will only be output if the metric has been |
| 147 | specified. So only show time performance data if “--th metric=time” has been | 148 | specified. So only show time performance data if `--th metric=time` has been |
| 148 | specified on the command line. Both warning\_range or critical\_range can be | 149 | specified on the command line. Both warning range or critical range can be |
| 149 | unspecified - this effectively means “I am not going to alert on this value, | 150 | unspecified - this effectively means “I am not going to alert on this value, |
| 150 | but I’d like to be informed about it in the performance data”. | 151 | but I’d like to be informed about it in the performance data”. |
| 151 | 152 | ||
| 152 | Because the specification for a range has changed, the warning and critical | 153 | Because the specification for a range has changed, the warning and critical |
| 153 | parts of the performance data can no longer be guaranteed. There is an | 154 | parts of the performance data can no longer be guaranteed. There is an |
| 154 | additional piece of work required to fix a new format for performance data. | 155 | additional piece of work required to fix a new format for performance data. |
| 155 | However, the basic | 156 | However, the basic |
| 156 | 157 | ||
| 157 | label=value[uom] | 158 | label=value[uom] |
| 158 | 159 | ||
| 159 | Will still be valid. | 160 | will still be valid. |
| 160 | 161 | ||
| 161 | ### Examples | 162 | ### Examples |
| 162 | 163 | ||
| 163 | Other examples. | 164 | Other examples. |
| 164 | 165 | ||
| 165 | To check httpd processes are OK if the virtual size is under 8096 bytes. Warn | 166 | To check httpd processes are `OK` if the virtual size is under 8096 bytes. |
| 166 | until they reach 16182, but bigger than that is CRITICAL. | 167 | Warn until they reach 16182, but bigger than that is `CRITICAL`. |
| 167 | 168 | ||
| 168 | # old | 169 | # old |
| 169 | check_procs -w 8096 -c 16182 -C httpd --metric VSZ | 170 | check_procs -w 8096 -c 16182 -C httpd --metric VSZ |
| @@ -171,8 +172,8 @@ until they reach 16182, but bigger than that is CRITICAL. | |||
| 171 | # new | 172 | # new |
| 172 | check_procs -C httpd --th metric=vsize,ok=0..8096,warn=8097..16182 | 173 | check_procs -C httpd --th metric=vsize,ok=0..8096,warn=8097..16182 |
| 173 | 174 | ||
| 174 | There should always be one and only one ‘tnslsnr’ process. Otherwise | 175 | There should always be one and only one ‘tnslsnr’ process. Otherwise |
| 175 | critical. | 176 | `CRITICAL`. |
| 176 | 177 | ||
| 177 | # old | 178 | # old |
| 178 | check_procs -w 1:1 -c 1:1 -C tnslsnr | 179 | check_procs -w 1:1 -c 1:1 -C tnslsnr |
| @@ -192,33 +193,33 @@ Load averages (1,5,15 minute) should be within reasonable ranges. | |||
| 192 | 193 | ||
| 193 | ## Plan | 194 | ## Plan |
| 194 | 195 | ||
| 195 | I personally plan on updating check\_procs. | 196 | I personally plan on updating `check_procs`. |
| 196 | 197 | ||
| 197 | The basic syntax is: | 198 | The basic syntax is: |
| 198 | 199 | ||
| 199 | check_procs [filter options] [threshold options] | 200 | check_procs [filter options] [threshold options] |
| 200 | 201 | ||
| 201 | Where filter options are the current -u {username}, -C {command}, etc. This | 202 | Where filter options are the current `-u {username}`, `-C {command}`, etc. |
| 202 | reduces the set of processes that are to be calculated. | 203 | This reduces the set of processes that are to be calculated. |
| 203 | 204 | ||
| 204 | The new threshold metrics will be: | 205 | The new threshold metrics will be: |
| 205 | 206 | ||
| 206 | - number - alert on number of matching processes. Performance data returns | 207 | - number - alert on number of matching processes. Performance data returns |
| 207 | number of processes | 208 | number of processes |
| 208 | - rss-threshold - alert on rss size if any matching process is in range. | 209 | - rss-threshold - alert on rss size if any matching process is in range. Perf |
| 209 | Perf data returns average rss | 210 | data returns average rss |
| 210 | - rss-max - Same as --rss, but perf data returns max rss | 211 | - rss-max - Same as `--rss`, but perf data returns max rss |
| 211 | - rss-sum - alert on the total rss of all matching processes. Perf data | 212 | - rss-sum - alert on the total rss of all matching processes. Perf data |
| 212 | returns rss\_sum | 213 | returns rss\_sum |
| 213 | - vsz-threshold - alert on vsz size if any matching process is in range. | 214 | - vsz-threshold - alert on vsz size if any matching process is in range. Perf |
| 214 | Perf data returns average vsz | 215 | data returns average vsz |
| 215 | - vsz-max - Same as --vsz, but perf data returns max rss | 216 | - vsz-max - Same as `--vsz`, but perf data returns max rss |
| 216 | - vsz-sum - alert on the total vsz of all matching processes. Perf data | 217 | - vsz-sum - alert on the total vsz of all matching processes. Perf data |
| 217 | returns vsz\_sum | 218 | returns vsz\_sum |
| 218 | - cpu-threshold - alert on cpu % of all matching processes. Perf data | 219 | - cpu-threshold - alert on cpu % of all matching processes. Perf data returns |
| 219 | returns average cpu | 220 | average cpu |
| 220 | - cpu-max - Same as --cpu, but perf data returns max cpu | 221 | - cpu-max - Same as `--cpu`, but perf data returns max cpu |
| 221 | - cpu-sum - alert on total cpu. Perf data returns cpu\_sum | 222 | - cpu-sum - alert on total cpu. Perf data returns cpu\_sum |
| 222 | 223 | ||
| 223 | There will be C library routines for parsing the threshold values. | 224 | There will be C library routines for parsing the threshold values. |
| 224 | 225 | ||
| @@ -228,16 +229,16 @@ performance data. | |||
| 228 | ## Terminology | 229 | ## Terminology |
| 229 | 230 | ||
| 230 | **metric** | 231 | **metric** |
| 231 | : Something that a check is going to be measured against. For example, for | 232 | : Something that a check is going to be measured against. For example, for |
| 232 | disk checks, it could be used or free or inodes\_free; for http checks, it | 233 | disk checks, it could be used or free or inodes\_free; for HTTP checks, it |
| 233 | could be time [taken] or size; for process checks, it could be cpu or | 234 | could be time taken or size; for process checks, it could be cpu or |
| 234 | number [of processes] or vsz | 235 | number of processes or vsz |
| 235 | 236 | ||
| 236 | **range** | 237 | **range** |
| 237 | : This defines a continuous range of values when an alert would be raised | 238 | : This defines a continuous range of values when an alert would be raised |
| 238 | 239 | ||
| 239 | **level** | 240 | **level** |
| 240 | : This is an alert level within Nagios - OK, WARNING or CRITICAL | 241 | : This is an alert level within Nagios - `OK`, `WARNING` or `CRITICAL` |
| 241 | 242 | ||
| 242 | **threshold** | 243 | **threshold** |
| 243 | : This consists of a level with a range | 244 | : This consists of a level with a range |
| @@ -246,7 +247,7 @@ performance data. | |||
| 246 | 247 | ||
| 247 | This assumes that you are always comparing numbers as the metric values. | 248 | This assumes that you are always comparing numbers as the metric values. |
| 248 | 249 | ||
| 249 | There maybe some limitations in the precision of values. All internal logic | 250 | There maybe some limitations in the precision of values. All internal logic |
| 250 | should use double precision. | 251 | should use double precision. |
| 251 | 252 | ||
| 252 | If there are multiple metrics, the alert will be on an OR basis, that is, any | 253 | If there are multiple metrics, the alert will be on an OR basis, that is, any |
