Diffstat (limited to 'doc/developer-guidelines.sgml')
| -rw-r--r-- | doc/developer-guidelines.sgml | 483 |
1 files changed, 483 insertions, 0 deletions
diff --git a/doc/developer-guidelines.sgml b/doc/developer-guidelines.sgml
new file mode 100644
index 00000000..42ad8964
--- /dev/null
+++ b/doc/developer-guidelines.sgml
| @@ -0,0 +1,483 @@ | |||
| 1 | <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V4.1//EN"> | ||
| 2 | <book> | ||
| 3 | <title>Nagios Plug-in Developer Guidelines</title> | ||
| 4 | |||
| 5 | <bookinfo> | ||
| 6 | <authorgroup> | ||
| 7 | <author> | ||
| 8 | <firstname>Karl</firstname> | ||
| 9 | <surname>DeBisschop</surname> | ||
| 10 | <affiliation> | ||
| 11 | <address><email>karl@debisschop.net</email></address> | ||
| 12 | </affiliation> | ||
| 13 | </author> | ||
| 14 | |||
| 15 | <author> | ||
| 16 | <firstname>Ethan</firstname> | ||
| 17 | <surname>Galstad</surname> | ||
| 18 | <authorblurb> | ||
| 19 | <para>Author of Nagios</para> | ||
| 20 | <para><ulink url="http://www.nagios.org"></ulink></para> | ||
| 21 | </authorblurb> | ||
| 22 | <affiliation> | ||
| 23 | <address><email>netsaint@linuxbox.com</email></address> | ||
| 24 | </affiliation> | ||
| 25 | </author> | ||
| 26 | |||
| 27 | <author> | ||
| 28 | <firstname>Hugo</firstname> | ||
| 29 | <surname>Gayosso</surname> | ||
| 30 | <affiliation> | ||
| 31 | <address><email>hgayosso@gnu.org</email></address> | ||
| 32 | </affiliation> | ||
| 33 | </author> | ||
| 34 | |||
| 35 | |||
| 36 | <author> | ||
| 37 | <firstname>Subhendu</firstname> | ||
| 38 | <surname>Ghosh</surname> | ||
| 39 | <affiliation> | ||
| 40 | <address><email>sghosh@sourceforge.net</email></address> | ||
| 41 | </affiliation> | ||
| 42 | </author> | ||
| 43 | |||
| 44 | <author> | ||
| 45 | <firstname>Stanley</firstname> | ||
| 46 | <surname>Hopcroft</surname> | ||
| 47 | <affiliation> | ||
| 48 | <address><email>stanleyhopcroft@sourceforge.net</email></address> | ||
| 49 | </affiliation> | ||
| 50 | </author> | ||
| 51 | |||
| 52 | </authorgroup> | ||
| 53 | |||
| 54 | <pubdate>2002</pubdate> | ||
| 55 | <title>Nagios plug-in development guidelines</title> | ||
| 56 | |||
| 57 | <revhistory> | ||
| 58 | <revision> | ||
| 59 | <revnumber>0.4</revnumber> | ||
| 60 | <date>2 May 2002</date> | ||
| 61 | </revision> | ||
| 62 | </revhistory> | ||
| 63 | |||
| 64 | <copyright> | ||
| 65 | <year>2000 2001 2002</year> | ||
| 66 | <holder>Karl DeBisschop, Ethan Galstad, | ||
| 67 | Hugo Gayosso, Stanley Hopcroft, Subhendu Ghosh</holder> | ||
| 68 | </copyright> | ||
| 69 | |||
| 70 | </bookinfo> | ||
| 71 | |||
| 72 | |||
| 73 | <preface id=preface> | ||
| 74 | <title>About the guidelines</title> | ||
| 75 | |||
| 76 | <para>The purpose of these guidelines is to provide a reference for | ||
| 77 | plug-in developers and to encourage the standardization of the | ||
| 78 | different kinds of plug-ins: C, shell, Perl, Python, etc.</para> | ||
| 79 | |||
| 80 | |||
| 81 | <section> <title>Copyright</title> | ||
| 82 | |||
| 83 | <para>Nagios Plug-in Development Guidelines Copyright (C) 2000 2001 | ||
| 84 | 2002 | ||
| 85 | Karl DeBisschop, Ethan Galstad, Hugo Gayosso, Stanley Hopcroft, | ||
| 86 | Subhendu Ghosh</para> | ||
| 87 | |||
| 88 | <para>Permission is granted to make and distribute verbatim | ||
| 89 | copies of this manual provided the copyright notice and this | ||
| 90 | permission notice are preserved on all copies.</para> | ||
| 91 | |||
| 92 | <para>The plugins themselves are copyrighted by their respective | ||
| 93 | authors.</para> | ||
| 94 | |||
| 95 | </section> | ||
| 96 | </preface> | ||
| 97 | |||
| 98 | <article> | ||
| 99 | <section id="PlugOutput"><title>Plugin Output for Nagios</title> | ||
| 100 | |||
| 101 | <para>You should always print something to STDOUT that tells if the | ||
| 102 | service is working or why it's failing. Try to keep the output short - | ||
| 103 | probably less than 80 characters. Remember that you ideally would like | ||
| 104 | the entire output to appear in a pager message, which will get chopped | ||
| 105 | off after a certain length.</para> | ||
| 106 | |||
| 107 | <section><title>Print only one line of text</title> | ||
| 108 | <para>Nagios will only grab the first line of text from STDOUT | ||
| 109 | when it notifies contacts about potential problems. If you print | ||
| 110 | multiple lines, you're out of luck. Remember, keep it short and | ||
| 111 | to the point.</para> | ||
| 112 | </section> | ||
| 113 | |||
| 114 | <section><title>Screen Output</title> | ||
| 115 | <para>The plug-in should print the diagnostic and just the | ||
| 116 | synopsis part of the help message. A well written plugin would | ||
| 117 | then have --help as a way to get the verbose help.</para> | ||
| 118 | <para>Code and output should try to respect the 80x25 size of a | ||
| 119 | crt (remember when fixing stuff in the server room!)</para> | ||
| 120 | </section> | ||
| 121 | |||
| 122 | <section><title>Return the proper status code</title> | ||
| 123 | <para>See <xref linkend="ReturnCodes"> below | ||
| 124 | for the numeric values of status codes and their | ||
| 125 | description. Remember to return an UNKNOWN state if bogus or | ||
| 126 | invalid command line arguments are supplied or if you are unable | ||
| 127 | to check the service.</para> | ||
| 128 | </section> | ||
| 129 | |||
| 130 | <section><title>Plugin Return Codes</title> | ||
| 131 | <para>The return codes below are based on the POSIX spec of returning | ||
| 132 | a positive value. Netsaint prior to v0.0.7 supported a non-POSIX-compliant | ||
| 133 | return code of "-1" for unknown. Nagios supports POSIX return | ||
| 134 | codes by default.</para> | ||
| 135 | |||
| 136 | <para>Note: Some plugins will on occasion print on STDOUT that an error | ||
| 137 | occurred and that the error code is 138 or 255 or some such number. These | ||
| 138 | are usually caused by plugins using system commands without | ||
| 139 | enough checks to catch unexpected output. Developers should include a | ||
| 140 | default catch-all for system command output that returns an UNKNOWN | ||
| 141 | return code.</para> | ||
| 142 | |||
| 143 | <table id="ReturnCodes"><title>Plugin Return Codes</title> | ||
| 144 | <tgroup cols="3"> | ||
| 145 | <thead> | ||
| 146 | <row> | ||
| 147 | <entry><para>Numeric Value</para></entry> | ||
| 148 | <entry><para>Service Status</para></entry> | ||
| 149 | <entry><para>Status Description</para></entry> | ||
| 150 | </row> | ||
| 151 | </thead> | ||
| 152 | <tbody> | ||
| 153 | <row> | ||
| 154 | <entry align=center><para>0</para></entry> | ||
| 155 | <entry valign=middle><para>OK</para></entry> | ||
| 156 | <entry><para>The plugin was able to check the service and it | ||
| 157 | appeared to be functioning properly</para></entry> | ||
| 158 | </row> | ||
| 159 | <row> | ||
| 160 | <entry align=center><para>1</para></entry> | ||
| 161 | <entry valign=middle><para>Warning</para></entry> | ||
| 162 | <entry><para>The plugin was able to check the service, but it | ||
| 163 | appeared to be above some "warning" threshold or did not appear | ||
| 164 | to be working properly</para></entry> | ||
| 165 | </row> | ||
| 166 | <row> | ||
| 167 | <entry align=center><para>2</para></entry> | ||
| 168 | <entry valign=middle><para>Critical</para></entry> | ||
| 169 | <entry><para>The plugin detected that either the service was not | ||
| 170 | running or it was above some "critical" threshold</para></entry> | ||
| 171 | </row> | ||
| 172 | <row> | ||
| 173 | <entry align=center><para>3</para></entry> | ||
| 174 | <entry valign=middle><para>Unknown</para></entry> | ||
| 175 | <entry><para>Invalid command line arguments were supplied to the | ||
| 176 | plugin or the plugin was unable to check the status of the given | ||
| 177 | hosts/service</para></entry> | ||
| 178 | </row> | ||
| 179 | </tbody> | ||
| 180 | </tgroup> | ||
| 181 | </table> | ||
| 182 | |||
| 183 | |||
| 184 | </section> | ||
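The return codes in the table above, together with the catch-all advice, can be sketched roughly as follows. This is only an illustration in Python: the names `evaluate` and `run_check` and the simple "higher is worse" thresholds are assumptions made for the example, not part of any plugin library.

```python
# Numeric exit codes from the "Plugin Return Codes" table.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING",
               CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(value, warn, crit):
    """Map a measured value onto a plugin state (hypothetical helper)."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def run_check(probe, warn, crit):
    """Run probe() and return (exit_code, one_line_message).

    Any unexpected failure falls through to UNKNOWN instead of leaking
    an arbitrary exit status such as 138 or 255.
    """
    try:
        value = float(probe())
        state = evaluate(value, warn, crit)
        message = "%s - value is %g" % (STATE_NAMES[state], value)
    except Exception as exc:  # default catch-all
        state = UNKNOWN
        message = "UNKNOWN - could not check service: %s" % exc
    # Keep the output to a single line of at most 80 characters.
    return state, message.splitlines()[0][:80]
```

A plugin built this way would print the returned line to STDOUT and pass the returned code to `sys.exit()`.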
| 185 | |||
| 186 | |||
| 187 | </section> | ||
| 188 | |||
| 189 | <section id="SysCmdAuxFiles"><title>System Commands and Auxiliary Files</title> | ||
| 190 | |||
| 191 | <section><title>Don't execute system commands without specifying their | ||
| 192 | full path</title> | ||
| 193 | <para>Don't use exec(), popen(), etc. to execute external | ||
| 194 | commands without explicitly using the full path of the external | ||
| 195 | program.</para> | ||
| 196 | |||
| 197 | <para>Doing otherwise makes the plugin vulnerable to hijacking | ||
| 198 | by a trojan horse earlier in the search path. See the main | ||
| 199 | plugin distribution for examples on how this is done.</para> | ||
| 200 | </section> | ||
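For plugins written in Python, one possible way to follow the full-path rule is sketched below. The helper name `run_command` is hypothetical, and the check simply refuses anything that is not an absolute path, so no search-path lookup can happen.

```python
import os
import subprocess


def run_command(argv, timeout=10):
    """Run an external program given by absolute path, without a shell."""
    if not os.path.isabs(argv[0]):
        raise ValueError("refusing to run %r: not an absolute path" % argv[0])
    # Passing a list (and no shell=True) avoids shell interpretation of the
    # arguments; an absolute path is never looked up in the search path.
    result = subprocess.run(argv, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, timeout=timeout,
                            universal_newlines=True)
    return result.returncode, result.stdout
```

A call might look like `run_command(["/bin/df", "-k"])`, where the location of `df` is an assumption that varies between systems.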
| 201 | |||
| 202 | <section><title>Use spopen() if external commands must be executed</title> | ||
| 203 | |||
| 204 | <para>If you have to execute external commands from within your | ||
| 205 | plugin and you're writing it in C, use the spopen() function | ||
| 206 | that Karl DeBisschop has written.</para> | ||
| 207 | |||
| 208 | <para>The code for spopen() and spclose() is included with the | ||
| 209 | core plugin distribution.</para> | ||
| 210 | </section> | ||
| 211 | |||
| 212 | <section><title>Don't make temp files unless absolutely required</title> | ||
| 213 | |||
| 214 | <para>If temp files are needed, make sure that the plugin will | ||
| 215 | fail cleanly if the file can't be written (e.g., too few file | ||
| 216 | handles, out of disk space, incorrect permissions, etc.) and | ||
| 217 | delete the temp file when processing is complete.</para> | ||
| 218 | </section> | ||
| 219 | |||
| 220 | <section><title>Don't be tricked into following symlinks</title> | ||
| 221 | |||
| 222 | <para>If your plugin opens any files, take steps to ensure that | ||
| 223 | you are not following a symlink to another location on the | ||
| 224 | system.</para> | ||
| 225 | </section> | ||
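One way to avoid following a symlink when opening a file, sketched in Python under the assumption that the platform provides `O_NOFOLLOW` (most Unix-like systems do). The helper name `open_no_symlink` is made up for this example, and it only guards the final path component.

```python
import os


def open_no_symlink(path):
    """Open path for reading, failing if its final component is a symlink."""
    # With O_NOFOLLOW the open() call fails with OSError instead of
    # silently following a symlink to another location.
    fd = os.open(path, os.O_RDONLY | os.O_NOFOLLOW)
    return os.fdopen(fd, "r")
```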
| 226 | |||
| 227 | <section><title>Validate all input</title> | ||
| 228 | |||
| 229 | <para>Use the routines in utils.c or utils.pm, and write more as needed.</para> | ||
| 230 | </section> | ||
| 231 | |||
| 232 | </section> | ||
| 233 | |||
| 234 | |||
| 235 | |||
| 236 | |||
| 237 | <section id="PerlPlugin"><title>Perl Plugins</title> | ||
| 238 | |||
| 239 | <para>Perl plugins are coded a little more defensively than other | ||
| 240 | plugins because of embedded Perl. When configured as such, embedded | ||
| 241 | Perl Nagios (ePN) requires stricter use of some of Perl's features. | ||
| 242 | This section outlines some of the steps needed to use ePN | ||
| 243 | effectively.</para> | ||
| 244 | |||
| 245 | <orderedlist> | ||
| 246 | |||
| 247 | <listitem><para>Do not use BEGIN and END blocks: with Embedded Perl (ePN) | ||
| 248 | they are called only the first time the plugin runs and when Nagios shuts | ||
| 249 | down. In particular, do not use BEGIN blocks to initialize variables.</para> | ||
| 250 | </listitem> | ||
| 251 | |||
| 252 | <listitem><para>To use utils.pm, you need to provide a full path to the | ||
| 253 | module in order for it to work with ePN.</para> | ||
| 254 | |||
| 255 | <literallayout> | ||
| 256 | e.g. | ||
| 257 | use lib "/usr/local/nagios/libexec"; | ||
| 258 | use utils qw(...); | ||
| 259 | </literallayout> | ||
| 260 | </listitem> | ||
| 261 | |||
| 262 | <listitem><para>Perl scripts should be called with "-w"</para> | ||
| 263 | </listitem> | ||
| 264 | |||
| 265 | <listitem><para>All Perl plugins must compile cleanly under "use strict" - i.e. at | ||
| 266 | least explicitly use package names, as in "$main::x", or predeclare every | ||
| 267 | variable.</para> | ||
| 268 | |||
| 269 | |||
| 270 | <para>Explicitly initialize each variable in use. Otherwise, with | ||
| 271 | caching enabled, the plugin will not be recompiled each time, and | ||
| 272 | therefore Perl will not reinitialize all the variables. All old | ||
| 273 | variable values will still be in effect.</para> | ||
| 274 | </listitem> | ||
| 275 | |||
| 276 | <listitem><para>Do not use < DATA > (these simply do not compile under ePN).</para> | ||
| 277 | </listitem> | ||
| 278 | |||
| 279 | <listitem><para>Do not use named subroutines</para> | ||
| 280 | </listitem> | ||
| 281 | |||
| 282 | <listitem><para>If writing to a file (perhaps recording | ||
| 283 | performance data) explicitly close it. The plugin never | ||
| 284 | calls <emphasis role=strong>exit</emphasis>; that is caught by | ||
| 285 | p1.pl, so output streams are never closed.</para> | ||
| 286 | </listitem> | ||
| 287 | |||
| 288 | <listitem><para>As in <xref linkend="runtime"> all plugins need | ||
| 289 | to monitor their runtime, especially if they are using network | ||
| 290 | resources. Use of <emphasis>alarm</emphasis> is recommended. | ||
| 291 | Plugins may import a default timeout ($TIMEOUT) from utils.pm. | ||
| 292 | </para> | ||
| 293 | </listitem> | ||
| 294 | |||
| 295 | <listitem><para>Perl plugins should import %ERRORS from utils.pm | ||
| 296 | and then "exit $ERRORS{'OK'}" rather than "exit 0" | ||
| 297 | </para> | ||
| 298 | </listitem> | ||
| 299 | |||
| 300 | </orderedlist> | ||
| 301 | |||
| 302 | </section> | ||
| 303 | |||
| 304 | <section id="runtime"><title>Runtime Timeouts</title> | ||
| 305 | |||
| 306 | <para>Plugins have a very limited runtime - typically 10 sec. | ||
| 307 | As a result, it is very important for plugins to maintain internal | ||
| 308 | code to exit if runtime exceeds a threshold. </para> | ||
| 309 | |||
| 310 | <para>All plugins should timeout gracefully, not just networking | ||
| 311 | plugins. For instance, df may lock if you have automounted | ||
| 312 | drives and your network fails - but at first glance, who'd think | ||
| 313 | df could lock up like that? Plus, a plugin that can time out is | ||
| 314 | simply more error resistant than one that keeps consuming | ||
| 315 | resources.</para> | ||
| 316 | |||
| 317 | <section><title>Use DEFAULT_SOCKET_TIMEOUT</title> | ||
| 318 | |||
| 319 | <para>All network plugins should use DEFAULT_SOCKET_TIMEOUT to time out.</para> | ||
| 320 | |||
| 321 | </section> | ||
| 322 | |||
| 323 | |||
| 324 | <section><title>Add alarms to network plugins</title> | ||
| 325 | |||
| 326 | <para>If you write a plugin which communicates with another | ||
| 327 | networked host, you should make sure to set an alarm() in your | ||
| 328 | code that prevents the plugin from hanging due to abnormal | ||
| 329 | socket closures, etc. Nagios takes steps to protect itself | ||
| 330 | against unruly plugins that time out, but any plugins you create | ||
| 331 | should be well behaved on their own.</para> | ||
| 332 | |||
| 333 | </section> | ||
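The alarm-based timeout described in this section could look roughly like the following in a Python plugin. The names `run_with_timeout` and `PluginTimeout` are invented for the sketch, and reporting UNKNOWN on timeout is an assumption: the guidelines do not say which state a timed-out plugin should return.

```python
import signal

UNKNOWN = 3
DEFAULT_TIMEOUT = 10  # seconds; matches the typical runtime limit noted above


class PluginTimeout(Exception):
    """Raised by the SIGALRM handler when the check runs too long."""


def _on_alarm(signum, frame):
    raise PluginTimeout()


def run_with_timeout(check, timeout=DEFAULT_TIMEOUT):
    """Run check() under an alarm and return its (exit_code, message)."""
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(timeout)
    try:
        return check()
    except PluginTimeout:
        return UNKNOWN, "UNKNOWN - plugin timed out after %d seconds" % timeout
    finally:
        signal.alarm(0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)  # restore the old handler
```

Because `signal.alarm` is only available on Unix-like systems and signal handlers can only be installed from the main thread, this approach suits a short-lived command-line plugin rather than a threaded program.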
| 334 | |||
| 335 | |||
| 336 | |||
| 337 | </section> | ||
| 338 | |||
| 339 | <section id="PlugOptions"><title>Plugin Options</title> | ||
| 340 | |||
| 341 | <para>A well written plugin should have --help as a way to get | ||
| 342 | verbose help. Code and output should try to respect the 80x25 size of a | ||
| 343 | crt (remember when fixing stuff in the server room!)</para> | ||
| 344 | |||
| 345 | <section><title>Option Processing</title> | ||
| 346 | |||
| 347 | <para>For plugins written in C, we recommend the C standard | ||
| 348 | getopt library for short options. If using getopt_long, check to | ||
| 349 | be sure that HAVE_GETOPT_H is defined (configure checks this and | ||
| 350 | sets the #define in common/config.h).</para> | ||
| 351 | |||
| 352 | <para>For plugins written in Perl, we recommend the Getopt::Long module.</para> | ||
| 353 | |||
| 354 | <para>Positional arguments are strongly discouraged.</para> | ||
| 355 | |||
| 356 | <para>There are a few reserved options that should not be used | ||
| 357 | for other purposes:</para> | ||
| 358 | |||
| 359 | <literallayout> | ||
| 360 | -V version (--version) | ||
| 361 | -h help (--help) | ||
| 362 | -t timeout (--timeout) | ||
| 363 | -w warning threshold (--warning) | ||
| 364 | -c critical threshold (--critical) | ||
| 365 | -H hostname (--hostname) | ||
| 366 | </literallayout> | ||
| 367 | |||
| 368 | <para>In addition to the reserved options above, some other standard options are:</para> | ||
| 369 | |||
| 370 | <literallayout> | ||
| 371 | -C SNMP community (--community) | ||
| 372 | -a authentication password (--authentication) | ||
| 373 | -l login name (--logname) | ||
| 374 | -p port or password (--port or --passwd/--password) | ||
| 375 | -u url or username (--url or --username) | ||
| 376 | </literallayout> | ||
| 377 | |||
| 378 | <para>Look at check_pgsql and check_procs to see how I currently | ||
| 379 | think this can work.</para> | ||
| 380 | |||
| 381 | |||
| 382 | <para>The option -V or --version should be present in all | ||
| 383 | plugins. For C plugins it should result in a call to print_revision, a | ||
| 384 | function in utils.c which takes two character arguments, the | ||
| 385 | command name and the plugin revision.</para> | ||
| 386 | |||
| 387 | <para>The -? option, or any other unparsable set of options, | ||
| 388 | should print out a short usage statement. Character width should | ||
| 389 | be 80 or less and no more than 23 lines should be printed (it | ||
| 390 | should display cleanly on a dumb terminal in a server | ||
| 391 | room).</para> | ||
| 392 | |||
| 393 | <para>The option -h or --help should be present in all plugins. | ||
| 394 | In C plugins, it should result in a call to print_help (or | ||
| 395 | equivalent). The function print_help should call print_revision, | ||
| 396 | then print_usage, then should provide detailed | ||
| 397 | help. Help text should fit on an 80-character width display, but | ||
| 398 | may run as many lines as needed.</para> | ||
| 399 | |||
| 400 | </section> | ||
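As an illustration of the reserved options listed above, here is a sketch using Python's standard `getopt` module. The function name `parse_args`, the settings dictionary, and the default timeout of 10 are assumptions for the example. Thresholds are kept as strings so that range syntax can be parsed separately.

```python
import getopt

# Reserved options: -V, -h, -t, -w, -c, -H and their long forms.
SHORT_OPTS = "Vht:w:c:H:"
LONG_OPTS = ["version", "help", "timeout=", "warning=",
             "critical=", "hostname="]


def parse_args(argv):
    """Parse the reserved options; raise getopt.GetoptError on bad input."""
    settings = {"action": "check", "timeout": 10, "warning": None,
                "critical": None, "hostname": None}
    opts, extra = getopt.getopt(argv, SHORT_OPTS, LONG_OPTS)
    if extra:
        # Positional arguments are strongly discouraged, so reject them.
        raise getopt.GetoptError("unexpected argument: %s" % extra[0])
    for opt, value in opts:
        if opt in ("-V", "--version"):
            settings["action"] = "version"
        elif opt in ("-h", "--help"):
            settings["action"] = "help"
        elif opt in ("-t", "--timeout"):
            try:
                settings["timeout"] = int(value)
            except ValueError:
                raise getopt.GetoptError("timeout must be an integer")
        elif opt in ("-w", "--warning"):
            settings["warning"] = value
        elif opt in ("-c", "--critical"):
            settings["critical"] = value
        elif opt in ("-H", "--hostname"):
            settings["hostname"] = value
    return settings
```

A caller would typically catch `getopt.GetoptError`, print the short usage statement, and exit with the UNKNOWN return code.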
| 401 | |||
| 402 | <section> | ||
| 403 | <title>Plugins with more than one type of threshold, or with | ||
| 404 | threshold ranges</title> | ||
| 405 | |||
| 406 | <para>Old style was to do things like -ct for critical time and | ||
| 407 | -cv for critical value. That goes out the window with POSIX | ||
| 408 | getopt. The allowable alternatives are:</para> | ||
| 409 | |||
| 410 | <orderedlist> | ||
| 411 | <listitem> | ||
| 412 | <para>long options like -critical-time (or -ct and -cv, I | ||
| 413 | suppose).</para> | ||
| 414 | </listitem> | ||
| 415 | |||
| 416 | <listitem> | ||
| 417 | <para>repeated options like `check_load -w 10 -w 6 -w 4 -c | ||
| 418 | 16 -c 10 -c 10`</para> | ||
| 419 | </listitem> | ||
| 420 | |||
| 421 | <listitem> | ||
| 422 | <para>for brevity, the above can be expressed as `check_load | ||
| 423 | -w 10,6,4 -c 16,10,10`</para> | ||
| 424 | </listitem> | ||
| 425 | |||
| 426 | <listitem> | ||
| 427 | <para>ranges are expressed with colons as in `check_procs -C | ||
| 428 | httpd -w 1:20 -c 1:30` which will warn above 20 instances, | ||
| 429 | and critical at 0 and above 30</para> | ||
| 430 | </listitem> | ||
| 431 | |||
| 432 | <listitem> | ||
| 433 | <para>lists are expressed with commas, so Jacob's check_nmap | ||
| 434 | uses constructs like '-p 1000,1010,1050:1060,2000'</para> | ||
| 435 | </listitem> | ||
| 436 | |||
| 437 | <listitem> | ||
| 438 | <para>If possible when writing lists, use tokens to make the | ||
| 439 | list easy to remember and non-order dependent - so | ||
| 440 | check_disk uses '-c 10000,10%' so that it is clear which is | ||
| 441 | the percentage and which is the KB value (note that due to | ||
| 442 | my own lack of foresight, that used to be '-c 10000:10%' but | ||
| 443 | such constructs should all be changed for consistency, | ||
| 444 | though providing backward compatibility is fairly | ||
| 445 | easy).</para> | ||
| 446 | </listitem> | ||
| 447 | |||
| 448 | </orderedlist> | ||
| 449 | |||
| 450 | <para>As always, comments are welcome - making this consistent | ||
| 451 | without a host of long options was quite a hassle, and I would | ||
| 452 | suspect that there are flaws in this strategy. Perhaps clear | ||
| 453 | long options are the most important of the above choices, but not | ||
| 454 | all POSIX systems have C libraries for long options, so the | ||
| 455 | short forms must exist as well.</para> | ||
| 456 | </section> | ||
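The colon and comma conventions above can be sketched as a few small Python helpers. All function names here are hypothetical, and the reading of a range as "the acceptable interval, inclusive at both ends" is inferred from the check_procs example (warn above 20, critical at 0 and above 30) rather than stated as a rule.

```python
def parse_range(text):
    """Parse 'low:high' into a (low, high) tuple of integers."""
    low, sep, high = text.partition(":")
    if not sep:
        raise ValueError("expected low:high, got %r" % text)
    return int(low), int(high)


def outside(value, bounds):
    """Return True if value lies outside the inclusive (low, high) bounds."""
    low, high = bounds
    return value < low or value > high


def check_count(count, warn, crit):
    """Return 0 (OK), 1 (WARNING) or 2 (CRITICAL) for a measured count."""
    if outside(count, parse_range(crit)):
        return 2
    if outside(count, parse_range(warn)):
        return 1
    return 0


def expand_list(text):
    """Expand a list such as '1000,1010,1050:1060,2000' into integers."""
    items = []
    for token in text.split(","):
        if ":" in token:
            low, high = parse_range(token)
            items.extend(range(low, high + 1))
        else:
            items.append(int(token))
    return items
```

With `-w 1:20 -c 1:30`, a count of 25 yields a warning, while 0 or 31 yields critical.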
| 457 | </section> | ||
| 458 | |||
| 459 | <section id="SubmittingChanges"><title>New submissions and patches</title> | ||
| 460 | |||
| 461 | <para>If you would like others to use your plugin and have it included in | ||
| 462 | the standard distribution, please include patches for the relevant | ||
| 463 | configuration files, in particular "configure.in". Otherwise submitted | ||
| 464 | plugins will be included in the contrib directory.</para> | ||
| 465 | |||
| 466 | <para>Plugins in the contrib directory are going to be migrated to the | ||
| 467 | standard plugins/plugin-scripts directory as time permits and per user | ||
| 468 | requests.</para> | ||
| 469 | |||
| 470 | <para>Patches should be submitted via SourceForge and be announced to | ||
| 471 | the mailing list.</para> | ||
| 472 | |||
| 473 | <para>For new plugins, provide a diff to add to the EXTRAS list (configure.in) | ||
| 474 | unless you are fairly sure that the plugin will work for all platforms with | ||
| 475 | no non-standard software added.</para> | ||
| 476 | |||
| 477 | <para>If possible please submit a test harness. Documentation on sample | ||
| 478 | tests coming soon.</para> | ||
| 479 | |||
| 480 | </section> | ||
| 481 | </article> | ||
| 482 | |||
| 483 | </book> | ||
