<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V4.1//EN">
<book>
  <title>Nagios Plug-in Developer Guidelines</title>

  <bookinfo>
    <authorgroup>
      <author>
	<firstname>Karl</firstname>
	<surname>DeBisschop</surname>
	<affiliation>
	  <address><email>karl@debisschop.net</email></address>
	</affiliation>
      </author>

      <author>
	<firstname>Ethan</firstname>
	<surname>Galstad</surname>
	<authorblurb>
	  <para>Author of Nagios</para>
	  <para><ulink url="http://www.nagios.org"></ulink></para>
	</authorblurb>
	<affiliation>
	  <address><email>netsaint@linuxbox.com</email></address>
	</affiliation>
      </author>

      <author>
	<firstname>Hugo</firstname>
	<surname>Gayosso</surname>
	<affiliation>
	  <address><email>hgayosso@gnu.org</email></address>
	</affiliation>
      </author>

	  
	<author>
	<firstname>Subhendu</firstname>
	<surname>Ghosh</surname>
	<affiliation>
		<address><email>sghosh@sourceforge.net</email></address>
	</affiliation>
	</author>
	
	<author>
	<firstname>Stanley</firstname>
	<surname>Hopcroft</surname>
	<affiliation>
		<address><email>stanleyhopcroft@sourceforge.net</email></address>
	</affiliation>
	</author>	

    </authorgroup>

    <pubdate>2002</pubdate>
    <title>Nagios plug-in development guidelines</title>
	
    <revhistory>
       <revision>
          <revnumber>0.4</revnumber>
          <date>2 May 2002</date>
       </revision>
    </revhistory>

	<copyright>
		<year>2000 2001 2002</year> 
		<holder>Karl DeBisschop, Ethan Galstad, 
		Hugo Gayosso, Stanley Hopcroft, Subhendu Ghosh</holder>
	</copyright>

</bookinfo>


  <preface id=preface>
    <title>About the guidelines</title>

    <para>The purpose of these guidelines is to provide a reference for
    plug-in developers and to encourage the standardization of the
    different kinds of plug-ins: C, shell, perl, python, etc.</para>


    <section> <title>Copyright</title>

        <para>Nagios Plug-in Development Guidelines Copyright (C) 2000 2001
		2002
        Karl DeBisschop, Ethan Galstad, Hugo Gayosso, Stanley Hopcroft, 
		Subhendu Ghosh</para>

        <para>Permission is granted to make and distribute verbatim
        copies of this manual provided the copyright notice and this
        permission notice are preserved on all copies.</para>

		<para>The plugins themselves are copyrighted by their respective
		authors.</para>

    </section>
</preface>

<article>
<section id="PlugOutput"><title>Plugin Output for Nagios</title>
	
		<para>You should always print something to STDOUT that tells if the 
		service is working or why it's failing. Try to keep the output short - 
		probably less than 80 characters. Remember that you ideally would like 
		the entire output to appear in a pager message, which will get chopped
		off after a certain length.</para>

		<section><title>Print only one line of text</title>
		<para>Nagios will only grab the first line of text from STDOUT
		when it notifies contacts about potential problems. If you print
		multiple lines, you're out of luck. Remember, keep it short and
		to the point.</para>
	    </section>

		<section><title>Screen Output</title>
		<para>The plug-in should print the diagnostic and just the
		synopsis part of the help message.  A well written plugin would
		then have --help as a way to get the verbose help.</para>
		<para>Code and output should try to respect the 80x25 size of a
		crt (remember when fixing stuff in the server room!)</para>
		</section>
		
	    <section><title>Return the proper status code</title>
		<para>See <xref linkend="ReturnCodes"> below
		for the numeric values of status codes and their
		description. Remember to return an UNKNOWN state if bogus or
		invalid command line arguments are supplied or if you are unable
		to check the service.</para>
		</section>
		
		<section><title>Plugin Return Codes</title>
		<para>The return codes below are based on the POSIX spec of returning
		a positive value.  Netsaint prior to v0.0.7 supported a non-POSIX
		compliant return code of "-1" for unknown.  Nagios supports POSIX return
		codes by default.</para>

		<para>Note: Some plugins will on occasion print on STDOUT that an error
		occurred and that the error code is 138 or 255 or some such number.  These
		are usually caused by plugins that use system commands without enough
		checks to catch unexpected output.  Developers should include a
		default catch-all for system command output that returns an UNKNOWN
		return code.</para>
		
		<table id="ReturnCodes"><title>Plugin Return Codes</title>
			<tgroup cols="3">
				<thead>
					<row>
						<entry><para>Numeric Value</para></entry>
						<entry><para>Service Status</para></entry>
						<entry><para>Status Description</para></entry>
					</row>
				</thead>
				<tbody>
					<row>
						<entry align=center><para>0</para></entry>
						<entry valign=middle><para>OK</para></entry>
						<entry><para>The plugin was able to check the service and it 
						appeared to be functioning properly</para></entry>
					</row>
					<row>
						<entry align=center><para>1</para></entry>
						<entry valign=middle><para>Warning</para></entry>
						<entry><para>The plugin was able to check the service, but it 
						appeared to be above some "warning" threshold or did not appear 
						to be working properly</para></entry>
					</row>
					<row>
						<entry align=center><para>2</para></entry>
						<entry valign=middle><para>Critical</para></entry>
						<entry><para>The plugin detected that either the service was not 
						running or it was above some "critical" threshold</para></entry>
					</row>
					<row>
						<entry align=center><para>3</para></entry>
						<entry valign=middle><para>Unknown</para></entry>
						<entry><para>Invalid command line arguments were supplied to the 
						plugin or the plugin was unable to check the status of the given 
						host/service</para></entry>
					</row>
				</tbody>
			</tgroup>
		</table>
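
		<para>As a minimal sketch of how the table above might be used in a C
		plugin: the enum, the function names and the simple "above a threshold"
		rule are assumptions made for this illustration and are not taken from
		the plugin distribution.</para>

<programlisting><![CDATA[
```c
#include <stdio.h>

/* Return codes from the table above; the enum and function names
 * are hypothetical, chosen for this sketch. */
enum plugin_state {
	STATE_OK = 0,
	STATE_WARNING = 1,
	STATE_CRITICAL = 2,
	STATE_UNKNOWN = 3
};

/* Map a measured value onto a state using simple "above" thresholds. */
enum plugin_state classify(double value, double warn, double crit)
{
	if (value < 0)
		return STATE_UNKNOWN;	/* catch-all for nonsensical input */
	if (value > crit)
		return STATE_CRITICAL;
	if (value > warn)
		return STATE_WARNING;
	return STATE_OK;
}

/* Print exactly one short line to STDOUT and hand back the exit code. */
int report(const char *label, double value, double warn, double crit)
{
	static const char *names[] = { "OK", "WARNING", "CRITICAL", "UNKNOWN" };
	enum plugin_state state = classify(value, warn, crit);

	printf("%s %s - value is %.2f\n", label, names[state], value);
	return state;
}
```
]]></programlisting>

		<para>A plugin's main() could then finish with something like
		"return report("LOAD", load, 5.0, 10.0);" so that the single output
		line and the exit status always agree.</para>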

      
		</section>


</section>

<section id="SysCmdAuxFiles"><title>System Commands and Auxiliary Files</title>

		<section><title>Don't execute system commands without specifying their
		full path</title>
		<para>Don't use exec(), popen(), etc. to execute external
		commands without explicitly using the full path of the external
		program.</para>

		<para>Doing otherwise makes the plugin vulnerable to hijacking
		by a trojan horse earlier in the search path. See the main
		plugin distribution for examples on how this is done.</para>
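
		<para>The rule can be sketched as a small guard that refuses any
		command not given as an absolute path.  The function name
		run_first_line() is hypothetical, and plain popen() is used here only
		to keep the sketch self-contained; it is not the code from the plugin
		distribution.</para>

<programlisting><![CDATA[
```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>

/* Run an external command and copy the first line of its output into buf.
 * Returns 0 on success, -1 on refusal or failure. */
int run_first_line(const char *cmd, char *buf, size_t len)
{
	FILE *fp;
	int ok;

	if (cmd == NULL || cmd[0] != '/' || len == 0)
		return -1;	/* refuse anything that relies on PATH lookup */

	fp = popen(cmd, "r");
	if (fp == NULL)
		return -1;

	ok = (fgets(buf, (int)len, fp) != NULL);
	if (ok)
		buf[strcspn(buf, "\n")] = '\0';	/* strip trailing newline */

	if (pclose(fp) != 0)
		return -1;	/* command failed or could not be reaped */
	return ok ? 0 : -1;
}
```
]]></programlisting>

		<para>A caller that gets -1 back should treat the check as impossible
		to perform and return the UNKNOWN state.</para>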
		</section>

		<section><title>Use spopen() if external commands must be executed</title>

	    <para>If you have to execute external commands from within your
    	plugin and you're writing it in C, use the spopen() function
		that Karl DeBisschop has written.</para>

		<para>The code for spopen() and spclose() is included with the
		core plugin distribution.</para>
		</section>

		<section><title>Don't make temp files unless absolutely required</title>

		<para>If temp files are needed, make sure that the plugin will
		fail cleanly if the file can't be written (e.g., too few file
		handles, out of disk space, incorrect permissions, etc.) and
		delete the temp file when processing is complete.</para>
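
		<para>A minimal sketch of that pattern, assuming a POSIX system with
		mkstemp().  The function name and the /tmp file name template are
		hypothetical.</para>

<programlisting><![CDATA[
```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Write data to a scratch file, then remove it again.
 * Returns 0 on success, -1 if the file could not be created or written. */
int use_temp_file(const char *data)
{
	char path[] = "/tmp/plugin_XXXXXX";	/* hypothetical name template */
	size_t len = strlen(data);
	int status = 0;
	int fd;

	fd = mkstemp(path);	/* creates and opens a new, unique file */
	if (fd == -1) {
		perror("mkstemp");
		return -1;	/* fail cleanly, nothing to clean up */
	}

	if (write(fd, data, len) != (ssize_t)len)
		status = -1;	/* e.g. out of disk space */

	/* ... process the file here ... */

	if (close(fd) == -1)
		status = -1;
	if (unlink(path) == -1)	/* always delete the temp file */
		status = -1;
	return status;
}
```
]]></programlisting>

		<para>Every failure path leads to a single return value, so the caller
		can map it to the UNKNOWN state without leaving a file behind.</para>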
		</section>

    	<section><title>Don't be tricked into following symlinks</title>

		<para>If your plugin opens any files, take steps to ensure that
		you are not following a symlink to another location on the
		system.</para>
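
		<para>One possible way to do this on systems that provide the POSIX
		O_NOFOLLOW flag is sketched below.  The function name is hypothetical,
		the regular-file check is an extra precaution added for this example,
		and note that O_NOFOLLOW only guards the final path component.</para>

<programlisting><![CDATA[
```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Open path read-only, refusing a symlink in the final path component
 * and anything that is not a regular file.
 * Returns a file descriptor, or -1 on failure. */
int open_no_symlink(const char *path)
{
	struct stat st;
	int fd;

	fd = open(path, O_RDONLY | O_NOFOLLOW);
	if (fd == -1)
		return -1;	/* includes the "path is a symlink" case */

	/* Inspect the descriptor we actually opened, not the path. */
	if (fstat(fd, &st) == -1 || !S_ISREG(st.st_mode)) {
		close(fd);
		return -1;
	}
	return fd;
}
```
]]></programlisting>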
		</section>

		<section><title>Validate all input</title>

		<para>Use the routines in utils.c or utils.pm, and write more as needed.</para>
		</section>

</section>
	



<section id="PerlPlugin"><title>Perl Plugins</title>

		<para>Perl plugins are coded a little more defensively than other
		plugins because of embedded Perl.  When configured as such, embedded
		Perl Nagios (ePN) requires stricter use of some of Perl's features.
		This section outlines some of the steps needed to use ePN
		effectively.</para>
	  
		<orderedlist>
			
			<listitem><para>Do not use BEGIN and END blocks: with embedded Perl
			(ePN), BEGIN blocks are called only the first time the plugin is run
			and END blocks only when Nagios shuts down.  In particular, do not
			use BEGIN blocks to initialize variables.</para>
			</listitem>
	  
			<listitem><para>To use utils.pm, you need to provide a full path to the
			module in order for it to work with ePN.</para>
			
	  <literallayout>
	  e.g.
		use lib "/usr/local/nagios/libexec";
		use utils qw(...);
	  </literallayout>
	  		</listitem>

			<listitem><para>Perl scripts should be called with "-w".</para>
	  		</listitem>
			
			<listitem><para>All Perl plugins must compile cleanly under "use strict" - i.e. at
			least use explicit package names as in "$main::x" or predeclare every
			variable.</para>
			

			<para>Explicitly initialize each variable in use.  Otherwise, with
			caching enabled, the plugin will not be recompiled each time, and
			therefore Perl will not reinitialize all the variables.  All old
			variable values will still be in effect.</para>
	  		</listitem>
			
			<listitem><para>Do not use &lt;DATA&gt; handles (these simply do not compile under ePN).</para>
	   		</listitem>

			<listitem><para>Do not use named subroutines.</para> 
			</listitem>

			<listitem><para>If writing to a file (perhaps recording
			performance data), explicitly close it.  The plugin never
			calls <emphasis role=strong>exit</emphasis>; that is caught by
			p1.pl, so output streams are never closed.</para>
			</listitem>
		
			<listitem><para>As in <xref linkend="runtime">, all plugins need 
			to monitor their runtime, especially if they are using network
			resources.  Use of <emphasis>alarm</emphasis> is recommended.
			Plugins may import a default timeout ($TIMEOUT) from utils.pm.
			</para>
			</listitem>

			<listitem><para>Perl plugins should import %ERRORS from utils.pm
			and then "exit $ERRORS{'OK'}" rather than "exit 0".
			</para>
			</listitem>
			
		</orderedlist>
	  
</section>

<section id="runtime"><title>Runtime Timeouts</title>

		<para>Plugins have a very limited runtime - typically 10 sec.
		As a result, it is very important for plugins to maintain internal
		code to exit if runtime exceeds a threshold. </para>

		<para>All plugins should time out gracefully, not just networking
		plugins. For instance, df may lock if you have automounted
		drives and your network fails - but at first glance, who'd think
		df could lock up like that?  Besides, a plugin is simply more error
		resistant if it can time out rather than consume
		resources.</para>
		
		<section><title>Use DEFAULT_SOCKET_TIMEOUT</title>

		<para>All network plugins should use DEFAULT_SOCKET_TIMEOUT to time out.</para>

		</section>

		
		<section><title>Add alarms to network plugins</title>

		<para>If you write a plugin which communicates with another
		networked host, you should make sure to set an alarm() in your
		code that prevents the plugin from hanging due to abnormal
		socket closures, etc. Nagios takes steps to protect itself
		against unruly plugins that time out, but any plugins you create
		should be well behaved on their own.</para>
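
		<para>A minimal sketch of such an alarm in C follows.  The names
		arm_timeout() and on_timeout() are hypothetical, and exiting with the
		UNKNOWN state on timeout is an assumption of this example rather than
		a rule taken from the plugin distribution.  The number of seconds
		would normally come from the -t option or from
		DEFAULT_SOCKET_TIMEOUT.</para>

<programlisting><![CDATA[
```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

#define STATE_UNKNOWN 3	/* from the return code table */

/* SIGALRM handler: uses only async-signal-safe calls (write, _exit). */
static void on_timeout(int signo)
{
	static const char msg[] = "UNKNOWN - plugin timed out\n";

	(void)signo;
	(void)write(STDOUT_FILENO, msg, sizeof msg - 1);
	_exit(STATE_UNKNOWN);
}

/* Arm a timer that ends the plugin after the given number of seconds.
 * Returns 0 on success, -1 if the handler could not be installed. */
int arm_timeout(unsigned int seconds)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof sa);
	sa.sa_handler = on_timeout;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGALRM, &sa, NULL) == -1)
		return -1;
	alarm(seconds);
	return 0;
}
```
]]></programlisting>

		<para>Call arm_timeout() before the first operation that can block,
		and call alarm(0) once the check has completed to cancel the pending
		alarm.</para>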

		</section>

		

</section>

<section id="PlugOptions"><title>Plugin Options</title>
	
		<para>A well written plugin should have --help as a way to get 
		verbose help. Code and output should try to respect the 80x25 size of a
		crt (remember when fixing stuff in the server room!)</para>
		
		<section><title>Option Processing</title>

		<para>For plugins written in C, we recommend the C standard
		getopt library for short options. If using getopt_long, check to
		be sure that HAVE_GETOPT_H is defined (configure checks this and
		sets the #define in common/config.h).</para>

		<para>For plugins written in Perl, we recommend the Getopt::Long module.</para>

		<para>Positional arguments are strongly discouraged.</para>

		<para>There are a few reserved options that should not be used
		for other purposes:</para>

		<literallayout>
          -V version (--version)
          -h help (--help)
          -t timeout (--timeout)
          -w warning threshold (--warning)
          -c critical threshold (--critical)
          -H hostname (--hostname)
		</literallayout>

		<para>In addition to the reserved options above, some other standard options are:</para>

		<literallayout>
          -C SNMP community (--community)
          -a authentication password (--authentication)
          -l login name (--logname)
          -p port or password (--port or --passwd/--password)
          -u url or username (--url or --username)
		</literallayout>
	  
		<para>Look at check_pgsql and check_procs to see how I currently
		think this can work.</para>

	  
		<para>The option -V or --version should be present in all
		plugins. For C plugins it should result in a call to print_revision, a
		function in utils.c which takes two string arguments, the
		command name and the plugin revision.</para>

		<para>The -? option, or any other unparsable set of options,
		should print out a short usage statement. Character width should
		be 80 or less and no more than 23 lines should be printed (it
		should display cleanly on a dumb terminal in a server
		room).</para>

		<para>The option -h or --help should be present in all plugins.
		In C plugins, it should result in a call to print_help (or
		equivalent).  The function print_help should call print_revision, 
		then print_usage, then should provide detailed
		help. Help text should fit on an 80-character width display, but
		may run as many lines as needed.</para>
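
		<para>The reserved short options can be handled with the standard
		getopt() call roughly as follows.  The structure and function names
		are hypothetical and long options are left out for brevity, so treat
		this as a sketch rather than the code used by the distributed
		plugins.</para>

<programlisting><![CDATA[
```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical option container for this sketch. */
struct plugin_opts {
	const char *hostname;	/* -H */
	const char *warning;	/* -w */
	const char *critical;	/* -c */
	int timeout;		/* -t, in seconds */
	int show_help;		/* -h */
	int show_version;	/* -V */
};

/* Parse the reserved short options.  Returns 0 on success and -1 on any
 * unparsable option or stray positional argument; the caller should then
 * print the short usage statement and exit with the UNKNOWN state. */
int parse_options(int argc, char **argv, struct plugin_opts *o)
{
	int c;

	optind = 1;	/* restart scanning so the function can be reused */
	while ((c = getopt(argc, argv, "Vht:w:c:H:")) != -1) {
		switch (c) {
		case 'V': o->show_version = 1; break;
		case 'h': o->show_help = 1; break;
		case 't': o->timeout = atoi(optarg); break;
		case 'w': o->warning = optarg; break;
		case 'c': o->critical = optarg; break;
		case 'H': o->hostname = optarg; break;
		default:  return -1;	/* unknown option, including -? */
		}
	}
	return (optind == argc) ? 0 : -1;	/* no positional arguments */
}
```
]]></programlisting>

		<para>Threshold strings are kept as text here because their format
		(single value, list or range) differs from plugin to plugin.</para>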

    </section>

    <section>
      <title>Plugins with more than one type of threshold, or with
      threshold ranges</title>

      <para>Old style was to do things like -ct for critical time and
      -cv for critical value. That goes out the window with POSIX
      getopt. The allowable alternatives are:</para>

      <orderedlist>
	<listitem>
	  <para>long options like --critical-time (or --ct and --cv, I
	  suppose).</para>
	</listitem>

	<listitem>
	  <para>repeated options like `check_load -w 10 -w 6 -w 4 -c
	  16 -c 10 -c 10`</para>
	</listitem>

	<listitem>
	  <para>for brevity, the above can be expressed as `check_load
	  -w 10,6,4 -c 16,10,10`</para>
	</listitem>

	<listitem>
	  <para>ranges are expressed with colons as in `check_procs -C
	  httpd -w 1:20 -c 1:30` which will warn above 20 instances,
	  and go critical at 0 or above 30 instances</para>
	</listitem>

	<listitem>
	  <para>lists are expressed with commas, so Jacob's check_nmap
	  uses constructs like '-p 1000,1010,1050:1060,2000'</para>
	</listitem>

	<listitem>
	  <para>If possible when writing lists, use tokens to make the
	  list easy to remember and non-order dependent - so
	  check_disk uses '-c 10000,10%' so that it is clear which is
	  the percentage and which is the KB value (note that due to
	  my own lack of foresight, that used to be '-c 10000:10%' but
	  such constructs should all be changed for consistency,
	  though providing backward compatibility is fairly
	  easy).</para>
	</listitem>

      </orderedlist>
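
      <para>The colon range form from the list above could be parsed
      along these lines.  The structure and function names are
      hypothetical, and the sketch handles only the plain LOW:HIGH form
      with both bounds present and inclusive.</para>

<programlisting><![CDATA[
```c
#include <stdlib.h>

/* Hypothetical inclusive range, as written in "-w 1:20". */
struct range {
	long low;
	long high;
};

/* Parse "LOW:HIGH" into r.  Returns 0 on success, -1 on malformed input. */
int parse_range(const char *text, struct range *r)
{
	char *end;

	r->low = strtol(text, &end, 10);
	if (end == text || *end != ':')
		return -1;	/* no digits, or missing colon */
	text = end + 1;
	r->high = strtol(text, &end, 10);
	if (end == text || *end != '\0' || r->low > r->high)
		return -1;	/* no digits, trailing junk, or reversed bounds */
	return 0;
}

/* Nonzero when value falls outside the range, i.e. should raise an alert. */
int outside_range(const struct range *r, long value)
{
	return value < r->low || value > r->high;
}
```
]]></programlisting>

      <para>With -w 1:20 and -c 1:30, a count of 21 is outside the
      warning range only, while counts of 0 and 31 are outside both, in
      which case the critical state takes precedence.</para>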

      <para>As always, comments are welcome - making this consistent
      without a host of long options was quite a hassle, and I would
      suspect that there are flaws in this strategy. Perhaps clear
      long options are the most important of the above choices, but not
      all POSIX systems have C libraries for long options, so the
      short forms must exist as well.</para>
    </section>
</section>

<section id="SubmittingChanges"><title>New submissions and patches</title>

	<para>If you would like others to use your plugins and have them included in
	the standard distribution, please include patches for the relevant
	configuration files, in particular "configure.in". Otherwise, submitted 
	plugins will be included in the contrib directory.</para>
	
	<para>Plugins in the contrib directory are going to be migrated to the
	standard plugins/plugin-scripts directory as time permits and per user
	requests.</para>

	<para>Patches should be submitted via SourceForge and be announced to
	the mailing list.</para>
	
	<para>For new plugins, provide a diff to add to the EXTRAS list (configure.in) 
	unless you are fairly sure that the plugin will work for all platforms with 
	no non-standard software added.</para>

	<para>If possible please submit a test harness. Documentation on sample
	tests coming soon.</para>

</section>
</article>
  
</book>