Nagios Plug-in Developer Guidelines
  
    
      
        
          Nagios Plugins Development Team
        
      
    
    2005
    Nagios plug-in development guidelines
	
    
       
       
    
	
		2000 - 2005 
		Nagios Plugins Development Team
	
Preface
    The purpose of these guidelines is to provide a reference for
    plug-in developers and to encourage the standardization of the
    different kinds of plug-ins: C, shell, Perl, Python, etc.
        Nagios Plug-in Development Guidelines Copyright (C) 2000-2005
        (Nagios Plugins Team)
        Permission is granted to make and distribute verbatim
        copies of this manual provided the copyright notice and this
        permission notice are preserved on all copies.
	The plugins themselves are copyrighted by their respective
	authors.
Development platform requirements
	
	Nagios plugins are developed to the GNU standard, so any OS which is supported by GNU
	should run the plugins. While the requirements for compiling a Nagios plugins release
	are very small, developing from CVS requires additional software to be installed. These are the
	minimum levels of software required:
	
	GNU make 3.79
	automake 1.8
	autoconf 2.58
	gettext 0.11.5
	
	To compile from CVS, after you have checked out the code, run:
	
	tools/setup
	./configure
	make
	make install
	
	
Plugin Output for Nagios
	
		You should always print something to STDOUT that tells if the 
		service is working or why it is failing. Try to keep the output short - 
		probably less than 80 characters. Remember that you ideally would like 
		the entire output to appear in a pager message, which will get chopped
		off after a certain length.
		Print only one line of text
		Nagios will only grab the first line of text from STDOUT
		when it notifies contacts about potential problems. If you print
		multiple lines, you're out of luck. Remember, keep it short and
		to the point.
		Output should be in the format:
		
		METRIC STATUS: Information text
		
		However, note that this is not a requirement of the API, so you cannot depend on this
		being an accurate reflection of the status of the service - the status should always 
		be determined by the return code.
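
The output format above can be sketched in Python as follows. This is only an illustration: the function name status_line and the 80-character default are assumptions made for this example, not part of any Nagios API.

```python
def status_line(metric, status, text, max_len=80):
    """Build a one-line 'METRIC STATUS: Information text' message."""
    # Collapse all whitespace (including newlines) so only one line is printed.
    info = " ".join(str(text).split())
    line = f"{metric.upper()} {status.upper()}: {info}"
    # Keep the line short so it survives pager truncation.
    return line[:max_len]


print(status_line("disk", "ok", "/ has 42% free"))
```

The printed text is informational only; the state reported to Nagios still has to come from the exit code.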
		
		Verbose output
		Use the -v flag for verbose output. You should allow multiple
		-v options for additional verbosity, up to a maximum of 3. The standard
		type of output should be:
		Verbose output levels

		Verbosity level    Type of output
		0                  Single line, minimal output. Summary
		1                  Single line, additional information (e.g. list processes that fail)
		2                  Multi line, configuration debug output (e.g. ps command used)
		3                  Lots of detail for plugin problem diagnosis
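
One possible way to map the verbosity levels onto output is sketched below in Python. The function render_output and its parameters are hypothetical names chosen for this example; real plugins organise their verbose output in their own way.

```python
def render_output(verbosity, summary, extra="", debug=(), trace=()):
    """Return plugin output for a verbosity level between 0 and 3."""
    level = min(max(verbosity, 0), 3)  # more than three -v flags add nothing
    first = summary
    if level >= 1 and extra:
        # Level 1: still a single line, with additional information appended.
        first = f"{summary} ({extra})"
    lines = [first]
    if level >= 2:
        # Level 2: multi-line configuration debug output.
        lines.extend(debug)
    if level >= 3:
        # Level 3: detailed diagnostics for plugin problem analysis.
        lines.extend(trace)
    return "\n".join(lines)
```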
					
				
			
		
		
		Screen Output
		The plug-in should print the diagnostic and just the
		synopsis part of the help message.  A well written plugin would
		then have --help as a way to get the verbose help.
		Code and output should try to respect the 80x25 size of a
		CRT (remember when fixing stuff in the server room!)
		
		
	    Return the proper status code
		See Plugin Return Codes below
		for the numeric values of status codes and their
		description. Remember to return an UNKNOWN state if bogus or
		invalid command line arguments are supplied or if you are unable
		to check the service.
		
		
		Plugin Return Codes
		The return codes below are based on the POSIX spec of returning
		a positive value.  Netsaint prior to v0.0.7 supported non-POSIX
		compliant return code of "-1" for unknown.  Nagios supports POSIX return
		codes by default.
		Note: Some plugins will on occasion print on STDOUT that an error
		occurred and the error code is 138 or 255 or some such number.  These
		are usually caused by plugins using system commands without
		enough checks to catch unexpected output.  Developers should include a
		default catch-all for system command output that returns an UNKNOWN
		return code.
		
		Plugin Return Codes

		Numeric Value    Service Status    Status Description
		0                OK                The plugin was able to check the service and it
		                                   appeared to be functioning properly
		1                Warning           The plugin was able to check the service, but it
		                                   appeared to be above some "warning" threshold or
		                                   did not appear to be working properly
		2                Critical          The plugin detected that either the service was not
		                                   running or it was above some "critical" threshold
		3                Unknown           Invalid command line arguments were supplied to the
		                                   plugin or the plugin was unable to check the status
		                                   of the given hosts/service
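
A minimal Python sketch of the catch-all described above follows. The names run_check and main are illustrative assumptions; the numeric values are the ones listed in the return code table.

```python
import sys

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def run_check(check):
    """Run a check callable and always return a valid (code, message) pair."""
    try:
        code, message = check()
    except Exception as exc:
        # Catch-all: anything unexpected is reported as UNKNOWN.
        return UNKNOWN, f"UNKNOWN: {exc}"
    if code not in (OK, WARNING, CRITICAL, UNKNOWN):
        # For example a raw status such as 138 or 255 leaked from a system command.
        return UNKNOWN, f"UNKNOWN: unexpected return code {code}"
    return code, message


def main(check):
    """Print the message and exit with the matching status code."""
    code, message = run_check(check)
    print(message)
    sys.exit(code)
```

Keeping the exit in one place makes it harder for a stray status from a child process to reach Nagios unchanged.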
					
				
			
		
      
		
		
		Performance data
		Performance data is defined by Nagios as "everything after the | of the plugin output" -
		please refer to Nagios documentation for information on capturing this data to logfiles.
		However, it is the responsibility of the plugin writer to ensure the 
		performance data is in a "Nagios plugins" format.
		This is the expected format:
		
		'label'=value[UOM];[warn];[crit];[min];[max]
		
		Notes:
		
		space separated list of label/value pairs
			
		label can contain any characters
			
		the single quotes for the label are optional. Required if 
			spaces, = or ' are in the label
			
		label length is arbitrary, but ideally the first 19 characters
			are unique (due to a limitation in RRD). Be aware of a limitation in the
			amount of data that NRPE returns to Nagios
			
		to specify a quote character, use two single quotes
			
		warn, crit, min or max may be null (for example, if the threshold is 
			not defined or min and max do not apply). Trailing unfilled semicolons can be
			dropped
			
		min and max are not required if UOM=%
			
		value, min and max in class [-0-9.]. Must all be the
			same UOM
			
		warn and crit are in the range format (see the section on
			threshold ranges under Plugin Options). Must be the same UOM
			
		UOM (unit of measurement) is one of:
			
			no unit specified - assume a number (int or float) 
				of things (eg, users, processes, load averages)
				
			s - seconds (also us, ms)
			% - percentage
			B - bytes (also KB, MB, TB)
			c - a continuous counter (such as bytes
				transmitted on an interface)
			
			
		
		It is up to third party programs to convert the Nagios plugins 
		performance data into graphs.
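
The format notes above can be sketched as a small Python helper. The function names perfdata_item and perfdata are hypothetical and exist only for this example; the quoting and semicolon rules follow the notes in this section.

```python
def perfdata_item(label, value, uom="", warn="", crit="", minimum="", maximum=""):
    """Format one 'label'=value[UOM];[warn];[crit];[min];[max] entry."""
    # Quote the label when it contains spaces, '=' or a single quote;
    # an embedded single quote is written as two single quotes.
    if any(ch in label for ch in " ='"):
        label = "'" + label.replace("'", "''") + "'"
    fields = [f"{label}={value}{uom}", str(warn), str(crit), str(minimum), str(maximum)]
    # Trailing unfilled semicolons can be dropped.
    while len(fields) > 1 and fields[-1] == "":
        fields.pop()
    return ";".join(fields)


def perfdata(items):
    """Join formatted entries into a space separated list."""
    return " ".join(items)


print("DISK OK: /home 45% used | " + perfdata([perfdata_item("/home usage", 45, "%", 80, 90)]))
```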
		
	Translations
	If possible, use translation tools for all output. Currently, most of the core C plugins 
	use gettext for translation. General guidelines are:
	
	short help is not translated
	long help has options in English language, but text translated
	"Copyright" kept in English
	copyright holder names kept in original text
	
	
System Commands and Auxiliary Files
		Don't execute system commands without specifying their
		full path
		Don't use exec(), popen(), etc. to execute external
		commands without explicitly using the full path of the external
		program.
		Doing otherwise makes the plugin vulnerable to hijacking
		by a trojan horse earlier in the search path. See the main
		plugin distribution for examples on how this is done.
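
For plugins written in Python, the same rule might be enforced as in the sketch below. The helper name run_command is an assumption for this example, and the running Python interpreter stands in for a real system command.

```python
import os
import subprocess
import sys


def run_command(argv, timeout=10):
    """Run an external program, refusing anything that is not an absolute path."""
    if not os.path.isabs(argv[0]):
        raise ValueError(f"refusing to run {argv[0]!r}: full path required")
    # An argument list with no shell and an absolute path means the
    # search path is never consulted to find the program.
    result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    return result.returncode, result.stdout


# Example call; sys.executable is used only because its path is known here.
code, out = run_command([sys.executable, "-c", "print('ok')"])
```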
		
		Use spopen() if external commands must be executed
	    If you have to execute external commands from within your
    	plugin and you're writing it in C, use the spopen() function
		that Karl DeBisschop has written.
		The code for spopen() and spclose() is included with the
		core plugin distribution.
		
		Don't make temp files unless absolutely required
		If temp files are needed, make sure that the plugin will
		fail cleanly if the file can't be written (e.g., too few file
		handles, out of disk space, incorrect permissions, etc.) and
		delete the temp file when processing is complete.
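
A hedged Python sketch of the temp file handling described above. The function name with_temp_file and the file name prefix are made up for this example; the path is returned only so that a caller can confirm the file is gone.

```python
import os
import tempfile


def with_temp_file(data):
    """Write data to a temp file, read it back, and always delete the file."""
    try:
        fd, path = tempfile.mkstemp(prefix="check_example_")
    except OSError as exc:
        # For example out of disk space, too few file handles, bad permissions.
        return None, f"UNKNOWN: cannot create temp file: {exc}"
    try:
        with os.fdopen(fd, "w+") as handle:
            handle.write(data)
            handle.seek(0)
            content = handle.read()
        return path, content
    except OSError as exc:
        return None, f"UNKNOWN: cannot write temp file: {exc}"
    finally:
        # Delete the temp file when processing is complete, even on failure.
        if os.path.exists(path):
            os.unlink(path)
```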
		
    	Don't be tricked into following symlinks
		If your plugin opens any files, take steps to ensure that
		you are not following a symlink to another location on the
		system.
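
One way to refuse symlinks in a Python plugin on a POSIX system is sketched below. The helper name open_no_symlink is an assumption; note that O_NOFOLLOW only guards the final path component, so this is a partial safeguard rather than a complete one.

```python
import os


def open_no_symlink(path):
    """Open path read-only, failing if the final component is a symlink."""
    # O_NOFOLLOW makes open() fail with an OSError instead of following
    # a symlink in the last component of the path.
    fd = os.open(path, os.O_RDONLY | os.O_NOFOLLOW)
    return os.fdopen(fd, "r")
```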
		
		Validate all input
		Use routines in utils.c or utils.pm, and write more as needed.
		
	
Perl Plugins
		Perl plugins are coded a little more defensively than other
		plugins because of embedded Perl.  When configured as such, embedded
		Perl Nagios (ePN) requires stricter use of some of Perl's features.
		This section outlines some of the steps needed to use ePN
		effectively.
	  
		
			
			 Do not use BEGIN and END blocks since they will be called 
			only once (when Nagios starts and shuts down) with Embedded Perl (ePN).  In 
			particular, do not use BEGIN blocks to initialize variables.
			
	  
			To use utils.pm, you need to provide a full path to the
			module in order for it to work.
			
	  
	  e.g.
		use lib "/usr/local/nagios/libexec";
		use utils qw(...);
	  
	  		
			Perl scripts should be called with "-w"
	  		
			
			All Perl plugins must compile cleanly under "use strict" - i.e. at
			least explicitly use package names as in "$main::x" or predeclare every
			variable. 
			
			Explicitly initialize each variable in use.  Otherwise with
			caching enabled, the plugin will not be recompiled each time, and
			therefore Perl will not reinitialize all the variables.  All old
			variable values will still be in effect.
	  		
			
			Do not use <DATA> handles (these simply do not compile under ePN).
	   		
			Do not use global variables in named subroutines. This is bad practise anyway, but with ePN the
			compiler will report an error "<global_var> will not stay shared ..". Values used by
			subroutines should be passed in the argument list. 
			
			If writing to a file (perhaps recording
			performance data), explicitly close it.  The plugin never
			calls exit; that is caught by
			p1.pl, so output streams are never closed.
			
		
			As described under Runtime Timeouts, all plugins need
			to monitor their runtime, especially if they are using network
			resources.  Use of alarm is recommended,
			noting that some Perl modules (e.g. LWP) manage timers, so that an alarm
			set by a plugin using such a module is overwritten by the module
			(workarounds are cunning (TM) or using the module timer).
			Plugins may import a default timeout ($TIMEOUT) from utils.pm.
			
			
			Perl plugins should import %ERRORS from utils.pm
			and then "exit $ERRORS{'OK'}" rather than "exit 0"
			
			
			
		
	  
Runtime Timeouts
		Plugins have a very limited runtime - typically 10 sec.
		As a result, it is very important for plugins to maintain internal
		code to exit if runtime exceeds a threshold. 
		All plugins should time out gracefully, not just networking
		plugins. For instance, df may lock if you have automounted
		drives and your network fails - but at first glance, who'd think
		df could lock up like that?  In any case, a plugin that times
		out is more robust than one that hangs and consumes
		resources.
		
		Use DEFAULT_SOCKET_TIMEOUT
		All network plugins should use DEFAULT_SOCKET_TIMEOUT to time out.
		
		
		Add alarms to network plugins
		If you write a plugin which communicates with another
		networked host, you should make sure to set an alarm() in your
		code that prevents the plugin from hanging due to abnormal
		socket closures, etc. Nagios takes steps to protect itself
		against unruly plugins that timeout, but any plugins you create
		should be well behaved on their own.
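
A minimal sketch of an alarm-based timeout for a Python plugin on a Unix system. The names PluginTimeout and run_with_alarm are hypothetical, and the value of 10 seconds given to DEFAULT_SOCKET_TIMEOUT here is an assumption for illustration.

```python
import signal
import time

DEFAULT_SOCKET_TIMEOUT = 10  # seconds; assumed value for this example


class PluginTimeout(Exception):
    """Raised when the guarded call runs longer than allowed."""


def _on_alarm(signum, frame):
    raise PluginTimeout()


def run_with_alarm(func, timeout=DEFAULT_SOCKET_TIMEOUT):
    """Run func(), raising PluginTimeout if it takes longer than timeout seconds."""
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(timeout)
    try:
        return func()
    finally:
        signal.alarm(0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)  # restore the old handler
```

A caller would typically catch PluginTimeout, print a one-line message, and exit with an appropriate status code rather than letting the exception escape.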
		
		
Plugin Options
	
		A well written plugin should have --help as a way to get 
		verbose help. Code and output should try to respect the 80x25 size of a
		CRT (remember when fixing stuff in the server room!)
		
		Option Processing
		For plugins written in C, we recommend the C standard
		getopt library for short options. Getopt_long is always available.
		
		For plugins written in Perl, we recommend the Getopt::Long module.
		Positional arguments are strongly discouraged.
		There are a few reserved options that should not be used
		for other purposes:
		
          -V version (--version)
          -h help (--help)
          -t timeout (--timeout)
          -w warning threshold (--warning)
          -c critical threshold (--critical)
          -H hostname (--hostname)
          -v verbose (--verbose)
		
		In addition to the reserved options above, some other standard options are:
		
          -C SNMP community (--community)
          -a authentication password (--authentication)
          -l login name (--logname)
          -p port or password (--port or --passwd/--password)
          -u url or username (--url or --username)
		
	  
		Look at check_pgsql and check_procs to see how I currently
		think this can work.  Standard options are:
	  
		The option -V or --version should be present in all
		plugins. For C plugins it should result in a call to print_revision, a
		function in utils.c which takes two character arguments, the
		command name and the plugin revision.
		The -? option, or any other unparsable set of options,
		should print out a short usage statement. Character width should
		be 80 or less and no more than 23 lines should be printed (it
		should display cleanly on a dumb terminal in a server
		room).
		The option -h or --help should be present in all plugins.
		In C plugins, it should result in a call to print_help (or
		equivalent).  The function print_help should call print_revision, 
		then print_usage, then should provide detailed
		help. Help text should fit on an 80-character width display, but
		may run as many lines as needed.
		The option -v or --verbose should be present in all plugins.
		The user should be allowed to specify -v multiple times to increase
		the verbosity level, as described in Verbose output.
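
The reserved options could be wired up in a Python plugin roughly as follows. This is a sketch using the standard argparse module; the plugin name check_example, its version string and the class name PluginArgumentParser are invented for the example.

```python
import argparse


class PluginArgumentParser(argparse.ArgumentParser):
    def error(self, message):
        # Unparsable options: print the short usage statement, then exit UNKNOWN (3).
        self.print_usage()
        self.exit(3, f"{self.prog}: {message}\n")


def build_parser():
    parser = PluginArgumentParser(prog="check_example",
                                  description="Hypothetical example plugin.")
    # -h/--help is added by argparse automatically.
    parser.add_argument("-V", "--version", action="version", version="%(prog)s 0.1")
    parser.add_argument("-t", "--timeout", type=int, default=10)
    parser.add_argument("-w", "--warning")
    parser.add_argument("-c", "--critical")
    parser.add_argument("-H", "--hostname")
    # Each additional -v raises the verbosity level by one.
    parser.add_argument("-v", "--verbose", action="count", default=0)
    return parser
```

A plugin would then cap the level with something like min(args.verbose, 3) before deciding how much output to produce.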
    
    
      Plugins with more than one type of threshold, or with
      threshold ranges
      Old style was to do things like -ct for critical time and
      -cv for critical value. That goes out the window with POSIX
      getopt. The allowable alternatives are:
      
	
	  long options like -critical-time (or -ct and -cv, I
	  suppose).
	
	
	  repeated options like `check_load -w 10 -w 6 -w 4 -c
	  16 -c 10 -c 10`
	
	
	  for brevity, the above can be expressed as `check_load
	  -w 10,6,4 -c 16,10,10`
	
	
	  ranges are expressed with colons as in `check_procs -C
	  httpd -w 1:20 -c 1:30` which will warn above 20 instances,
	  and go critical at 0 and above 30
	
	
	  lists are expressed with commas, so Jacob's check_nmap
	  uses constructs like '-p 1000,1010,1050:1060,2000'
	
	
	  If possible when writing lists, use tokens to make the
	  list easy to remember and non-order dependent - so
	  check_disk uses '-c 10000,10%' so that it is clear which is
	  the percentage and which is the KB value (note that due to
	  my own lack of foresight, that used to be '-c 10000:10%' but
	  such constructs should all be changed for consistency,
	  though providing backward compatibility is fairly
	  easy).
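
The colon and comma conventions above can be sketched in Python as shown below. The function names are hypothetical, and only the plain min:max form from the check_procs example is handled; anything beyond that is outside the scope of this sketch.

```python
OK, WARNING, CRITICAL = 0, 1, 2


def parse_range(spec):
    """Parse a 'min:max' range such as '1:20' into a (low, high) tuple."""
    low, high = spec.split(":")
    return float(low), float(high)


def parse_list(spec):
    """Split a comma separated list such as '10,6,4' into its items."""
    return spec.split(",")


def evaluate(value, warn, crit):
    """Return a status for value given warning and critical ranges."""
    crit_low, crit_high = parse_range(crit)
    warn_low, warn_high = parse_range(warn)
    # Critical is tested first so that it wins over warning.
    if not crit_low <= value <= crit_high:
        return CRITICAL
    if not warn_low <= value <= warn_high:
        return WARNING
    return OK
```

With -w 1:20 -c 1:30, this treats 10 instances as OK, 25 as a warning, and both 0 and 35 as critical, matching the check_procs example.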
	
      
      As always, comments are welcome - making this consistent
      without a host of long options was quite a hassle, and I would
      suspect that there are flaws in this strategy. 
      
    
Coding guidelines
	See GNU
	Coding standards for general guidelines.
	Comments
	You should use /* */ for comments and not // as some compilers
	do not handle the latter form.
	If you have copied a routine from another source, make sure the licence
	from your source allows this. Add a comment referencing the ACKNOWLEDGEMENTS
	file, where you can put more detail about the source.
	For contributed code, do not add any named credits in the source code 
	- contributors should be added into the THANKS.in file instead. 
	
	
	CVS comments
	When adding CVS comments at commit time, you can use the following prefixes:
	
	  - comment
	  
	    for a comment that can be removed from the Changelog
	  
	  
	  * comment
	  
	    for an important amendment to be included into a features list
	  
	  
	
	
	If the change is due to a contribution, please quote the contributor's name 
	and, if applicable, add the SourceForge Tracker number. Don't forget to 
	update the THANKS.in file.
	
	Translations for developers
	To make the job easier for translators please follow these guidelines:
	
	  
	    before creating new strings, check the po/de.po file to see if a similar string
	    already exists
	  
	  
	    for help texts, break into individual options so that these can be reused
	    between plugins
	  
	
	
	Translations for translators
	To create an up to date list of translatable strings, run: tools/gen_locale.sh
	
Submission of new plugins and patches
	Patches
	If you have a bug patch, please supply a unified or context diff against the
	version you are using. For new features, please supply a diff against
	the CVS HEAD version.
	Patches should be submitted via 
	SourceForge's
	tracker system for Nagiosplug patches 
	and be announced to the nagiosplug-devel mailing list.
	Submission of a patch implies that the submitter acknowledges that they
	are the author of the code (or have permission from the author to release the code)
	and agrees that the code can be released under the GPL. The copyright for the changes will 
	then revert to the Nagios Plugin Development Team - this is required so that any copyright 
	infringements can be investigated quickly without contacting a huge list of copyright holders.
	Credit will always be given for any patches through a THANKS file in the distribution.
	
	New plugins
	If you would like others to use your plugins, please add them to
	the official 3rd party plugin repository, 
	NagiosExchange.
	
	We are not accepting requests for inclusion of plugins into 
	our distribution at the moment, but when we do, these are the minimum
	requirements:
	
      
	
	  Include copyright and license information in all files
	
	
	  The standard command options are supported (--help, --version,
	  --timeout, --warning, --critical)
	
	
	  It is determined to be not redundant (for instance, we would not 
		add a new version of check_disk just because someone had provided 
		a plugin that had perf checking - we would incorporate the features 
		into an existing plugin)
	
	
	  One of the developers has had the time to audit the code and declare
		it ready for core
	
	
	  It should also follow code format guidelines, and use functions from
	  utils (Perl, C or sh) rather than its own
	
	
	  Includes patches to configure.in if required (via the EXTRAS list if 
	  it will only work on some platforms)
	
	
	  If possible, please submit a test harness. Documentation on sample
	  tests is coming soon