[Nagiosplug-devel] Patch for persistent data on plugins

Thomas Guyot-Sionnest dermoth at aei.ca
Fri Sep 18 09:17:31 CEST 2009


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 17/09/09 07:04 AM, Alain Williams wrote:
> On Thu, Sep 17, 2009 at 02:05:52AM -0400, Thomas Guyot-Sionnest wrote:

Thanks for quickly addressing these issues!

>> My take on this...:
>>
>> 1. Tests should use libtap,
> 
> I didn't see that, it is not referenced in an appropriate README.
> I'll look at it.

Use --enable-libtap  on the configure line if you don't have it
installed. Libtab tests are in /lib/tests/.

>> and lets remove those goto's before someone
>> else see them ;)
> 
> There are no gotos in state_save.c
> There are a few in the test program and in check_procs.c -- they are used
> how a goto should be used -- in an error state and to increase clarity; one target
> in a function does not confuse and avoids excessive nested {}.
> Summary: without goto the code becomes more complicated/difficult-to-read.

Ok... well actually I believe code should be much simpler anyway... see
below.

>> 2. Doc should go in the docbook, not a plain text file (if there are too
>> technical portions those could be be code comments
> 
> I cannot see anything called docbook in the plugins tar file ?

in /docs. This is called the "developer guidelines" and is in docbook
format... don't bother too much with if though if you're uncomfortable
with docbook, it's mostly a copy-paste + adding some formatting job.

>> 3. A plugin shouldn't share state file for multiple different
>> invocations. Plugins may run in parallel so this is asking for trouble
>> (one invocation overwriting another plugin's data). That also call for a
>> much simpler API and plugin code.
> 
> That is why the functions have the 'instance' argument, from my documentation:
> 
>   name      This should be the name of the program
>   instance  If the program may be run several times (eg to monitor different
>             disks) this should contain a way of distinguishing the different
>             instances. If this is not needed it should be NULL.
>   Neither name not instance should contain spaces or slashes ('/').
> 
> An instance may be 'sda', 'sdb', ... for something that is monitoring disks.
> 
> I have added a bit more to the documentation of np_state_save_start().

Yes, but that's not how you apparently used it in check_procs (or did
you?- maybe it's for storing multiple invocations of the same data though).

I'll have a deeper look at that if you don't, but I feel both the API
and the plugin code could be simpler... I was thinking of something that
store a single blob (no matter if it's binary or ascii, although one
function could exist for each) and the plugin could give a meaning to
this "blob" (records, serialisation, etc.).

I'm not a heavy check_proc user but I was under the impression it checks
one thing at time, so there shouldn't be more tan one set of values to
store per invocation. Am I wrong?


>> 4. Should we write a temp file first and move it over instead of
>> truncating it? IMHO the former is much better is the file is not opened
>> and saver at once...
> 
> That could be done. I don't think that it worth the extra complexity.
> There will only be an error if something really nasty happens (eg out of
> disk space). If state is lost then a warning may be lost, but if a disk
> has gone full - worse things are afoot.
> If a state file is corrupt then the plugin will probably rewrite it with
> fresh/clean state information.
> 
> I won't do anything without further discussion.

Atomicity. Although it shouldn't happen, if the same plugin instance is
invoked twice then worst case if that one run's data will be overwritten
by the other (should give pretty much the same result if it's the same
check). Without it, both plugin will write their own data in the same
file, which is likely to cause corruption.

This is less a problem if the open and write is triggered by a single
function call (which is not the case afaik), but there's still place for
a race.

> It might make sense if the nagios plugin state files were removed on
> nagios start (but not reload) anyway: discuss.

Not IMHO. Even the perfdata method retains data with state retention...

Some people restart nagios *very* often too, that would be an issue for
them.

>> 8. I don't think the descripting/comments should go in there. They could
>> be added manually or trough a serialization library, but in most cases
>> the pluginj's verbose mode should be enough to debug.
> 
> That is if you understand what is generating the file. Remember that people who
> are new to this have problems understanding how it all hangs together.
> I found the comments useful when debugging check_procs.
> Typically the comments will be just: plugin name & command line args; date in human
> understandable format; a DO NOT EDIT line. The over head is 100-200 bytes.
> (See later discussion)

Well, my fear is that more space might be occupied by the header and
comments than by the data itself In some plugins all I need is the time
and an integer; this is no more than 8 bytes in binary and probably no
more than 16 in ascii/hex.

... and as explained making sure we're as small as possible will ensure
best performance, either through the use of tail files (much faster with
less space waste on FS that support it or less waste of memory on tmpfs
filesystems (RAM).

>> 9. On some unix unlink() may damage file system if the path is a
>> directory and the plugin is run as root. Any sysadmin ending up in that
>> situation should be shot in the head, but it's an easy check...
> 
> I use unlink() in np_state_destroy_save() to remove the state file. This is
> not a directory. I have modified the internal function generate_filename()
> to check & return an error if: name is empty or: name or instance contain
> a space or '/'. That will prevent trying to unlink a directory.
> 
> Changes in my private version that I will repost when appropriate.

I know it's dumb, but my fear was:

rm path//to/state_file; mkdir /path/to/state_file
then run the plugin...

Remember murphy's law: if there's the slightest possibility of something
going wrong it will happen. :)


>> 12. I'd suggest having all files prefixed with "np_" instead of using
>> some sort of heuristic to determine whenever we should add
>> "nagios-plugin_" (bonus point: making prefix a configure parameter).
> 
> That is only if DIR_DEFAULT is used, ie it is in /tmp.
> Making it a configure option (your point 10) fixes this (ie it is
> the same thing).
> 
> One point that you did NOT make that I thought you might is my use of
> the environment variable NAGIOS_PLUGIN_STATE_DIRECTORY. I can't see
> a simple way of avoiding this without adding complexity to the plugins
> (which is something that I wanted to avoid). It would be nice if there
> was some nagios support for this -- ie a nagios config parameter that
> makes it set this env var, rather than having to set this in the
> environment before calling nagios.

My point is that having a prefix everywhere (but a smaller one) won't
hurt, and it won't use different name whenever you leave it the default
of change it. I didn't like that the the file name was changing
*implicitely*.

>> 13. I like the ascii header, but I'd use a more compact version. This
>> ensure state files are as small as possible, preventing waste (very
>> useful when a tmpfs is used for storage and helping usage of tail files
>> on fs that support it).
> 
> Hmmm, it is a trade off between compactness and ease of understanding by the
> nagios newbie system admin. A quick look at those generated by the test
> program shows that the header length is less than 180 chars.
> For those that have not seen it we are talking about a header like:

If this is done properly the newbiew will not have to worry about
whatever is in the state file. It's probably better actually to make
sure they won't mess with it because it appears they can!

> 	I Nagios Plugin State File
> 	# This is generated by the plugin - DO NOT HAND EDIT
> 	# Plugin id: test instance: array 
> 	V 1
> 	# File written: Fri Sep 11 18:33:50 2009
> 	T 4aaa89fe
> 	B 1
> 	N 11
> 
> Removing the comments takes that down to 52 bytes (from 179).
> I could do something like testing for env var
> NAGIOS_PLUGIN_STATE_NO_COMMENT and suppress the comments,
> but is that worth it ?

Not just the comments... I would compact the header in one small line..
just enough readable to recognise/decode it manually without scarifying
much space for it.

> ''Nagios Plugin State File'' is 24 chars & could be shortened,
> I made is explicit so that if a sysadmin found the file and asked
> the question ''what is this file ?'' that it would be an easy answer.

What about adding a file(1) signature? Besides, it should be stored in a
 nagios-related directory...

>> 14. There's no additional tests in check_procs.t
> 
> Hmmmm: what is the purpose of the extra checks ?
> I could check parsing of command line args. Checking of saved state
> is really difficult (what is happening on the system that the test is
> being run on ?).
> The main thing to test is the save/restore of test data - this is what
> the plugin_save_test.c does.

It's a real-world test, making sure the plugin runs fine with it. Your
library tests don't test individual plugin's data save and restore
routines. for instance.

- --
Thomas
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFKszQL6dZ+Kt5BchYRAtxLAJ0SZvOswSRLyZc6mxI6PG18VqDLYQCg1c76
bMAp0kaLsYIQ1Hgd/YulgtI=
=TxfR
-----END PGP SIGNATURE-----




More information about the Devel mailing list