[Nagiosplug-devel] Please can I have GIT access

Jose Luis Martinez jlmartinez-lists-nagplug-devel at capside.com
Tue Sep 8 12:26:06 CEST 2009


> Also, one more thing to consider with local storage is when you have
> multiple nagios instances - either part of distributed monitoring,
> migrations or just for testing. In many cases data accuracy relies on
> the fact that the data is relative to the last check. If you have two
> nagios instances doing the same NRPE check, the scheduling may cause one
> check to get a very small interval of data. For example, with cpu usage
> check, that means instead of getting the last 5 minutes, you may get only
> the last few seconds. You can easily miss a CPU hog because at the
> moment the check is executing the CPU was idle for the last few seconds,
> even if it was full the rest of the time (because the test/backup/old
> Nagios instance got the rest of the interval data).

This is a problem that I've found with plugins that use their own 
storage for the checks. I've never had problems with multiple Nagios 
instances on one machine (I don't do that), but I have had it with the 
same check defined multiple times. Going along with the CPU example:

check_cpu --cpu 1
check_cpu --cpu 2

If the developer hasn't foreseen that the plugin will be executed with 
different parameters, the readings for cpu 1 and 2 can get mixed.
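
To make the mix-up concrete, here is a minimal Python sketch. It is not the code of any real plugin: `delta_since_last`, the file name and the counter values are all made up for illustration. It models a difference-based check that keeps its previous reading in a single file named only after the script:

```python
import json
import os
import tempfile

def delta_since_last(state_file, counter):
    """Return counter minus the previously stored value (None on first run)."""
    previous = None
    if os.path.exists(state_file):
        with open(state_file) as fh:
            previous = json.load(fh)["counter"]
    # Overwrite the stored reading for the next run.
    with open(state_file, "w") as fh:
        json.dump({"counter": counter}, fh)
    return None if previous is None else counter - previous

# Hypothetical busy-tick counters: cpu 1 is at 1000, cpu 2 at 50000.
state_dir = tempfile.mkdtemp()
shared = os.path.join(state_dir, "check_cpu.tmp")

first = delta_since_last(shared, 1000)    # stands in for "check_cpu --cpu 1"
mixed = delta_since_last(shared, 50000)   # stands in for "check_cpu --cpu 2"
print(first, mixed)
```

The second call computes its delta against the value that the cpu 1 check stored, so the number it reports describes neither CPU.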

Another case:

check_cpu --cpu 1 --display system,iowait
check_cpu --cpu 1 --display idle,irq

Maybe check_cpu is a bad example, but think about a plugin that can 
output LOTS of performance data (hundreds of data channels), and you 
want a couple of subsets output in separate checks.

The solution Nagios::Plugin::Differences applies is to let the developer 
choose an alternative temp file, but a couple of problems arise:

  - he has to be aware of the problem
  - even knowing about the problem, he can still miss a condition that 
    should select an alternative temp file.

This has made me change the Nagios::Plugin::Differences API to add a 
user-specified "id" to the temp file name generation. The id is added 
as a string to the temp file name, so you can choose which temp file to 
read from and write to.

/tmp/_nagios_plugin_${script_basename}_${id}.tmp

One method I'm using is to MD5 all the params that the plugin receives. 
That creates a "unique" string for the id part of the temp file (I'm 
aware that collisions can occur... but I have no elegant solution for 
that for now).
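
As a rough illustration of the MD5 approach, here is a sketch in Python rather than Perl. The function `state_file_for`, the example plugin path and the choice to join the arguments with a NUL byte are my assumptions for this example, not part of the Nagios::Plugin::Differences API:

```python
import hashlib
import os

def state_file_for(script_path, argv, tmp_dir="/tmp"):
    """Build a temp file name whose id part is the MD5 of the arguments."""
    # Join with NUL so that ["a b"] and ["a", "b"] hash differently.
    joined = "\0".join(argv).encode("utf-8")
    file_id = hashlib.md5(joined).hexdigest()
    basename = os.path.basename(script_path)
    return os.path.join(tmp_dir, "_nagios_plugin_%s_%s.tmp" % (basename, file_id))

# Hypothetical install path; only the basename ends up in the file name.
plugin = "/usr/lib/nagios/plugins/check_cpu"
file_cpu1 = state_file_for(plugin, ["--cpu", "1"])
file_cpu2 = state_file_for(plugin, ["--cpu", "2"])
print(file_cpu1)
print(file_cpu2)
```

Note that the same parameters given in a different order produce a different id, so a reordered command line starts from an empty history. It also does nothing for two Nagios instances running the identical command.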

> The fact that this method is nearly transparent makes it even easier to
> fall into this pitfall and pretty hard to figure out the problem without
> knowing how the plugin actually works.

You're right. Leaving things to the developer can lead to these hard to 
diagnose problems.

> By comparison, when using performance data strings the stored data is
> bound to a single Nagios service on a single Nagios instance. The same
> check can run many times yet the plugin will *always* get its last
> performance data string.

The problem, in my opinion, is that a plugin has no idea which service 
check definition it is bound to, so it can't reliably determine the 
state of its last execution.

> I don't care how it's implemented in the end, but I'm in favor of any
> method that can allow this kind of granularity without having to
> specifically think about it.

If Nagios / NRPE could just pass a unique ID for each service check 
definition, the plugins could use that by default to generate the 
tempfile name for their local storage. The unique ID could be a GUID, so 
that different Nagios instances would not generate the same IDs, thus 
solving the "multiple Nagios instances" problem too...
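
A sketch of how a plugin could consume such an ID, again in Python. Everything here is an assumption: the environment variable name `NAGIOS_CHECK_UNIQUE_ID` is invented for this example (I don't know of anything in Nagios or NRPE that sets it), and falling back to the MD5 of the parameters is just one possible default:

```python
import hashlib
import uuid

# Hypothetical variable name, invented for this sketch.
ID_ENV_VAR = "NAGIOS_CHECK_UNIQUE_ID"

def choose_state_id(argv, environ):
    """Prefer the ID handed over by the scheduler, else hash the arguments."""
    check_id = environ.get(ID_ENV_VAR, "")
    # Keep only characters that are safe in a file name (a GUID passes as-is).
    safe_id = "".join(c for c in check_id if c.isalnum() or c == "-")
    if safe_id:
        return safe_id
    return hashlib.md5("\0".join(argv).encode("utf-8")).hexdigest()

args = ["--cpu", "1"]
# Two Nagios instances running the same command, each with its own GUID.
instance_a = {ID_ENV_VAR: str(uuid.uuid4())}
instance_b = {ID_ENV_VAR: str(uuid.uuid4())}
id_a = choose_state_id(args, instance_a)
id_b = choose_state_id(args, instance_b)
id_fallback = choose_state_id(args, {})
print(id_a, id_b, id_fallback)
```

With the ID present, the two instances get separate temp files even though the parameters are identical. Without it, the plugin behaves as it does today.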

Just my 2 cents,

Jose Luis Martinez
jlmartinez at capside.com



