[Nagiosplug-devel] RFC: Nagios 3 and Embedded Perl Plugins

Florian Gleixner flo at bier.homeip.net
Thu Jan 4 13:07:50 CET 2007


Andreas Ericsson wrote:
> Florian Gleixner wrote:
>> Andreas Ericsson wrote:
>>> Florian Gleixner wrote:
>>>> True, leaks and crashes could make nagios more unstable. dl-plugins
>>>> should be used with care. "Worker threads" could isolate some of the risk.
>>>>
>>>> The performance gain is simply the time a C plugin needs to create a
>>>> process. You could say that this is not very much time, but some Nagios
>>>> setups run thousands of checks per minute. Here is a very simple test:
>>>> Bash has the echo command built in. On most Linux systems you will
>>>> also find a /bin/echo program with the same functionality. So compare:
>>>>
>>>> time for ((i=0 ; i< 10000 ; i++)) ; do echo bla ; done
>>>> real    0m1.536s
>>>> user    0m0.172s
>>>> sys     0m0.020s
>>>>
>>>> time for ((i=0 ; i< 10000 ; i++)) ; do /bin/echo bla ; done
>>>> real    0m34.047s
>>>> user    0m8.761s
>>>> sys     0m15.365s
>>>>
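To put a number on that: the two runs above differ by roughly 32.5 seconds
over 10000 iterations, i.e. a bit over 3 ms per external invocation on that
machine. Here is a rough sketch for repeating the comparison from Python. The
helper names (time_calls, builtin_echo, external_echo) are made up for
illustration, and it assumes an echo binary can be found on the PATH.
Absolute numbers will differ from system to system.

```python
import shutil
import subprocess
import time

def time_calls(func, n):
    """Return the wall-clock seconds needed to call func() n times."""
    start = time.perf_counter()
    for _ in range(n):
        func()
    return time.perf_counter() - start

def builtin_echo():
    """Stand-in for a shell builtin: the work happens inside this process."""
    return "bla"

def external_echo(path):
    """Stand-in for /bin/echo: every call creates and reaps a new process."""
    subprocess.run([path, "bla"], stdout=subprocess.DEVNULL, check=True)

if __name__ == "__main__":
    echo = shutil.which("echo") or "/bin/echo"  # assumption: echo is installed
    n = 200
    print("in-process:", time_calls(builtin_echo, n))
    print("external:  ", time_calls(lambda: external_echo(echo), n))
```

The in-process loop only measures function-call overhead, so treat the result
as a lower bound on what a loaded module would cost, not as a benchmark.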
>>>> I think some default plugins like ping or tcp-check could be made into dl
>>>> modules; the more complicated plugins, or those that are usually executed
>>>> on the monitored nodes, should stay "normal" plugins.
>>>>
>>>> I never had a look at the nagios code, it was just an idea popping up.
>>>>
>>> A lower hanging apple is to make Nagios use fork() / execve() instead of 
>>> using popen(), which does a double fork() / exec() thing.
>>>
>> or use the popen() call from popen.{h,c} from the nagios plugins.
> 
> 
> That doesn't leave room for passing the environment though, which will 
> break a very valuable feature in Nagios atm. Btw, popen.[hc] have been 
> replaced by runcmd.[hc]. How old a version are you running?
> 
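For illustration, the single fork() / execve() route with an explicit
environment could look roughly like the sketch below. Python's os module is
used here only because it wraps the same system calls. run_plugin is a
hypothetical helper, the NAGIOS_HOSTNAME value is made up, and the Python
interpreter itself stands in for a plugin binary.

```python
import os
import sys

def run_plugin(argv, env):
    """fork() once and execve() the plugin directly, with no shell in between.

    Returns (exit_code, stdout_text). argv[0] must be an absolute path,
    because execve() does not search the PATH.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:  # child process
        try:
            os.close(read_fd)
            os.dup2(write_fd, 1)           # plugin stdout goes into the pipe
            os.execve(argv[0], argv, env)  # env is passed through explicitly
        finally:
            os._exit(127)                  # only reached if execve() failed
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as pipe:
        output = pipe.read()               # read until the child closes stdout
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status), output.decode()
    return -os.WTERMSIG(status), output.decode()

if __name__ == "__main__":
    env = dict(os.environ, NAGIOS_HOSTNAME="web01")  # illustrative value
    fake_plugin = [sys.executable, "-c",
                   "import os; print(os.environ['NAGIOS_HOSTNAME'])"]
    print(run_plugin(fake_plugin, env))
```

Only one extra process exists per check in this sketch, and the environment
is whatever dictionary the caller hands in.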

1.4.5 has some spopen calls in check_fping.c, check_hpjd.c, check_load.c,
check_ping.c, check_procs.c, check_snmp.c, check_swap.c and check_users.c.

> 
>> The nagios plugins also call external programs via this call. So at the
>> moment one plugin check usually creates a shell process and the plugin
>> executable process, and if the plugin itself creates a process we have
>> three processes created for one simple ping.
> 
> No, there is the fork()/execve() in nagios (done through popen(3)) which 
> spawns a shell. Then there's the fork()/execve() in the shell, and 
> finally the plugin is run, so it's always three processes per plugin 
> invocation. If the plugin spawns e.g. /bin/ps or /bin/df we have four 
> processes for one plugin.
> 
>> Ideally a dynamically loaded plugin that does not call external
>> programs, but has the code of, for example, "ping" compiled in, does not
>> create a single process.
>>
> 
> This is a Bad Idea because the core program can't block on read() calls, 
> which means all plugins that work over the network will have their 
> timing values skewed unless you run each check in a separate thread or 
> fork() a new nagios daemon for each check to run dynamically, in which 
> case you've already lost 90% of the gain and ended up with a wicked 
> burden of maintainability. That's without considering the initial cost 
> (in developer time) to rewrite all plugins to never use signals[1] (or 
> alarm(3)), which will be huge.

True. But I think a threaded approach could give a huge performance
boost. Aren't alarms mostly used to time out an external call?
But true: it would cost a great deal initially to rewrite all of that. I
think nobody wants to do that without a need.
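
On the alarm question: for a plain network check, a per-socket timeout can
stand in for alarm(3), and each thread can time its own check so one slow host
does not skew the others. A minimal sketch, assuming a simple TCP connect
check. check_tcp and run_checks are hypothetical names, not anything from
Nagios or the plugins, and Python threads are used only to keep it short.

```python
import socket
import time
from concurrent.futures import ThreadPoolExecutor

def check_tcp(host, port, timeout=2.0):
    """Try one TCP connect; the socket timeout replaces alarm()."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            state = "OK"
    except OSError:  # refused, unreachable or timed out
        state = "CRITICAL"
    return state, time.monotonic() - start

def run_checks(targets, workers=8):
    """Run every (host, port) check in a worker thread, keeping input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda target: check_tcp(*target), targets))

if __name__ == "__main__":
    for result in run_checks([("127.0.0.1", 22), ("127.0.0.1", 80)]):
        print(result)
```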

> 
> Also, for PING checks you're opening a new can of worms, since 
> implementing the ICMP protocol generally requires access to raw sockets, 
> which is, on almost all systems, restricted to the super-user. It's 
> possible to work around this by obtaining one[2] raw socket prior to 
> dropping the root privileges at startup, but then you'd be up for a 
> fairly complex ping program that needs to keep track of all the hosts 
> that currently have echo requests pending and assign each response to the 
> right check.
> 
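The bookkeeping described above could be sketched roughly as follows: every
echo request sent over the shared raw socket is recorded under its ICMP
identifier and sequence number, and each reply is matched against that table.
PendingPings is a made-up class for illustration. The sketch assumes the IP
header has already been stripped from the packet, and it does not open a raw
socket or verify checksums.

```python
import struct

ICMP_ECHO_REPLY = 0
# ICMP echo header: type, code, checksum, identifier, sequence number
HEADER = struct.Struct("!BBHHH")

class PendingPings:
    """Maps outstanding echo requests to the host check that sent them."""

    def __init__(self):
        self._pending = {}

    def sent(self, ident, seq, host):
        """Remember that an echo request (ident, seq) is pending for host."""
        self._pending[(ident, seq)] = host

    def dispatch(self, icmp_packet):
        """Return the host a reply belongs to, or None if nobody waits for it."""
        if len(icmp_packet) < HEADER.size:
            return None
        icmp_type, _code, _checksum, ident, seq = HEADER.unpack_from(icmp_packet)
        if icmp_type != ICMP_ECHO_REPLY:
            return None
        return self._pending.pop((ident, seq), None)

if __name__ == "__main__":
    table = PendingPings()
    table.sent(0x1234, 7, "web01")
    reply = HEADER.pack(ICMP_ECHO_REPLY, 0, 0, 0x1234, 7) + b"payload"
    print(table.dispatch(reply))
```

A real implementation would also need to expire entries whose reply never
arrives, which is where the per-check timeout would come in.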

Got me. Yes. ICMP is a problem.
> 
> [1] All module-based checks would want to catch the same signals, so the 
> signal-handlers would be overwritten. alarm(3) is sometimes implemented 
> through signals, so that's not usable.
> 
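The handler clash in [1] is easy to reproduce in any single process. In this
small sketch, two pretend in-process checks each install their own SIGALRM
handler, and the second one silently displaces the first. The handler names
and install_both are invented for the example, and it has to run in the main
thread because that is where Python allows handlers to be installed.

```python
import signal

def ping_timeout(signum, frame):
    """Timeout handler the pretend ping check would like to keep."""

def tcp_timeout(signum, frame):
    """Timeout handler the pretend tcp check installs afterwards."""

def install_both():
    """Install both handlers in turn; return (displaced, active) handlers."""
    original = signal.signal(signal.SIGALRM, ping_timeout)
    try:
        # signal.signal() returns the handler it has just replaced
        displaced = signal.signal(signal.SIGALRM, tcp_timeout)
        active = signal.getsignal(signal.SIGALRM)
    finally:
        signal.signal(signal.SIGALRM, original)  # leave the process as found
    return displaced, active

if __name__ == "__main__":
    displaced, active = install_both()
    print("displaced:", displaced.__name__, "active:", active.__name__)
```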
> [2] Obtaining one socket per ping-check at start-up and keeping them is 
> not feasible, since most systems normally only allow 1024 
> file-descriptors / process.
> 
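For [2], the actual limit on a given system can be read with getrlimit(2).
A tiny sketch follows; max_ping_sockets is a hypothetical helper, and the
reserve of 64 descriptors for everything else the daemon keeps open is an
arbitrary assumption.

```python
import resource

def max_ping_sockets(reserved=64):
    """Return how many sockets fit under the soft fd limit, or None if unlimited."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return None
    return max(soft - reserved, 0)

if __name__ == "__main__":
    print("ping sockets that would fit:", max_ping_sockets())
```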




