[Nagiosplug-devel] RFC: Nagios 3 and Embedded Perl Plugins

Andreas Ericsson ae at op5.se
Tue Jan 9 14:38:55 CET 2007


Thomas Guyot-Sionnest wrote:
> On 08/01/07 07:02 AM, Andreas Ericsson wrote:
>> Stéphane Urbanovski wrote:
>>> Andreas Ericsson wrote:
>>>
>>>> But you just said to load this newfangled dream-version of nrpe as a 
>>>> module? That sort of microsoft'ish thinking leads to "integrated" and 
>>>> very unstable code I'm afraid.
>>> (OK, my English is really poor ...)
>>>
>>> Not the "newnrpe", which is a separate process, but only the part that communicates with newnrpe
>>>
>> Ah, I see what you mean now. I'm afraid that fairly drastically reduces 
>> the scalability of Nagios. Assume for a second that you have 1500 hosts 
>> to monitor, all of which use NRPE for checking local stuff. Keeping up 
>> the connection with those 1500 hosts requires 1500 open file-descriptors 
>> at all times. Most systems can have a lot more files than that open per 
>> process at any given time, but there is still a hard limit lurking 
>> somewhere which means Nagios can no longer check an arbitrary number of 
>> hosts and services. The worst part is that that hard limit will be set 
>> differently on different systems.
>>
>> I'm afraid you'll find that this just isn't useful enough to warrant the 
>> massive developer effort it would take to write it and seeing as you're 
>> the only one arguing your case, you'd have to write it yourself to get 
>> it implemented. Either way, further discussion is fairly pointless until 
>> you have some code available.
> 
> Actually I think now it's getting interesting. If done properly, this
> could be a nice way of doing distributed active checking.
> 
> Using the same system Stéphane described, Nagios could have open
> connections to remote execution hosts that run the checks, and read back
> the results. Different service properties would determine if the service
> can be run directly on the host (if Nagios has an open connection to it)
> or if it has to be remote. Check execution could run on
> dedicated servers, or even be spread out across monitored hosts.
> 


Yes, but a distributed static mesh redundancy thing is pretty different 
from an NRPE-daemon with an option to keep connections alive. A nice 
example of where "think big" doesn't work, but "think bigger" does.

I'm working on a module that does just that, but it requires a full-blown 
Nagios installation on each of the poller nodes, and which host is 
monitored by which system is determined by hostgroups instead of 
through some automagic solution that could possibly (and would 
probably) get things wrong from time to time.
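
To illustrate the hostgroup-based assignment, here is a minimal Python sketch. It is not code from the module itself; the hostgroup names, poller names and the build_assignment() helper are all hypothetical. It only shows the idea of a static, explicit mapping that fails loudly instead of guessing.

```python
# Hypothetical sketch: static host-to-poller assignment driven by hostgroups.
# None of these names come from the actual module discussed in this thread.

# hostgroup name -> member hosts (as one might declare them in the object config)
HOSTGROUPS = {
    "pollers-stockholm": ["web1", "web2", "db1"],
    "pollers-gothenburg": ["mail1", "db2"],
}

# hostgroup name -> poller node responsible for that group
POLLER_FOR_GROUP = {
    "pollers-stockholm": "poller-sth",
    "pollers-gothenburg": "poller-gbg",
}


def build_assignment(hostgroups, poller_for_group):
    """Return {host: poller}; raise if a host is claimed by two pollers."""
    assignment = {}
    for group, hosts in hostgroups.items():
        poller = poller_for_group[group]
        for host in hosts:
            # setdefault() returns the existing poller if the host was seen before
            previous = assignment.setdefault(host, poller)
            if previous != poller:
                raise ValueError(
                    f"{host} is assigned to both {previous} and {poller}"
                )
    return assignment


ASSIGNMENT = build_assignment(HOSTGROUPS, POLLER_FOR_GROUP)
```

Because the mapping is written down by the administrator, a conflicting assignment shows up as a configuration error rather than as a check silently running in the wrong place.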

> On big setups this has the clear advantage of scalability, but on
> smaller setups it can also be interesting as one could use very cheap
> servers for running the Nagios daemon in HA, and provide redundancy by
> spreading the checks across monitored servers themselves.
> 

Yup. That's the plan. Especially the scalability bit. The idea is to get 
an infinite number of layers of pollers/masters, as each poller can in 
turn have pollers connected to it. Now I'm just hoping we'll release it 
publicly soon so I can get the Nagios community to test and patch it for 
me while I lounge and drink beer ;-)
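
The layering described above can be pictured as a tree in which every node may have pollers of its own. The following Python sketch is only an illustration under that assumption; the Node class and all host and poller names are hypothetical and say nothing about how the real module is built.

```python
# Hypothetical sketch of layered pollers: each node may have its own pollers.
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    hosts: list = field(default_factory=list)    # hosts this node checks itself
    pollers: list = field(default_factory=list)  # child Node objects reporting to it

    def all_hosts(self):
        """Hosts whose results eventually reach this node, at any depth."""
        found = list(self.hosts)
        for poller in self.pollers:
            found.extend(poller.all_hosts())
        return found

    def depth(self):
        """Number of layers below and including this node."""
        return 1 + max((p.depth() for p in self.pollers), default=0)


leaf = Node("poller-b1", hosts=["db1", "db2"])
middle = Node("poller-b", hosts=["web1"], pollers=[leaf])
master = Node("master", pollers=[Node("poller-a", hosts=["mail1"]), middle])
```

Here master.all_hosts() collects the hosts of every layer beneath it, and nothing in the structure limits how deep the nesting can go.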

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231



