[Nagiosplug-help] Nagios crashes badly and takes out both machines!

Jonathan Soong jon.soong at imvs.sa.gov.au
Thu Apr 29 00:12:02 CEST 2004


Hi there

I'm looking for some help.

Over the last week my mail server and the machine monitoring it with
Nagios have crashed 3 times, both at the same time.

I'm not sure if it is the Nagios machine crashing and taking my mail 
server with it somehow or the other way around.

In both situations I have seen increased load on my mail server, to the
point of nrpe sending me a socket timeout warning. Shortly after this
the machines become unusable and a hard reboot is the only way to fix it.

When both machines crash (mailserver=Red Hat 9, nagios=Fedora), I've gone
to the console on both machines and they are both filled with messages
saying "status=0". This is on BOTH machines.

I'm running nrpe on the mailserver checking load, number of processes,
disk space etc. The only anomalous thing is that I run my own plugin,
which I called check_ps, which scans 'ps' for a given process (just so I
know postfix is actually running!).
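
For illustration only, a plugin of this kind could look roughly like the
Python sketch below. It is not the check_ps plugin described above (its
source is not shown here); the function names, the exact-match rule on the
command name, and the 10-second timeout are all assumptions. The exit codes
(0 = OK, 2 = CRITICAL, 3 = UNKNOWN) follow the usual Nagios plugin convention.

```python
"""Hypothetical sketch of a check_ps-style plugin (not the original one)."""
import subprocess

# Conventional Nagios plugin exit codes.
OK, CRITICAL, UNKNOWN = 0, 2, 3


def count_matches(ps_output, name):
    """Count lines of `ps -e -o comm=` output whose command name equals name."""
    return sum(1 for line in ps_output.splitlines() if line.strip() == name)


def check(ps_output, name):
    """Return (exit_code, status_line) for the given ps output."""
    n = count_matches(ps_output, name)
    if n > 0:
        return OK, f"PS OK - {n} '{name}' process(es) running"
    return CRITICAL, f"PS CRITICAL - no '{name}' process found"


def main(argv):
    """Run ps once, print one status line, and return the exit code."""
    if len(argv) != 2:
        print("PS UNKNOWN - usage: check_ps <process-name>")
        return UNKNOWN
    try:
        # Assumed timeout: stops the check from hanging forever on a
        # heavily loaded host, so stuck checks do not pile up.
        result = subprocess.run(
            ["ps", "-e", "-o", "comm="],
            capture_output=True, text=True, timeout=10, check=True,
        )
    except (subprocess.SubprocessError, OSError) as exc:
        print(f"PS UNKNOWN - could not run ps: {exc}")
        return UNKNOWN
    code, message = check(result.stdout, argv[1])
    print(message)
    return code
```

To use it as a script, the file would end with a call such as
sys.exit(main(sys.argv)). Note that the name to look for depends on how the
daemon shows up in 'ps'; Postfix's main daemon, for instance, is normally
listed as "master" rather than "postfix".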

I was wondering if anyone could confirm whether or not it is Nagios that
is crashing my machines?

Kind Regards

Jon

-- 
Jonathan Soong
Information Services
Institute of Medical and Veterinary Science (IMVS)
Email:   jon.soong at imvs.sa.gov.au
Web  :   http://www.imvs.sa.gov.au
Tel  :   +61 8 82223095
Fax  :   +61 8 82223147	
