[Nagiosplug-help] Fwd: Nagios and redundancy with mysql storage replication on a production environment

Ricardo David Martins ricardo.martins at gmail.com
Thu Aug 26 14:46:08 CEST 2004


I need to implement a redundant Nagios solution using the "failover
method". I have doubts about whether it can really be implemented to
all the specifications that I need for a redundant production
environment.

Basically, I need that if one of the two machines (with Nagios) fails,
the other takes over. When the host/service/connection is established
again, that is, when the machine is up again, it can recover the
information (in the case of the slave) sent by the ocsp commands from
the master.

There are some problems with the solution presented in the official
documentation (for failover). The first is that if the slave machine
goes down, it doesn't get synchronized right afterwards. The second
is the redundant host checks made by the slave machine. The third is
that if the master goes down, it also doesn't get synchronized when it
comes up again. Besides this, there is also the problem of possibly
having both machines checking and reporting errors.

I thought of a solution based on MySQL replication. But there are
some issues that I don't know whether Nagios can deal with.

Basically, it is two machines, master and slave, both running a MySQL
server. Both replicate the Nagios databases (each is both master and
slave for this particular function). To avoid conflicts between the
databases, only one Nagios should be writing to them. When one fails
(e.g. the master), the slave knows the operations and results, and it
starts sending notifications and activates all host and service
checks while the master is down.
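
The takeover rule described above can be sketched roughly as follows.
This is only an illustration: the class name FailoverMonitor, the
heartbeat input and the failure threshold are assumptions made for the
example, not part of Nagios or MySQL.

```python
class FailoverMonitor:
    """Decides, on the slave, whether it should be the active Nagios."""

    def __init__(self, failures_before_takeover=3):
        # Hypothetical threshold: how many missed heartbeats in a row
        # are tolerated before the slave takes over.
        self.failures_before_takeover = failures_before_takeover
        self.consecutive_failures = 0
        self.slave_active = False

    def record_heartbeat(self, master_alive):
        """Record one heartbeat result and return True if the slave
        should be running checks and sending notifications."""
        if master_alive:
            # Master is back: the slave returns to standby at once.
            self.consecutive_failures = 0
            self.slave_active = False
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.failures_before_takeover:
                self.slave_active = True
        return self.slave_active
```

Requiring several missed heartbeats before taking over is one way to
reduce the risk of both machines checking and notifying at once after
a short network glitch.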

When the master comes up, its MySQL db synchronizes directly with the
Nagios db on the slave (the dbs are both masters and slaves), and the
master loads the information and carries on, while the slave stops
notifications and all checks. There are some probable conflicts in
this, like the awaited checks that should be written to the database
(I think they would be) and also the awaiting checks that wouldn't
arrive at the master (I suppose the master wouldn't wait, or the slave
would also have ocsp commands reporting to the master so that it, the
master, could recover well). Of course, the ocsp commands on the slave
would only be operational when it receives checks and the master is
up. This is only supposed to happen right after a master recovery.
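
For reference, a check result forwarded from one machine to the other
ends up on the receiving Nagios as a PROCESS_SERVICE_CHECK_RESULT
external command. A minimal sketch of building that line follows. The
function name and the host/service values in the usage note are made
up for illustration, and how the line is transported between the
machines is left out.

```python
import time


def format_passive_result(host, service, return_code, output, now=None):
    """Build one PROCESS_SERVICE_CHECK_RESULT external command line.

    return_code follows the plugin convention:
    0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.
    """
    ts = int(time.time() if now is None else now)
    # A newline would end the command early, so flatten the output.
    one_line = " ".join(output.splitlines())
    return (
        f"[{ts}] PROCESS_SERVICE_CHECK_RESULT;"
        f"{host};{service};{return_code};{one_line}\n"
    )
```

For example, format_passive_result("web1", "HTTP", 2, "CRITICAL - timeout")
gives a line that the receiving side can write to its external command
file.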

When the master goes down, the ocsp commands fail to report to the
slave, so when the slave takes over, there should be some pending checks

Is Nagios able to stop all checks and notifications while it waits
for some kind of command to start them and, while it is receiving
information from the ocsp commands of another machine, to not write
to the database (because the master is still up)?
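
One mechanism that may fit the start/stop part of this question is the
Nagios external command file, which accepts commands such as
ENABLE_NOTIFICATIONS and STOP_EXECUTING_SVC_CHECKS. The sketch below
assumes external commands are enabled. The helper names are made up,
the command file path must be supplied by the caller, and the exact
arguments these commands expect can differ between Nagios versions.

```python
import time

# Commands to make this Nagios the active one, or to put it on standby.
ACTIVATE = ["ENABLE_NOTIFICATIONS", "START_EXECUTING_SVC_CHECKS"]
STANDBY = ["DISABLE_NOTIFICATIONS", "STOP_EXECUTING_SVC_CHECKS"]


def format_external_commands(commands, now=None):
    """Return the text to write to the command file, one line per command."""
    ts = int(time.time() if now is None else now)
    return "".join(f"[{ts}] {cmd}\n" for cmd in commands)


def send_to_nagios(commands, command_file):
    """Write the commands to the Nagios external command file.

    command_file is the path configured in nagios.cfg; it is a named
    pipe, so this call blocks until Nagios is reading from it.
    """
    with open(command_file, "w") as fh:
        fh.write(format_external_commands(commands))
```

A failover script on the slave could call send_to_nagios(ACTIVATE, path)
when the master is considered down and send_to_nagios(STANDBY, path)
once it is back.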

If Nagios responds actively to what is written in the database, even
if it wasn't Nagios that wrote it, there is no need for the ocsp
commands, and this solution would be more synchronized than the one with

Is this possible? Is there any easier, better, or more efficient way?

Thanks, regards

Ricardo David Martins
