[Nagiosplug-help] check_nt USEDDISKSPACE and SANs

Jim McNamara jim at packetalk.net
Fri Oct 17 23:17:01 CEST 2008

Previous message: [Nagiosplug-help] check_by_ssh execution of check_load always returns UNKNOWN
Next message: [Nagiosplug-help] Nagios Graph
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

I'm having a problem with the check_nt plugin, specifically the
USEDDISKSPACE variable and what happens when a drive on a Windows server
(one that is actually a SAN device) is lost.

I have some Windows servers that archive IP camera data. The data is
actually stored on a SAN unit using AoE (ATA over Ethernet). A company
called rocketdivision wrote software that allows Windows machines to see
and write to AoE targets without any IP configuration, just like a Linux
box.

The software connects the SAN device as a drive on the Windows server,
and it is always mounted as the v:\ drive. I have Nagios checking the
drive without a problem, but earlier today the Windows server lost its
connection to the SAN, and the plugin checking the free space on the
drive never reported an error.

After I had everything functional again, I purposely failed the drive
through the software to simulate the failure, just to see what the
plugin reported. For a few minutes the nsclient seems to serve old
cached values; then, once it checks for fresh data, the check_nt plugin
segfaults. Here are some test results -

Test #1, with the SAN drive connected and functioning -
jim@hobarchive:~$ /usr/local/nagios/libexec/check_nt -H 192.168.102.14 \
    -v USEDDISKSPACE -l v -p 12489 -w 80%
v:\ - total: 1397.28 Gb - used: 1299.31 Gb (93%) - free 97.97 Gb (7%) | 'v:\ Used Space'=1299.31Gb;1117.83;0.00;0.00;1397.28

Test #2, after failing the SAN; presumably the value I was requesting
was still cached at this point -
jim@hobarchive:~$ /usr/local/nagios/libexec/check_nt -H 192.168.102.14 \
    -v USEDDISKSPACE -l v -p 12489
v:\ - total: 1397.28 Gb - used: 1297.93 Gb (93%) - free 99.36 Gb (7%) | 'v:\ Used Space'=1297.93Gb;0.00;0.00;0.00;1397.28

Test #3, about 2 minutes after Test #2; the nsclient had probably
refreshed its data by then -
jim@hobarchive:~$ /usr/local/nagios/libexec/check_nt -H 192.168.102.14 \
    -v USEDDISKSPACE -l v -p 12489
Segmentation fault

Looking at the history for this service on this host for the past 24
hours, I see the service went critical when the problem started, but
about 20 minutes later it returned to the warning state, which is the
normal state for this service. It stayed in warning for about 4 hours
when it should have been critical, as the entire v: drive didn't exist.

I'm using Nagios version 3.0.3 with plugins 1.4.12, both compiled from
source. The nsclient is version 0.3.3.20.

What should I change so that I get a critical email if the plugin
segfaults?

Here's the service definition -
define service{
        use                     generic-service
        host_name               host1,host2,host3,host4
        service_description     v: drive space
        check_command           check_nt!USEDDISKSPACE!-l v -w 80 -c 94
        }
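
One possible workaround for the segfault case, offered as a minimal
sketch rather than a tested fix: run check_nt through a small wrapper
that reports CRITICAL whenever the plugin is killed by a signal or exits
with a status outside the usual 0-3 plugin range. The wrapper itself,
the name run_plugin and the 30-second timeout are hypothetical and are
not part of the plugins package; it assumes Python 3.7 or later is
available on the Nagios host.

```python
#!/usr/bin/env python3
"""Hypothetical wrapper: run a Nagios plugin, report CRITICAL if it crashes."""
import subprocess
import sys

CRITICAL = 2
VALID_STATES = {0, 1, 2, 3}  # OK, WARNING, CRITICAL, UNKNOWN


def run_plugin(argv, timeout=30):
    """Run the plugin command in argv and return (exit_code, output_text)."""
    try:
        proc = subprocess.run(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            timeout=timeout,  # assumed value, adjust to the check interval
        )
    except subprocess.TimeoutExpired:
        return CRITICAL, "CRITICAL - plugin timed out after %ds" % timeout
    except OSError as exc:
        return CRITICAL, "CRITICAL - could not run plugin: %s" % exc

    if proc.returncode < 0:
        # On POSIX, a negative return code means the child died from a
        # signal; a segmentation fault shows up as -SIGSEGV.
        return CRITICAL, "CRITICAL - plugin killed by signal %d" % -proc.returncode
    if proc.returncode not in VALID_STATES:
        return CRITICAL, "CRITICAL - plugin returned unexpected status %d" % proc.returncode
    # Normal case: pass the plugin's own status and output through unchanged.
    return proc.returncode, proc.stdout.strip()


# Only act when a plugin command line was actually supplied.
if __name__ == "__main__" and len(sys.argv) > 1:
    code, message = run_plugin(sys.argv[1:])
    print(message)
    sys.exit(code)
```

The check_command would then call the wrapper with the original
check_nt command line as its arguments, so a normal result passes
through untouched and only an abnormal exit is rewritten to CRITICAL.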

If any additional info is needed, I'd be glad to offer it. Thanks for
any input that you have.
