[Nagiosplug-help] Return code 139 - check_by_ssh

C. Bensend benny at bennyvision.com
Tue Oct 29 15:08:02 CET 2002


Hey folks,

	I am attempting to test an HP-UX machine remotely
using the check_by_ssh plugin.  I use this plugin all over
the place on other Netsaint or Nagios installations, so I'm
really stumped with this one.

Nagios server:  OpenBSD 3.1-STABLE, OpenSSH 3.5p1, Nagios 1.0b6,
                plugins 1-3beta1
Remote server:  HP-UX 11.00, OpenSSH 3.5p1

From checkcommands.cfg:

# 'check_mailq' command definition
define command{
        command_name    check_mailq
        command_line    $USER1$/check_by_ssh -H $HOSTADDRESS$ -C '/home/netsaint/check_mailq' -l netsaint
        }

From services.cfg:

define service{
        use                             generic-service
        host_name                       mail1  
        service_description             MQUEUE
        is_volatile                     0
        check_period                    24x7
        max_check_attempts              10
        normal_check_interval           3
        retry_check_interval            1
        contact_groups                  unix-admins
        notification_interval           none
        notification_period             24x7
        notification_options            w,c,r
        check_command                   check_mailq
        }

(sorry about the line wrap)


	Now, check_mailq on the remote machine is just a simple
shell script.  It is currently being used successfully by two
other installations, both Netsaint 0.0.6, both via check_by_ssh.

	The problem is a "Result code of 139 for check of
service 'MQUEUE' on 'mail1' was out of bounds." in the logs,
and a critical state within Nagios.

	Sooo, I check permissions, and I run it from the command
line, and everything looks grand.  I can ssh from the Nagios host
to the remote machine as user netsaint, via both hostname and IP.
I can run the remote plugin by hand via plain ssh as well as via check_by_ssh.
I have checked the environment that Nagios uses to execute plugins,
and nothing looks funky there.  As far as I can recall, isn't
"return code 139" a segfault?  I can't even duplicate the segfault,
let alone determine the cause...
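
	For reference, here is a minimal sketch of how that number is
usually read.  Plugins are expected to return 0-3 (OK, WARNING,
CRITICAL, UNKNOWN), so anything else is "out of bounds".  Assuming
the status comes back through a POSIX-style shell, values above 128
conventionally mean "killed by signal (status - 128)", and signal 11
is SIGSEGV on the common Unix platforms.  The decode_status helper
is a made-up name for illustration, not part of Nagios or the plugins:

```shell
#!/bin/sh
# decode_status is a hypothetical helper, not part of Nagios or the plugins.
# It assumes the usual shell convention: a status above 128 means the
# process was terminated by signal (status - 128).
decode_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        echo "killed by signal $((status - 128))"
    else
        echo "exited with code $status"
    fi
}

decode_status 139   # prints: killed by signal 11
decode_status 2     # prints: exited with code 2
```

Running the check by hand and then echoing $? right away should show
the same kind of value if the crash can be reproduced from the shell.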

	I think I've checked all of the normal stuff that we see
(permissions, host keys not known, etc), but this one has me
stumped.  I've googled and searched the mailing list archives, and
haven't found a solution.  Can anyone give me a hand on this one?

Thanks folks!

Benny


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
There's always time for Cheerios...



