[Nagiosplug-help] Tracking down pthread/check_dns problem on CentOS4 w/ 1.4.2 plugins.

John P. Rouillard rouilj at cs.umb.edu
Mon Nov 28 09:27:16 CET 2005


Hello all:

I am running CentOS4 (the public version of RH Enterprise 4), and I am
seeing the dreaded:

  nslookup returned error status

problem. However the plugins I am using were compiled on this box. As
Ton Voon said:

> Are you using RedHat? There is a known problem with bind on RedHat
> where the nslookup and dig commands do not exit correctly due to a
> kernel pthread issue.

CentOS is "close enough" I guess 8-(.
  
> If you are using Redhat, this problem is fixed in nagios-plugins
> 1.4.2, but you need to compile it yourself for the ./configure script
> to pick up that your system has a problem and workaround it.

Seems like it doesn't work for CentOS and the kernel I am
running. Grepping through the sources for 1.4.2 doesn't show me a
reference to the pthread bug or a workaround for it in
check_dns.c. However, I came across the following Changelog entry:

2005-09-12 11:31  tonvoon

        * plugins/popen.c, Makefile.am, configure.in, config_test/Makefile,
          config_test/child_test.c, config_test/run_tests: ECHILD error at
          waitpid on Red Hat systems (Peter Pramberger and Sascha Runschke
          - 1250191)

A little more searching in plugins/popen.c turned up this segment of
code:

#ifdef REDHAT_SPOPEN_ERROR
        while (!childtermd);    /* wait until SIGCHLD */
#endif
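
For anyone who wants to see the idea in isolation, here is a minimal
sketch of "wait for SIGCHLD before reaping". It is not the popen.c code:
only the flag name childtermd comes from the source. The handler, the
helper run_child_and_reap(), and the use of sigsuspend() in place of the
busy loop are my own assumptions for illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Set by the SIGCHLD handler; same name as the flag used in popen.c. */
static volatile sig_atomic_t childtermd = 0;

static void on_sigchld(int signo)
{
    (void)signo;
    childtermd = 1;
}

/*
 * Hypothetical helper: fork a child that exits with `code`, wait until
 * SIGCHLD has been delivered, then reap the child with waitpid().
 * Returns the child's exit status, or -1 on error.
 */
int run_child_and_reap(int code)
{
    struct sigaction sa, old_sa;
    sigset_t block, old_mask, wait_mask;
    pid_t pid;
    int status = 0;
    int result = -1;

    childtermd = 0;

    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGCHLD, &sa, &old_sa) == -1)
        return -1;

    /* Block SIGCHLD so it cannot slip in between the flag test and the wait. */
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    if (sigprocmask(SIG_BLOCK, &block, &old_mask) == -1) {
        sigaction(SIGCHLD, &old_sa, NULL);
        return -1;
    }

    pid = fork();
    if (pid == 0)
        _exit(code);                /* child: terminate immediately */

    if (pid > 0) {
        /* Sleep with SIGCHLD unblocked until the handler has run.
           popen.c spins on the flag here instead. */
        wait_mask = old_mask;
        sigdelset(&wait_mask, SIGCHLD);
        while (!childtermd)
            sigsuspend(&wait_mask);

        if (waitpid(pid, &status, 0) == pid && WIFEXITED(status))
            result = WEXITSTATUS(status);
    }

    /* Restore the caller's signal mask and SIGCHLD disposition. */
    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    sigaction(SIGCHLD, &old_sa, NULL);
    return result;
}
```

Calling run_child_and_reap(3) should return 3 on a healthy kernel. The
point is only that waitpid() is not called until the signal has been
seen, which is what the REDHAT_SPOPEN_ERROR loop appears to be after.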

Looking at configure to see where REDHAT_SPOPEN_ERROR is defined, I
see it run grep "\.EL$" on the output of "uname -r". My "uname -r"
output is "2.6.9-22.0.1.ELsmp", which the trailing $ anchor keeps from
matching, so the test is never run.
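
The mismatch is easy to reproduce outside of configure. The sketch
below uses the POSIX regcomp()/regexec() calls with basic regular
expressions, the same dialect plain grep uses. The function name
release_matches() is made up for illustration, and "2.6.9-22.EL" is
just an example of a release string without the smp suffix.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Return 1 if `release` matches the POSIX basic regex `pattern`,
 * 0 if it does not, and -1 if the pattern fails to compile.
 */
int release_matches(const char *pattern, const char *release)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, release, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

With the anchored pattern, release_matches("\\.EL$", "2.6.9-22.0.1.ELsmp")
is 0, while release_matches("\\.EL$", "2.6.9-22.EL") is 1. Dropping the
anchor, release_matches("\\.EL", "2.6.9-22.0.1.ELsmp") is 1, so SMP
kernels are picked up as well.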

Correcting the configure script (I deleted the closing $ anchor) so
that the test is run, I see it calling make to run
"config_test/run_tests 10". If I run run_tests with an argument of
1000, I get Success=993 Fail=7. With "run_tests 10", I get a
successful completion better than 80% of the time, which leaves
REDHAT_SPOPEN_ERROR undefined.
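
As a rough check on why 10 iterations is too few, here is a small
back-of-the-envelope calculation. It assumes the runs are independent,
that 7/1000 is close to the real per-run failure rate on this box, and
that the check only flags the bug when at least one run fails. The
function name all_pass_probability() is mine.

```c
/*
 * Probability that `runs` independent trials all succeed when each
 * trial fails with probability `fail_rate`: (1 - fail_rate)^runs.
 */
double all_pass_probability(double fail_rate, int runs)
{
    double p = 1.0;
    int i;

    for (i = 0; i < runs; i++)
        p *= 1.0 - fail_rate;
    return p;
}
```

Under those assumptions all_pass_probability(7.0 / 1000.0, 10) comes
out at roughly 0.93, so about 93% of 10-run checks would miss the bug.
At 1000 runs the chance of a clean pass drops below 0.1%.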

Increasing the iterations and fixing the regexp so that
REDHAT_SPOPEN_ERROR is defined in config.h does seem to have solved
the problem.  However:

> Alternatively, Sascha Runschke has been working with Red Hat and it
> has been fixed in hotfix-kernel-2.6.9-22.12.EL, which you can
> probably request from them through your support contract.

I think I am seeing this problem in a Java-based application as
well. Searching through Red Hat's bugzilla hasn't led me to the ticket
for this fix. Does anybody have the kernel patch or a ticket ID so I
can see the actual problem and try to fix/verify it, or send it to the
CentOS folks for inclusion in a release/patch?

				-- rouilj
John Rouillard
===========================================================================
My employers don't acknowledge my existence much less my opinions.



