[Nagiosplug-help] Tracking down pthread/check_dns problem on CentOS4 w/ 1.4.2 plugins.
John P. Rouillard
rouilj at cs.umb.edu
Mon Nov 28 09:27:16 CET 2005
I am running CentOS4 (RH Enterprise 4 public version) and I am seeing the
"nslookup returned error status"
problem, even though the plugins I am using were compiled on this box. As
Ton Voon said:
> Are you using RedHat? There is a known problem with bind on RedHat
> where the nslookup and dig commands do not exit correctly due to a
> kernel pthread issue.
CentOS is "close enough" I guess 8-(.
> If you are using Redhat, this problem is fixed in nagios-plugins
> 1.4.2, but you need to compile it yourself for the ./configure script
> to pick up that your system has a problem and workaround it.
It seems that this doesn't work for CentOS and the kernel I am
running. Grepping through the sources for 1.4.2 doesn't show me a
reference to the pthread bug or a workaround for it in
check_dns.c. However, I came across the following Changelog entry:
2005-09-12 11:31 tonvoon
* plugins/popen.c, Makefile.am, configure.in, config_test/Makefile,
config_test/child_test.c, config_test/run_tests: ECHILD error at
waitpid on Red Hat systems (Peter Pramberger and Sascha Runschke
A little more searching in plugins/popen.c turned up this segment of
code:
/* wait until SIGCHLD */
Now looking at configure to see where REDHAT_SPOPEN_ERROR is defined, I
see it running grep "\.EL$" on the output of "uname -r". My uname -r
output is "2.6.9-22.0.1.ELsmp", which ends in "smp" rather than ".EL",
so the pattern doesn't match and the test is never run.
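
To make the mismatch concrete, here is a small sketch of the same pattern
check. It uses Python's re module as a stand-in for grep (an assumption on
my part; configure itself uses grep), and the helper name and the second
kernel string are made up for illustration.

```python
import re

# The `uname -r` output quoted in this message.
kernel_release = "2.6.9-22.0.1.ELsmp"

anchored = re.compile(r"\.EL$")    # pattern as described: must END in ".EL"
unanchored = re.compile(r"\.EL")   # same pattern without the closing anchor


def matches(pattern, release):
    """Return True if the pattern is found anywhere in the release string."""
    return pattern.search(release) is not None


print(matches(anchored, kernel_release))    # False: the string ends in "smp"
print(matches(unanchored, kernel_release))  # True: ".EL" appears before "smp"

# Hypothetical non-SMP kernel name, only to show the anchored case matching.
print(matches(anchored, "2.6.9-22.EL"))     # True
```

Dropping the anchor makes the check match SMP kernels as well, at the cost
of also matching any release string that merely contains ".EL".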
After correcting the configure script (I deleted the closing $ anchor) to
allow the test to be run, I see it calling make to run
"config_test/run_tests 10". If I run run_tests with an argument of 1000,
I get Success=993 Fail=7. With "run_tests 10", I get a successful
completion better than 80% of the time, which leaves
REDHAT_SPOPEN_ERROR undefined.
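
A rough back-of-the-envelope check of why 10 iterations is too few, using
the 993/7 counts above. This is only a sketch: it assumes each iteration
fails independently with the same probability, and that a single failure
is enough for the test to flag the problem. The function name is mine.

```python
# Observed counts from the 1000-iteration run reported above.
successes, failures = 993, 7
p_fail = failures / (successes + failures)  # estimated per-iteration failure rate


def miss_probability(iterations, p=p_fail):
    """Chance that all `iterations` runs succeed, so the bug goes undetected.

    Assumes independent iterations with a constant failure probability.
    """
    return (1 - p) ** iterations


for n in (10, 100, 1000):
    print(f"{n:5d} iterations: problem missed {miss_probability(n):.1%} of the time")
```

Under those assumptions a 10-iteration run misses the problem roughly 93%
of the time, which fits the "better than 80%" I see in practice, while
1000 iterations would almost never miss it.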
Increasing the iterations and fixing the regexp so that
REDHAT_SPOPEN_ERROR is defined in config.h does seem to have solved
the problem. However:
> Alternatively, Sascha Runschke has been working with Red Hat and it
> has been fixed in hotfix-kernel-2.6.9-22.12.EL, which you can
> probably request from them through your support contract.
I think I am seeing this problem in a Java-based application as
well. Searching through Red Hat's bugzilla hasn't led me to the ticket
for this fix. Does anybody have the kernel patch or a ticket ID so I
can see the actual problem and try to fix/verify it, or send it to the
CentOS folks for inclusion in a release/patch?
My employers don't acknowledge my existence much less my opinions.