[Nagiosplug-devel] [ nagiosplug-Bugs-1180762 ] check_ssh does not properly close connection

SourceForge.net noreply at sourceforge.net
Wed Jun 15 02:29:00 CEST 2011


Bugs item #1180762, was opened at 2005-04-11 16:41
Message generated for change (Comment added) made by chninkel
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=397597&aid=1180762&group_id=29880

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: General plugin execution
Group: Release (specify)
Status: Closed
Resolution: Wont Fix
Priority: 5
Private: No
Submitted By: M. Sean Finney (seanius)
Assigned to: Thomas Guyot-Sionnest (dermoth)
Summary: check_ssh does not properly close connection

Initial Comment:
with 1.4 and later, it looks like check_ssh doesn't
properly close connections.  for example, this is the
previous behaviour of check_ssh in the 1.3 series:

Apr 11 10:24:03 appsrv1 sshd[9822]: Connection closed by xxx.xxx.64.52

but in 1.4:

appsrv1 sshd[10154]: fatal: Read from socket failed: Connection reset by peer

i think this is just because close() isn't being
called.  i will verify this shortly...

----------------------------------------------------------------------

Comment By: Yann Rouillard (chninkel)
Date: 2011-06-15 02:29

Message:
After some reading, it seems that using the SO_LINGER socket option might
be a better way.
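
For reference, a minimal sketch of what setting SO_LINGER looks like in C.
The helper name set_linger is hypothetical and is not part of check_ssh.c.
Note that SO_LINGER controls how close() treats unsent outgoing data (and a
zero timeout forces a reset), so whether it would actually avoid the sshd
log message is not established in this thread.

```c
#include <sys/socket.h>

/* Hypothetical helper (not in check_ssh.c): ask close() to wait up to
 * `seconds` for queued outgoing data to be sent before returning.
 * Returns 0 on success, -1 on error (errno set by setsockopt). */
static int set_linger(int sd, int seconds)
{
    struct linger lg;

    lg.l_onoff = 1;         /* enable lingering on close() */
    lg.l_linger = seconds;  /* linger timeout in seconds */

    return setsockopt(sd, SOL_SOCKET, SO_LINGER, &lg, sizeof lg);
}
```

It would be called on the connected socket before close(sd), for example
set_linger(sd, 2).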

----------------------------------------------------------------------

Comment By: Yann Rouillard (chninkel)
Date: 2011-06-15 02:26

Message:
I can confirm this problem: it doesn't happen every time, but often enough
to be reproducible with launching

The randomness seems to be caused by a timing condition.
As soon as OpenSSH receives the agent string, it replies with some key
exchange data.
If the close happens after the nagios host has received that data, the
ssh server receives an ECONNRESET and logs the "Connection reset by peer"
message.
If the close happens before the nagios host has received the data, the
connection is closed properly.
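
The timing condition described above can be reproduced outside check_ssh
with a small loopback experiment. This is only a sketch under assumptions:
the function name close_with_pending_data is hypothetical, the "server" is
a plain TCP socket rather than sshd, and the result relies on the client's
TCP stack sending a reset when a socket is closed with unread data in its
receive buffer (Linux does this; other systems may differ).

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical demo: a "server" sends data, the "client" closes either
 * after reading it (read_first != 0) or with the data still unread.
 * Returns 0 if the server then sees a clean EOF, the server's errno if
 * its recv() fails, or -1 if the setup itself failed. */
static int close_with_pending_data(int read_first)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    struct pollfd pfd;
    char buf[64];
    ssize_t n;
    int result = -1;
    int ssd = -1;
    int lsd = socket(AF_INET, SOCK_STREAM, 0);
    int csd = socket(AF_INET, SOCK_STREAM, 0);

    if (lsd < 0 || csd < 0)
        goto out;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                  /* let the kernel pick a port */

    if (bind(lsd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(lsd, 1) < 0 ||
        getsockname(lsd, (struct sockaddr *)&addr, &len) < 0 ||
        connect(csd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        (ssd = accept(lsd, NULL, NULL)) < 0)
        goto out;

    /* The server replies right away, like sshd's key exchange data. */
    if (send(ssd, "key exchange data", 17, 0) != 17)
        goto out;

    /* Wait until that data has reached the client's receive buffer. */
    pfd.fd = csd;
    pfd.events = POLLIN;
    pfd.revents = 0;
    if (poll(&pfd, 1, 1000) != 1)
        goto out;

    if (read_first)
        (void)recv(csd, buf, sizeof buf, 0);    /* consume the data */
    close(csd);
    csd = -1;

    /* What does the server observe after the client's close()? */
    n = recv(ssd, buf, sizeof buf, 0);
    result = (n < 0) ? errno : 0;

out:
    if (csd >= 0) close(csd);
    if (ssd >= 0) close(ssd);
    if (lsd >= 0) close(lsd);
    return result;
}
```

On a Linux client, close_with_pending_data(1) should return 0 (clean
close), while close_with_pending_data(0) should return ECONNRESET, which
matches the message sshd logs.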

I made the following modification in check_ssh.c to make sure the incoming
data is flushed before the socket is closed, and it solved the problem: no
more error logs, however many times check_ssh is called.



--- check_ssh.c	2011-06-15 02:24:31.000000000 +0200
+++ check_ssh.c.new	2011-06-15 02:24:04.000000000 +0200
@@ -45,7 +45,7 @@
 #endif
 
 #define SSH_DFL_PORT    22
-#define BUFF_SZ         256
+#define BUFF_SZ         1024
 
 int port = -1;
 char *server_name = NULL;
@@ -252,6 +252,7 @@
 			printf
 				(_("SSH WARNING - %s (protocol %s) version mismatch, expected '%s'\n"),
 				 ssh_server, ssh_proto, remote_version);
+			recv (sd, output, BUFF_SZ, 0);
 			close(sd);
 			exit (STATE_WARNING);
 		}
@@ -259,6 +260,7 @@
 		printf
 			(_("SSH OK - %s (protocol %s)\n"),
 			 ssh_server, ssh_proto);
+		recv (sd, output, BUFF_SZ, 0);
 		close(sd);
 		exit (STATE_OK);
 	}
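
A single recv() as in the patch above reads at most BUFF_SZ bytes and can
block if nothing has arrived yet. A slightly more defensive variant of the
same idea is sketched below. The helper drain_and_close and its timeout
parameter are hypothetical and not part of check_ssh.c; the sketch assumes
the peer closes its side (or stops sending) once it sees our end-of-file.

```c
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not in check_ssh.c): discard whatever the peer
 * has sent, then close the socket. Returns the result of close(). */
static int drain_and_close(int sd, int timeout_sec)
{
    char buf[1024];
    struct timeval tv;
    ssize_t n;

    /* Bound each recv() so a silent peer cannot hang the plugin. */
    tv.tv_sec = timeout_sec;
    tv.tv_usec = 0;
    (void)setsockopt(sd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);

    /* Tell the peer we will not send anything more. */
    (void)shutdown(sd, SHUT_WR);

    /* Throw away pending input until EOF, an error, or the timeout. */
    while ((n = recv(sd, buf, sizeof buf, 0)) > 0)
        ;

    return close(sd);
}
```

In the patch it would take the place of the recv()/close() pair, for
example drain_and_close(sd, 1) just before exit (STATE_OK).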


----------------------------------------------------------------------

Comment By: Pedro Albuquerque (pedroalb84)
Date: 2011-05-26 16:43

Message:
Hi all,

are there any fixes for this issue?

cheers.
Pedro

----------------------------------------------------------------------

Comment By: Christian (christian42)
Date: 2010-03-27 08:21

Message:
Sorry to revive such an old issue, but I don't think this is resolved yet.
I just built nagiosplugins-1.4.14 on Solaris 10/x86 and see the same old
issue:

---------------------------------------------------------
ray1# uname -a
SunOS ray1 5.10 Generic_141445-09 i86pc i386 i86pc
ray1# pwd
/home/c/nagios-plugins-1.4.14
ray1# file ./plugins/check_ssh
./plugins/check_ssh:    ELF 32-bit LSB executable 80386 Version 1, dynamically linked, not stripped

ray1# ./plugins/check_ssh localhost
SSH OK - Sun_SSH_1.1.2 (protocol 2.0)

ray1# dmesg | tail -1
Mar 27 08:06:43 ray1 sshd[7169]: [ID 800047 auth.crit] fatal: Read from socket failed: Connection reset by peer
---------------------------------------------------------

A similar installation on sparc (also with 1.4.14) shows the same result.

When running through truss(1) it seems that the socket /is/ being closed
though:

-------------------------------
10758/1:         0.0278 so_socket(PF_INET, SOCK_STREAM, IPPROTO_IP, "", SOV_DEFAULT) = 3
10758/1:         0.0280 connect(3, 0x0002CF10, 16, SOV_DEFAULT)         = 0
[...]
10758/1:         0.0338 close(3)                                        = 0
-------------------------------

So, maybe it's something else (and not a missing close()) causing these
messages?
During the connection, if one is fast enough, one can see the following in
netstat for a second or so:

127.0.0.1.33748      127.0.0.1.22         49152      0 49152      0 FIN_WAIT_2
127.0.0.1.22         127.0.0.1.33748      49152      0 49152      0 CLOSE_WAIT

Any ideas how to debug this further?

Thanks,
Christian.

PS: Why was this closed as "Wont Fix" when clearly a change has been
committed?

----------------------------------------------------------------------

Comment By: Sergey Svishchev (shattered)
Date: 2007-11-06 09:13

Message:
Logged In: YES 
user_id=45207
Originator: NO

I'm using OpenSSH_3.8.1p1 FreeBSD-20060123 (shipped in FreeBSD 5.5) and
can reproduce it at will.

----------------------------------------------------------------------

Comment By: Thomas Guyot-Sionnest (dermoth)
Date: 2007-11-02 13:58

Message:
Logged In: YES 
user_id=375623
Originator: NO

Yes, emias made me realize that on IRC yesterday.

I looked into it and I won't fix this because:

1. I can't reproduce it on OpenSSH, even with DEBUG logging (What SSH
server/version are you using?)

2. There's no simple way to do that. It would at the very least require
implementing the key exchange part of the protocol; I didn't even look
further as this is way beyond the scope of this plugin.

I suggest that you look into your SSH daemon or logging daemon
configuration instead, or get this fixed by your SSH vendor.

----------------------------------------------------------------------

Comment By: Sergey Svishchev (shattered)
Date: 2007-11-02 06:32

Message:
Logged In: YES 
user_id=45207
Originator: NO

It's check_ssh, not check_by_ssh.

----------------------------------------------------------------------

Comment By: Thomas Guyot-Sionnest (dermoth)
Date: 2007-11-02 03:29

Message:
Logged In: YES 
user_id=375623
Originator: NO

There's no close in there, and no sign of seanius's commit. He either
forgot to commit or committed it to the wrong branch...

I'll take a look shortly. Since I never used check_by_ssh it'll help if
you can give me a sample command line and what to look for (in the logs, I
guess), so I won't have to reinvent the wheel :)

Thanks

----------------------------------------------------------------------

Comment By: Sergey Svishchev (shattered)
Date: 2007-10-31 15:34

Message:
Logged In: YES 
user_id=45207
Originator: NO

This is still a problem in 1.4.3 -- evidently, close() is not enough.

----------------------------------------------------------------------

Comment By: M. Sean Finney (seanius)
Date: 2005-04-11 20:07

Message:
Logged In: YES 
user_id=226838

yup, calling close() before exiting resolves this problem,
i've committed a change to cvs

----------------------------------------------------------------------
