[Nagiosplug-help] check_disk hanging on bad nfs mount

Thomas Guyot-Sionnest dermoth at aei.ca
Thu Jan 24 04:50:03 CET 2008


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 23/01/08 03:05 PM, Mike Lindsey wrote:
> I'm running check_disk v1848 (nagios-plugins 1.4.11) on FreeBSD 6.0
> 
> I've got a bad nfs mount, which is causing check_disk to hang, leaving, 
> eventually, thousands of check_disk processes.
> 
> truss ./check_disk -vvvv results in:
> [...]
> For /logs, total=830472192, available=324530624, 
> available_to_root=324530624, used=505941568, fsp.fsu_files=15218898, 
> fsp.fsu_ffree=15168033
> write(1,0x8066000,141)                           = 141 (0x8d)
> For /logs, used_pct=61 free_pct=39 used_units=247041 free_units=158462 
> total_units=405504 used_inodes_pct=1 free_inodes_pct=99 
> fsp.fsu_blocksize=512 mult=1048576
> write(1,0x8066000,162)                           = 162 (0xa2)
> Freespace_units result=0
> write(1,0x8066000,25)                            = 25 (0x19)
> Freespace% result=0
> write(1,0x8066000,20)                            = 20 (0x14)
> Usedspace_units result=0
> write(1,0x8066000,25)                            = 25 (0x19)
> Usedspace_percent result=0
> write(1,0x8066000,27)                            = 27 (0x1b)
> Usedinodes_percent result=0
> write(1,0x8066000,28)                            = 28 (0x1c)
> Freeinodes_percent result=0
> write(1,0x8066000,28)                            = 28 (0x1c)
> calling stat on /host
> write(1,0x8066000,22)                            = 22 (0x16)
> 
> After which, it hangs.  My standard arguments just set it to check the 
> partition to see if it's mounted.
> 
> check_disk -w 20 -c 10 -e -A -L -X procfs -X devfs
> 
> Ideas, thoughts, workarounds or fixes?

Hanging is normal behavior on NFS when the server goes away. All
processes waiting for I/O on the NFS mount will block until the server
is back. If you have a properly configured HA cluster, NFS operations
should also resume after a failover.
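
If you can wrap the check yourself, one workaround is to run the blocking
stat() in a child process and stop waiting after a deadline. This is a
minimal Python sketch under that assumption. `run_with_deadline` and
`probe_path` are hypothetical helpers, not part of nagios-plugins. On a
hard mount without intr the child may not die even after SIGKILL, so it
is deliberately not waited for. Stuck children still pile up, but the
caller itself returns.

```python
import subprocess
import sys


def run_with_deadline(argv, timeout):
    """Run argv and return "ok", "error" or "timeout" (hypothetical helper).

    On timeout the child is sent SIGKILL but NOT waited for: a process stuck
    in an uninterruptible NFS wait cannot be reaped until the server comes
    back, and waiting for it would hang the caller as well.
    """
    proc = subprocess.Popen(argv, stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    try:
        code = proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()  # may have no effect on a hard, non-intr mount
        return "timeout"
    return "ok" if code == 0 else "error"


def probe_path(path, timeout=10.0):
    """stat() path in a separate Python process (hypothetical helper)."""
    argv = [sys.executable, "-c", "import os, sys; os.stat(sys.argv[1])", path]
    return run_with_deadline(argv, timeout)
```

For example, probe_path("/host") returns "timeout" instead of hanging
when the server behind /host is gone.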

If you don't want this behavior, look in your nfs or mount manual page
for an option to avoid it. Here's what it says on Linux:

  soft           If an NFS file operation has a major timeout
                 then report an I/O error to the calling program.
                 The default is to continue retrying NFS file
                 operations indefinitely.

  hard           If an NFS file operation has a major timeout
                 then report "server not responding" on the
                 console and continue retrying indefinitely. This
                 is the default.

  intr           If an NFS file operation has a major timeout and
                 it is hard mounted, then allow signals to
                 interrupt the file operation and cause it to
                 return EINTR to the calling program. The
                 default is to not allow file operations to be
                 interrupted.

So on a Linux NFS client, mounting with "-o soft" would fix it;
alternatively, "-o intr" would still leave the processes hanging but
allow you to kill them.
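
To see which NFS mounts on a Linux client are exposed to this, you could
inspect the options in /proc/mounts. This is a minimal Python sketch that
assumes the usual /proc/mounts layout (device, mount point, type,
options, dump, pass). `hanging_nfs_mounts` and `read_proc_mounts` are
hypothetical names. Kernels after 2.6.25 ignore intr, and SIGKILL can
always interrupt a pending NFS operation there. On those kernels, soft is
the option that decides whether the call returns on its own.

```python
NFS_TYPES = {"nfs", "nfs4"}  # deliberately excludes "nfsd" (the server's control fs)


def hanging_nfs_mounts(mounts_text):
    """Return mount points of NFS mounts with neither 'soft' nor 'intr'.

    mounts_text follows the /proc/mounts layout:
        device mountpoint fstype options dump pass
    Escaped characters in mount points (e.g. \\040 for a space) are
    returned as-is.
    """
    risky = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Skip blank or malformed lines and non-NFS filesystems.
        if len(fields) >= 4 and fields[2] in NFS_TYPES:
            opts = set(fields[3].split(","))
            if opts.isdisjoint({"soft", "intr"}):
                risky.append(fields[1])
    return risky


def read_proc_mounts(path="/proc/mounts"):
    """Read the kernel's mount table (Linux only)."""
    with open(path) as f:
        return f.read()
```

For example, print(hanging_nfs_mounts(read_proc_mounts())) lists the
mount points that would block indefinitely if their server went away.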

Thomas
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHmArr6dZ+Kt5BchYRApwPAJ0RTETAr7Zu7bfiYpXt1VNGNh18KACg0ncJ
Q+B9QAP5ElqSrO58gNR+8x8=
=vHOV
-----END PGP SIGNATURE-----



