OpenSolaris

Bug ID 6768607
Synopsis nfs wedged up when svc disabled on jurassic
State 10-Fix Delivered (Fix available in build)
Category:Subcategory kernel:nfsv4
Keywords
Responsible Engineer Dai Ngo
Reported Against
Duplicate Of
Introduced In solaris_10
Commit to Fix snv_112
Fixed In snv_112
Release Fixed solaris_nevada(snv_112)
Related Bugs 6762222
Submit Date 6-November-2008
Last Update Date 8-April-2009
Description
We hit bug 6762222 again on jurassic today.  To try to alleviate the symptoms we killed off
lockd and statd.  We believe that at the same time, an admin working at another location
was using 'svcadm disable' to disable the nfs service.

The net result, whatever the cause, was a wedged sharemgr process, pid 586013:

> 0t586013::pid2proc | ::walk thread | ::findstack
stack pointer for thread ffffff0a6dcf1e80: ffffff0044576aa0
[ ffffff0044576aa0 _resume_from_idle+0xf1() ]
  ffffff0044576ad0 swtch+0x200()
  ffffff0044576b70 turnstile_block+0x862()
  ffffff0044576be0 mutex_vector_enter+0x2a5()
  ffffff0044576c00 rfs4_clean_state_exi+0x1e()
  ffffff00445770b0 unexport+0x14f()
  ffffff0044577b70 exportfs+0x12ae()
  ffffff0044577bb0 stubs_common_code+0x51()
  ffffff0044577c00 nfs_export+0x9d()
  ffffff0044577c30 zfs_ioc_share+0x21b()
  ffffff0044577cb0 zfsdev_ioctl+0x133()
  ffffff0044577cf0 cdev_ioctl+0x4b()
  ffffff0044577d30 spec_ioctl+0x89()
  ffffff0044577db0 fop_ioctl+0x81()
  ffffff0044577eb0 ioctl+0x191()
  ffffff0044577f00 sys_syscall32+0x1fc()

sharemgr was patiently waiting to get the rfs4_state_lock:

> rfs4_state_lock::mutex
            ADDR  TYPE             HELD MINSPL OLDSPL WAITERS
ffffffffc050aed0 adapt ffffff003d7e7c80      -      -     yes

This lock seems to be held by:

> ffffff003d7e7c80::findstack  
stack pointer for thread ffffff003d7e7c80: ffffff003d7e7af0
[ ffffff003d7e7af0 _resume_from_idle+0xf1() ]
  ffffff003d7e7b20 swtch+0x200()
  ffffff003d7e7b50 cv_wait+0x77()
  ffffff003d7e7b80 rfs4_database_shutdown+0x7b()
  ffffff003d7e7ba0 rfs4_state_fini+0x6a()
  ffffff003d7e7bc0 nfs_srv_shutdown_all+0x44()
  ffffff003d7e7bd0 nfs_srv_stop_all+0xb()
  ffffff003d7e7bf0 svc_pool_cleanup+0x50()
  ffffff003d7e7c60 svc_thread_creator+0x2f9()
  ffffff003d7e7c70 thread_start+8()

The rfs4_database_shutdown routine is waiting, I think, for its reaper thread
or threads to shut down, and its caller is again holding the big state lock.
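
The shape of the hang described above can be sketched in user space. This is
only an illustration under assumptions, not the kernel code: state_lock stands
in for rfs4_state_lock, reaper_done stands in for whatever condition
rfs4_database_shutdown sleeps on in cv_wait, and the function names are
hypothetical. It does not try to explain why the reaper never finished, which
this report leaves open.

```python
import threading

def demo(timeout=0.2):
    """Model one thread sleeping on a condition while holding a big lock."""
    state_lock = threading.Lock()        # stand-in for rfs4_state_lock (assumption)
    reaper_done = threading.Event()      # stand-in for the awaited condition (assumption)
    holder_has_lock = threading.Event()  # test scaffolding: ordering only

    def shutdown_path():
        # Like the svc_pool_cleanup thread: take the big lock, then sleep
        # until the reaper reports completion, still holding the lock.
        with state_lock:
            holder_has_lock.set()
            reaper_done.wait()

    def unexport_path():
        # Like the sharemgr thread: needs the same big lock to proceed.
        got = state_lock.acquire(timeout=timeout)
        if got:
            state_lock.release()
        return got

    holder = threading.Thread(target=shutdown_path)
    holder.start()
    holder_has_lock.wait()

    # While the holder sleeps, the second path cannot get the lock.
    blocked_while_waiting = not unexport_path()

    # Once the awaited condition is signalled, the holder exits and
    # releases the lock, so the second path can make progress.
    reaper_done.set()
    holder.join()
    acquired_after_release = unexport_path()

    return blocked_while_waiting, acquired_after_release

if __name__ == "__main__":
    print(demo())
```

In the sketch the second path only times out. In the kernel, mutex_vector_enter
has no timeout, so sharemgr stays blocked for as long as the condition is never
signalled.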

I don't really know enough about NFS4 to dig in much further than this.

After waiting 10 minutes for an orderly shutdown, we gave up and took
a crash dump.  The dump is on jurassic-x4600 at /tank/dump/crash/jurassic-x4600/*.81.
Work Around
N/A
Comments
N/A