OpenSolaris

Bug ID 6567983
Synopsis deadlock with spa_scrub_thread() and spa_namespace_lock
State 10-Fix Delivered (Fix available in build)
Category:Subcategory kernel:zfs
Keywords
Responsible Engineer Eric Schrock
Reported Against
Duplicate Of
Introduced In solaris_nevada
Commit to Fix snv_68
Fixed In snv_68
Release Fixed solaris_nevada(snv_68) , solaris_10u6(s10u6_01) (Bug ID:2156387)
Related Bugs
Submit Date 9-June-2007
Last Update Date 29-April-2008
Description
While testing my FMA bits, I tripped over an interesting deadlock:

T1 calls spa_scrub() with spa_namespace_lock held:

	mutex_enter(&spa_namespace_lock)
	    spa_scrub()
		spa_scrub_stop = 1
		wait for spa_scrub_thread to stop

T2 calls spa_vdev_enter()

	spa_vdev_enter()
	    spa_scrub_suspend()
		spa_scrub_suspended++
	    mutex_enter(&spa_namespace_lock)

T3 is spa_scrub_thread():

	spa_scrub_thread()
	    while (!spa_scrub_stop)
		while (spa_scrub_suspended)
		    wait for suspending thread to release

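At this point we have a three-way deadlock: T1 holds spa_namespace_lock
and waits for T3 to stop, T3 waits for T2 to resume scrubbing, and T2
waits for the spa_namespace_lock held by T1.  There are a couple of
places in the code where we call spa_scrub() with spa_namespace_lock
held.  My FMA bits made this a requirement, because we now post a
resilvering sysevent in spa_scrub() and hence need the namespace lock.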
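
The cycle can be checked mechanically with a small wait-for graph.  The
following is only an illustrative user-space sketch, not ZFS code: the
has_deadlock() helper, the waits_for[] array and the T1/T2/T3 constants
are hypothetical names that mirror the three threads described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the three threads in the report. */
enum { T1, T2, T3, NTHREADS, NOBODY = -1 };

/*
 * waits_for[t] names the thread that t is blocked on, or NOBODY.
 * Each thread waits on at most one other thread, so a deadlock is
 * a chain of edges that leads back to its starting thread.
 */
bool
has_deadlock(const int waits_for[NTHREADS])
{
	for (int start = 0; start < NTHREADS; start++) {
		int t = start;

		/* Follow at most NTHREADS edges from 'start'. */
		for (int step = 0; step < NTHREADS; step++) {
			t = waits_for[t];
			if (t == NOBODY)
				break;
			assert(t >= 0 && t < NTHREADS);
			if (t == start)
				return (true);
		}
	}
	return (false);
}
```

With the edges from the traces above (T1 waits on T3, T3 on T2, T2 on T1)
the helper reports a cycle.  If T2 blocks on the namespace lock before it
suspends the scrub, T3 waits on nobody and the cycle disappears.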

There is no reason why spa_vdev_enter() can't take the namespace lock
before calling spa_scrub_suspend().   There are no other callers that
attempt to grab the namespace lock after calling spa_scrub_suspend().
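
A minimal user-space sketch of the proposed ordering, assuming pthread
mutexes as stand-ins for the kernel locks.  This is not the actual
snv_68 change: the model_* functions and scrub_lock are hypothetical,
and the exit path is an assumed counterpart that undoes the two steps
in reverse order.

```c
#include <assert.h>
#include <pthread.h>

/* User-space stand-ins for the kernel objects named in the report. */
static pthread_mutex_t spa_namespace_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t scrub_lock = PTHREAD_MUTEX_INITIALIZER; /* hypothetical */
static int spa_scrub_suspended;

static void
model_scrub_suspend(void)
{
	pthread_mutex_lock(&scrub_lock);
	spa_scrub_suspended++;
	pthread_mutex_unlock(&scrub_lock);
}

static void
model_scrub_resume(void)
{
	pthread_mutex_lock(&scrub_lock);
	assert(spa_scrub_suspended > 0);
	spa_scrub_suspended--;
	pthread_mutex_unlock(&scrub_lock);
}

/* Read the suspend count under its lock. */
int
model_scrub_suspended(void)
{
	pthread_mutex_lock(&scrub_lock);
	int n = spa_scrub_suspended;
	pthread_mutex_unlock(&scrub_lock);
	return (n);
}

/*
 * Proposed order: take the namespace lock first, then suspend the
 * scrub.  A caller that blocks on the namespace lock has not yet
 * bumped spa_scrub_suspended, so the scrub thread can still see
 * spa_scrub_stop and exit.
 */
void
model_vdev_enter(void)
{
	pthread_mutex_lock(&spa_namespace_lock);
	model_scrub_suspend();
}

/* Assumed counterpart: resume the scrub, then drop the lock. */
void
model_vdev_exit(void)
{
	model_scrub_resume();
	pthread_mutex_unlock(&spa_namespace_lock);
}
```

The point of the ordering is that the suspend count only becomes
non-zero while the namespace lock is already held, so no thread can be
left holding a scrub suspension while it waits for that lock.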
Work Around
N/A
Comments
N/A