While testing my FMA bits, I tripped over an interesting deadlock:
T1 calls spa_scrub() with spa_namespace_lock held:
mutex_enter(&spa_namespace_lock)
spa_scrub()
spa_scrub_stop = 1
wait for spa_scrub_thread to stop
T2 calls spa_vdev_enter()
spa_vdev_enter()
spa_scrub_suspend()
spa_scrub_suspended++
mutex_enter(&spa_namespace_lock)
T3 is spa_scrub_thread():
spa_scrub_thread()
while (!spa_scrub_stop)
while (spa_scrub_suspended)
wait for suspending thread to release
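At this point we have a three-way deadlock: T1 holds spa_namespace_lock
and waits for T3 to stop, T3 waits for T2 to release its suspension, and
T2 is blocked on spa_namespace_lock, which T1 holds. There are a couple
of places in the code where we call spa_scrub() with spa_namespace_lock
held. My FMA bits made this a requirement, because we now post a
resilvering sysevent in spa_scrub() and hence need the namespace lock.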
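There is no reason why spa_vdev_enter() can't take the namespace lock
before calling spa_scrub_suspend(). With that ordering, a thread only
suspends the scrub while it already holds the namespace lock, so it can
never block on that lock with the scrub suspended. There are no other
callers that attempt to grab the namespace lock after calling
spa_scrub_suspend().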
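
To illustrate the proposed ordering, here is a minimal user-space sketch
using POSIX threads. It is not the ZFS code: the locks, the condition
variable, and the names namespace_lock, scrub_lock, scrub_cv,
vdev_thread, stop_thread, and run_fixed_ordering are all hypothetical
stand-ins, and the "scrub work" is just a wait. It only models the three
threads above, with the suspending thread taking the namespace lock
first.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins for the state named in the report. */
static pthread_mutex_t namespace_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t scrub_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t scrub_cv = PTHREAD_COND_INITIALIZER;
static int scrub_stop, scrub_suspended;
static pthread_t scrub_tid;

/* T3: runs until told to stop, and parks while suspended. */
static void *scrub_thread(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&scrub_lock);
    while (!scrub_stop) {
        while (scrub_suspended)
            pthread_cond_wait(&scrub_cv, &scrub_lock);
        if (!scrub_stop)  /* placeholder for a unit of scrub work */
            pthread_cond_wait(&scrub_cv, &scrub_lock);
    }
    pthread_mutex_unlock(&scrub_lock);
    return NULL;
}

/* T2: assumed fix, namespace lock is taken BEFORE suspending. */
static void *vdev_thread(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&namespace_lock);
    pthread_mutex_lock(&scrub_lock);
    scrub_suspended++;
    pthread_mutex_unlock(&scrub_lock);
    /* ... the vdev change would happen here ... */
    pthread_mutex_lock(&scrub_lock);
    scrub_suspended--;
    pthread_cond_broadcast(&scrub_cv);
    pthread_mutex_unlock(&scrub_lock);
    pthread_mutex_unlock(&namespace_lock);
    return NULL;
}

/* T1: stops the scrub thread with the namespace lock held. */
static void *stop_thread(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&namespace_lock);
    pthread_mutex_lock(&scrub_lock);
    scrub_stop = 1;
    pthread_cond_broadcast(&scrub_cv);
    pthread_mutex_unlock(&scrub_lock);
    pthread_join(scrub_tid, NULL);  /* wait for scrub thread to stop */
    pthread_mutex_unlock(&namespace_lock);
    return NULL;
}

/* Runs all three threads once; returns 0 if they all finish cleanly. */
int run_fixed_ordering(void)
{
    pthread_t t1, t2;

    scrub_stop = scrub_suspended = 0;
    if (pthread_create(&scrub_tid, NULL, scrub_thread, NULL) != 0)
        return -1;
    if (pthread_create(&t2, NULL, vdev_thread, NULL) != 0)
        return -1;
    if (pthread_create(&t1, NULL, stop_thread, NULL) != 0)
        return -1;
    pthread_join(t2, NULL);
    pthread_join(t1, NULL);
    return (scrub_stop == 1 && scrub_suspended == 0) ? 0 : -1;
}
```

Calling run_fixed_ordering() from a main() and building with -pthread
should return 0 whichever of T1 or T2 wins the namespace lock. Moving
the namespace_lock acquisition in vdev_thread() to after the
scrub_suspended++ step recreates the ordering from the report, and the
run can then hang.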