Description
One of the processes doing I/O to a ZFS volume got stuck forever with the following stack
trace:
> ::ps!grep get
R 13530 23761 23761 622 86710 0x42004000 ffffffff8c0f6520 get
> ffffffff8c0f6520::ps -t
S PID PPID PGID SID UID FLAGS ADDR NAME
R 13530 23761 23761 622 86710 0x42004000 ffffffff8c0f6520 get
T 0xffffffff89c5d0c0 <TS_SLEEP>
> 0xffffffff89c5d0c0::findstack -v
stack pointer for thread ffffffff89c5d0c0: fffffe800013d9c0
[ fffffe800013d9c0 _resume_from_idle+0xde() ]
fffffe800013da00 swtch+0x185()
fffffe800013da30 cv_wait+0x6f(ffffffff8558cbec, ffffffff8558cb80)
fffffe800013dab0 zil_commit+0x8d(ffffffff8558cb80, 12353, 40)
fffffe800013db60 zfs_putapage+0x280(fffffe80d6984940, fffffffffa2283a0, 0, 0,
10000, ffffffffa89dc0c8)
fffffe800013dc20 pvn_vplist_dirty+0x342(fffffe80d6984940, 0, fffffffff427a33f
, 10000, ffffffffa89dc0c8)
fffffe800013dc70 zfs_inactive+0x72(fffffe80d6984940, ffffffffa89dc0c8)
fffffe800013dc90 fop_inactive+0x20(fffffe80d6984940, ffffffffa89dc0c8)
fffffe800013dcc0 vn_rele+0x66(fffffe80d6984940)
fffffe800013dd20 zfs_get_data+0x172(ffffffff85524a00, ffffffff87e74b88)
fffffe800013dd90 zil_lwb_commit+0x89(ffffffff8558cb80, ffffffff87e74b68,
ffffffffacf7c600)
fffffe800013de10 zil_commit+0x1b2(ffffffff8558cb80, 12345, 10)
fffffe800013de60 zfs_fsync+0x54(fffffe80c4d13bc0, 10, ffffffffa89dc0c8)
fffffe800013de90 fop_fsync+0x24(fffffe80c4d13bc0, 10, ffffffffa89dc0c8)
fffffe800013dec0 fdsync+0x3b(b, 10)
fffffe800013df10 sys_syscall32+0x101()
Note that zil_commit() appears twice on this stack: it has been entered recursively.
Here is the zilog_t structure:
{
zl_lock = {
_opaque = [ 0 ]
}
zl_dmu_pool = 0xffffffff82a058c0
zl_spa = 0xffffffff82f68900
zl_header = 0xffffffff83d8c200
zl_os = 0xffffffff83daa368
zl_get_data = zfs_get_data
zl_itx_seq = 0x1235b
zl_ss_seq = 0x1235b
zl_destroy_txg = 0
zl_replay_seq = [ 0, 0, 0, 0 ]
zl_suspend = 0
zl_cv_write = {
_opaque = 0x2
}
zl_cv_seq = {
_opaque = 0
}
zl_stop_replay = 0
zl_stop_sync = 0
zl_writer = 0x1
zl_log_error = 0
zl_itx_list = {
list_size = 0x40
list_offset = 0
list_head = {
list_next = 0xffffffff8558cc08
list_prev = 0xffffffff8558cc08
}
}
zl_itx_list_sz = 0xc0
zl_cur_used = 0x5e8
zl_prev_used = 0x11f0
zl_lwb_list = {
list_size = 0xd0
list_offset = 0xb8
list_head = {
list_next = 0xffffffffacf7c6b8
list_prev = 0xffffffffacf7c6b8
}
}
zl_vdev_list = {
list_size = 0x20
list_offset = 0x10
list_head = {
list_next = 0xffffffff8558cc60
list_prev = 0xffffffff8558cc60
}
}
zl_clean_taskq = 0xffffffff8565fdb0
zl_dva_tree = {
avl_root = 0
avl_compar = 0
avl_offset = 0
avl_numnodes = 0
avl_size = 0
}
zl_destroy_lock = {
_opaque = [ 0 ]
}
}
Note that there are two waiters on zl_cv_write and zl_writer is set (as expected, by the
outer zil_commit() call on the same stack). The code waits for zl_writer to be dropped:
	for (;;) {
		...
		if (zilog->zl_writer == B_FALSE) /* no one writing, do it */
			break;
		cv_wait(&zilog->zl_cv_write, &zilog->zl_lock);
	}
which will never happen because this thread is itself the writer that set zl_writer. Hence a deadlock.
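The self-deadlock can be reproduced in a small user-space model. This is only a sketch: writer_gate_t, gate_init(), gate_enter() and gate_exit() are hypothetical names, pthread primitives stand in for the kernel mutex and condition variable, and a timed wait replaces cv_wait() so that the model reports the hang instead of hanging.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for zl_lock, zl_cv_write and zl_writer. */
typedef struct writer_gate {
	pthread_mutex_t	lock;
	pthread_cond_t	cv_write;
	bool		writer;
} writer_gate_t;

static void
gate_init(writer_gate_t *g)
{
	pthread_mutex_init(&g->lock, NULL);
	pthread_cond_init(&g->cv_write, NULL);
	g->writer = false;
}

/*
 * Same loop shape as the excerpt above, but bounded by a timeout.
 * Returns 0 if the caller became the writer, or a non-zero error
 * (ETIMEDOUT in practice) if the flag was never dropped.
 */
static int
gate_enter(writer_gate_t *g, long timeout_ms)
{
	struct timespec deadline;
	int rc = 0;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&g->lock);
	while (g->writer && rc == 0)
		rc = pthread_cond_timedwait(&g->cv_write, &g->lock, &deadline);
	if (!g->writer) {
		g->writer = true;	/* no one writing, do it */
		rc = 0;
	}
	pthread_mutex_unlock(&g->lock);
	return (rc);
}

/* Drop the writer flag and wake one waiter. */
static void
gate_exit(writer_gate_t *g)
{
	pthread_mutex_lock(&g->lock);
	assert(g->writer);
	g->writer = false;
	pthread_cond_signal(&g->cv_write);
	pthread_mutex_unlock(&g->lock);
}
```

In this model, a thread that calls gate_enter() a second time before gate_exit() gets ETIMEDOUT. With an untimed wait, as in the kernel code, the same call never returns.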
A side question - what is the second waiting thread? This is the bringover command which
is also waiting on the same condition variable:
stack pointer for thread ffffffffa6160b80: fffffe800122bac0
[ fffffe800122bac0 _resume_from_idle+0xde() ]
fffffe800122bb00 swtch+0x185()
fffffe800122bb30 cv_wait+0x6f()
fffffe800122bbb0 zil_commit+0x8d()
fffffe800122bc60 zfs_putapage+0x280()
fffffe800122bd20 pvn_vplist_dirty+0x342()
fffffe800122bd70 zfs_inactive+0x72()
fffffe800122bd90 fop_inactive+0x20()
fffffe800122bdc0 vn_rele+0x66()
fffffe800122be00 closef+0x7e()
fffffe800122bea0 closeandsetf+0x47f()
fffffe800122bec0 close+0x16()
fffffe800122bf10 sys_syscall32+0x101()
> ffffffffa6160b80::print kthread_t t_procp|::ps -t
S PID PPID PGID SID UID FLAGS ADDR NAME
R 23761 622 23761 622 86710 0x42004000 ffffffffa5ebaca0 bringover
T 0xffffffffa6160b80 <TS_SLEEP>
Note that zil_commit() calls cv_signal(), which wakes only one waiter, but this seems fine
as long as zil_commit() always calls cv_signal() on return.
So, why is zil_commit() called recursively? At the end of zfs_get_data() there is
VN_RELE(ZTOV(zp)). vn_rele() sees that vp->v_count is one and calls VOP_INACTIVE()
which, in turn, calls zfs_inactive(). zfs_inactive() sees that the vnode has cached pages
and calls pvn_vplist_dirty():
	/*
	 * Attempt to push any data in the page cache. If this fails
	 * we will get kicked out later in zfs_zinactive().
	 */
	if (vn_has_cached_data(vp))
		(void) pvn_vplist_dirty(vp, 0, zfs_putapage, B_INVAL, cr);
This, in turn, calls zfs_putapage(), which calls zil_commit() again. And we have a
deadlock!
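The re-entry path can be illustrated with a toy model of the reference counting. Everything here is a hypothetical stand-in (model_vnode_t, model_commit(), model_get_data(), model_rele(), model_inactive(), run_commit()), not the ZFS code. The model assumes that the data callback takes its own hold on the vnode before releasing it, and it collapses zfs_inactive(), pvn_vplist_dirty() and zfs_putapage() into one step.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a vnode; not the kernel definition. */
typedef struct model_vnode {
	int	v_count;	/* reference count, as in vp->v_count */
	bool	has_pages;	/* models vn_has_cached_data(vp) */
} model_vnode_t;

static int commit_depth;	/* current nesting of model_commit() */
static int max_commit_depth;	/* deepest nesting observed */

static void model_commit(model_vnode_t *vp, bool fetch_data);

/* Inactive path: pushing cached pages ends in another commit. */
static void
model_inactive(model_vnode_t *vp)
{
	if (vp->has_pages) {
		vp->has_pages = false;
		model_commit(vp, false);
	}
}

/* Dropping the last reference runs the inactive path first. */
static void
model_rele(model_vnode_t *vp)
{
	if (vp->v_count == 1)
		model_inactive(vp);
	vp->v_count--;
}

/* Data callback: takes a hold, then releases it at the end. */
static void
model_get_data(model_vnode_t *vp)
{
	vp->v_count++;
	/* ... copy the data for the log record ... */
	model_rele(vp);
}

static void
model_commit(model_vnode_t *vp, bool fetch_data)
{
	commit_depth++;
	if (commit_depth > max_commit_depth)
		max_commit_depth = commit_depth;
	if (fetch_data)
		model_get_data(vp);
	commit_depth--;
}

/* Run one commit and report how deeply model_commit() nested. */
static int
run_commit(int other_holds)
{
	model_vnode_t vp = { other_holds, true };

	commit_depth = 0;
	max_commit_depth = 0;
	model_commit(&vp, true);
	return (max_commit_depth);
}
```

With no other holder (run_commit(0)) the callback drops the last reference and the commit nests to depth 2. With the file still held elsewhere (run_commit(1)) the release is a plain decrement and the depth stays at 1.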