Description
yesterday i went to bfu my desktop to nightly bits, and while the bfu script
was trying to shut down an lx branded zone the machine panicked and rebooted.
here's how we died:
---8<---
> ::status
debugging crash dump vmcore.3 (64-bit) from mcescher
operating system: 5.11 onnv-gate:2006-09-12 (i86pc)
panic message:
mutex_enter: bad mutex, lp=ffffffffabdf2800 owner=deadbeefdeadbee8 thread=ffffffffa8ca0e80
dump content: kernel pages and pages from PID 20640
> $c0
vpanic()
mutex_panic+0x73()
mutex_vector_enter+0x536()
lxpr_getnode+0x2bd()
lxpr_lookup_common+0x7b()
lxpr_lookup_piddir+0x53()
lxpr_lookup+0xe9()
fop_lookup+0x53()
lookuppnvp+0x2e5()
lookuppnat+0x125()
lookupnameat+0x82()
cstatat_getvp+0x160()
cstatat64_32+0x7d()
stat64_32+0x31()
sys_syscall32+0x1ff()
---8<---
a crash dump can be found at:
/net/mcescher.eng/export/crash/6475483
we died here while attempting to acquire v_lock (called from VN_HOLD):
---8<---
lxpr_getnode()
    case LXPR_PID_CURDIR:
        ASSERT(p != NULL);
        up = PTOU(p);
        lxpnp->lxpr_realvp = up->u_cdir;
        ASSERT(lxpnp->lxpr_realvp != NULL);
        VN_HOLD(lxpnp->lxpr_realvp);
---8<---
we died because the vnode we're trying to access has already been
freed:
---8<---
> ::offsetof vnode_t v_lock
offsetof (vnode_t, v_lock) = 0
> $c1 ! grep mutex_vector_enter
mutex_vector_enter+0x536(ffffffffabdf2800)
> ffffffffabdf2800::whatis
ffffffffabdf2800 is ffffffffabdf2800+0, bufctl ffffffffabb8a148 freed from vn_cache
> ffffffffabb8a148::bufctl -v
ADDR BUFADDR TIMESTAMP THREAD
CACHE LASTLOG CONTENTS
ffffffffabb8a148 ffffffffabdf2800 122bb12a04d26 fffffe80008fec80
ffffffff841d2008 ffffffff812c10c0 ffffffff81cd7320
kmem_cache_free_debug+0x131
kmem_cache_free+0x4e
vn_free+0x9f
zfs_znode_cache_destructor+0x74
kmem_cache_free_debug+0x1ee
kmem_cache_free+0x4e
zfs_znode_free+0x53
znode_pageout_func+0x60
dbuf_evict_user+0x60
dbuf_clear+0x57
dbuf_evict+0x74
dnode_destroy+0x97
dnode_buf_pageout+0xca
dbuf_evict_user+0x60
dbuf_clear+0x57
---8<---
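as a side note, the owner value in the panic message (deadbeefdeadbee8) is
what you'd expect from a freed buffer. since v_lock sits at offset 0, the lock
word is the first word of the freed vnode. the sketch below assumes that kmem
debugging fills freed buffers with 0xdeadbeefdeadbeef and that the owner of an
adaptive mutex is the lock word with the low three bits masked off.
FREE_PATTERN, OWNER_MASK and lock_owner() are names made up for this
illustration, not the kernel's own definitions:

```c
#include <assert.h>
#include <stdint.h>

/* assumed fill pattern written over freed buffers by kmem debugging */
#define	FREE_PATTERN	0xdeadbeefdeadbeefULL
/* assumed mask: the low three bits of the lock word are flag bits */
#define	OWNER_MASK	(~(uint64_t)0x7)

/* extract the owner (a thread pointer) from a mutex lock word */
uint64_t
lock_owner(uint64_t lock_word)
{
	return (lock_word & OWNER_MASK);
}

/* a freed lock word should decode to the owner seen in the panic message */
void
check_panic_owner(void)
{
	assert(lock_owner(FREE_PATTERN) == 0xdeadbeefdeadbee8ULL);
}
```

under those assumptions the "owner" in the panic string is just the free
pattern with its low bits cleared, which agrees with the ::whatis output above.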
so which process caused the panic:
---8<---
> *panic_thread::print kthread_t t_procp | ::ps
S PID PPID PGID SID UID FLAGS ADDR NAME
R 20640 20566 19393 19393 0 0x42004000 fffffea11810d3b8 fuser
---8<---
and since the access was being done via linux /proc, which process
was fuser trying to look at:
---8<---
> $c3 ! grep lxpr_getnode
lxpr_getnode+0x2bd(fffffe80dd7c52c0, 4, fffffe8b7b8313b8)
> fffffe8b7b8313b8::ps
S PID PPID PGID SID UID FLAGS ADDR NAME
Z 2450 1 2450 2450 2 0x52000d02 fffffe8b7b8313b8 atd
---8<---
ah. a zombie process. that means that there aren't going to be any threads
or a working directory associated with this process, which would explain why
u_cdir still points at a vnode that has since been freed.
so if we look at the behavior of /proc in a linux zone and on a native
linux machine we notice some differences. for reference, here's how proc
files for a zombie process on a linux machine behave:
---8<---
edp@lucea$ ls /proc/20418
cmdline cwd@ exe@ maps mounts stat status
cpu environ fd/ mem root@ statm
edp@lucea$ ls /proc/20418/fd
ls: /proc/20418/fd: Permission denied
edp@lucea$ ls -l /proc/20418
ls: cannot read symbolic link /proc/20418/cwd: Permission denied
ls: cannot read symbolic link /proc/20418/root: Permission denied
ls: cannot read symbolic link /proc/20418/exe: Permission denied
total 0
-r--r--r-- 1 root root 0 Oct 27 20:43 cmdline
-r--r--r-- 1 root root 0 Oct 27 20:43 cpu
lrwxrwxrwx 1 root root 0 Oct 27 20:43 cwd
-r-------- 1 root root 0 Oct 27 20:43 environ
lrwxrwxrwx 1 root root 0 Oct 27 20:43 exe
dr-x------ 2 root root 0 Oct 27 20:43 fd/
-r-------- 1 root root 0 Oct 27 20:43 maps
-rw------- 1 root root 0 Oct 27 20:43 mem
-r--r--r-- 1 root root 0 Oct 27 20:43 mounts
lrwxrwxrwx 1 root root 0 Oct 27 20:43 root
-r--r--r-- 1 root root 0 Oct 27 20:43 stat
-r--r--r-- 1 root root 0 Oct 27 20:43 statm
-r--r--r-- 1 root root 0 Oct 27 20:43 status
edp@lucea$ cat /proc/20418/cmdline
edp@lucea$ cat /proc/20418/cpu
cpu 0 0
cpu0 0 0
cpu1 0 0
edp@lucea$ cat /proc/20418/cwd
cat: /proc/20418/cwd: Permission denied
edp@lucea$ cat /proc/20418/environ
cat: /proc/20418/environ: Permission denied
edp@lucea$ cat /proc/20418/maps
cat: /proc/20418/maps: Permission denied
edp@lucea$ cat /proc/20418/mem
cat: /proc/20418/mem: Permission denied
edp@lucea$ cat /proc/20418/mounts
cat: /proc/20418/mounts: Invalid argument
edp@lucea$ cat /proc/20418/root
cat: /proc/20418/root: Permission denied
edp@lucea$ cat /proc/20418/stat
20418 (zombie.Linux.i3) Z 20411 20411 20196 34817 20411 4194372 8 0 2 0 0 0 0 0 20 0 0 0 252881831 0 0 4294967295 0 0 0 0 0 0 0 0 0 3222461040 0 0 17 1 0 0 0 0 0 0
edp@lucea$ cat /proc/20418/statm
0 0 0 0 0 0 0
edp@lucea$ cat /proc/20418/status
Name: zombie.Linux.i3
State: Z (zombie)
Tgid: 20418
Pid: 20418
PPid: 20411
TracerPid: 0
Uid: 89769 89769 89769 89769
Gid: 10 10 10 10
FDSize: 0
Groups: 10
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000000000
SigCgt: 0000000000000000
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
---8<---
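to compare the two implementations without relying on ls(1) error strings, a
small probe along these lines could be run in both places. this is only a
sketch; probe_link() is a hypothetical helper, not something from lx_proc or
the test suite:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * try to read the symlink at path. returns 0 on success, or the errno
 * value reported by readlink(2) on failure.
 */
int
probe_link(const char *path)
{
	char buf[1024];

	assert(path != NULL);
	if (readlink(path, buf, sizeof (buf)) == -1)
		return (errno);
	return (0);
}
```

going by the ls -l output above, probe_link("/proc/20418/cwd") on the linux
machine would be expected to return EACCES ("Permission denied"), and the same
for "root" and "exe".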
the way that lx_proc behaves is different in a few cases. not all of these
cases will be fixed with this bug. the fix for this bug will address the panic
seen when accessing "cwd" and it will fix the error returned when accessing
the following symlinks: "cwd", "root", "exe".
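the shape of the check can be modeled in userland like this. it is not the
actual fix: struct model_proc, struct model_vnode and model_cwd_lookup() are
invented for the illustration, the real code would also need the appropriate
locking, and EACCES is assumed here only because it matches the "Permission
denied" that linux returns above:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* hypothetical stand-in for a vnode: just a hold count */
struct model_vnode {
	int v_count;
};

/* hypothetical stand-in for a process */
struct model_proc {
	int p_zombie;			/* non-zero once the process has exited */
	struct model_vnode *p_cdir;	/* may be stale or NULL for a zombie */
};

/*
 * look up the cwd of p. a zombie has no working directory, so fail with
 * EACCES before touching p_cdir instead of taking a hold on it.
 */
int
model_cwd_lookup(struct model_proc *p, struct model_vnode **vpp)
{
	assert(p != NULL && vpp != NULL);
	if (p->p_zombie || p->p_cdir == NULL)
		return (EACCES);
	p->p_cdir->v_count++;		/* the VN_HOLD() equivalent */
	*vpp = p->p_cdir;
	return (0);
}
```

the point of the ordering is that the zombie test happens before the pointer
is dereferenced, so a stale cdir pointer is never handed to the hold.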