OpenSolaris

Bug ID 6475483
Synopsis mutex_enter() panic in lxpr_getnode()
State 10-Fix Delivered (Fix available in build)
Category:Subcategory kernel:brandz
Keywords
Responsible Engineer Edward Pilatowicz
Reported Against
Duplicate Of
Introduced In solaris_nevada
Commit to Fix snv_56
Fixed In snv_56
Release Fixed solaris_nevada(snv_56)
Related Bugs
Submit Date 26-September-2006
Last Update Date 25-January-2007
Description
yesterday i went to bfu my desktop to nightly bits, and while the bfu script was
trying to shut down an lx branded zone, the machine panicked and rebooted.  here's
how we died:
---8<---
> ::status
debugging crash dump vmcore.3 (64-bit) from mcescher
operating system: 5.11 onnv-gate:2006-09-12 (i86pc)
panic message:
mutex_enter: bad mutex, lp=ffffffffabdf2800 owner=deadbeefdeadbee8 thread=ffffffffa8ca0e80
dump content: kernel pages and pages from PID 20640
> $c0
vpanic()
mutex_panic+0x73()
mutex_vector_enter+0x536()
lxpr_getnode+0x2bd()
lxpr_lookup_common+0x7b()
lxpr_lookup_piddir+0x53()
lxpr_lookup+0xe9()
fop_lookup+0x53()
lookuppnvp+0x2e5()
lookuppnat+0x125()
lookupnameat+0x82()
cstatat_getvp+0x160()
cstatat64_32+0x7d()
stat64_32+0x31()
sys_syscall32+0x1ff()
---8<---
a crash dump can be found at:
	/net/mcescher.eng/export/crash/6475483
we died here while attempting to acquire v_lock (called from VN_HOLD):
---8<---
lxpr_getnode()

	case LXPR_PID_CURDIR:
		ASSERT(p != NULL);
		up = PTOU(p);
		lxpnp->lxpr_realvp = up->u_cdir;
		ASSERT(lxpnp->lxpr_realvp != NULL);
		VN_HOLD(lxpnp->lxpr_realvp);
---8<---
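to make the failure mode concrete, here is a userspace model of what VN_HOLD does, assuming it is the usual lock/increment/unlock sequence on v_lock.  model_vnode_t, model_vnode_init(), model_vn_hold() and the atomic_flag spinlock standing in for kmutex_t are all hypothetical names, not the real kernel types.  what it illustrates: the lock is the first thing touched, and an ASSERT like the one in lxpr_getnode() only rules out NULL, so a stale but non-NULL u_cdir goes straight into the lock acquisition.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for vnode_t, reduced to the two fields VN_HOLD
 * touches.  v_lock is first, matching offsetof(vnode_t, v_lock) == 0
 * from the mdb output in this report.
 */
typedef struct model_vnode {
	atomic_flag	v_lock;		/* stands in for the kmutex_t */
	unsigned int	v_count;	/* hold count bumped by VN_HOLD */
} model_vnode_t;

/* With v_lock at offset 0, &vp->v_lock and vp are the same address. */
_Static_assert(offsetof(model_vnode_t, v_lock) == 0, "v_lock must be first");

static void
model_vnode_init(model_vnode_t *vp)
{
	atomic_flag_clear(&vp->v_lock);
	vp->v_count = 0;
}

/* Model of VN_HOLD: take v_lock, bump the hold count, drop v_lock. */
static void
model_vn_hold(model_vnode_t *vp)
{
	/* Like ASSERT(lxpnp->lxpr_realvp != NULL): catches NULL, not freed. */
	assert(vp != NULL);

	while (atomic_flag_test_and_set(&vp->v_lock))
		;	/* spin until the lock is free */
	vp->v_count++;
	atomic_flag_clear(&vp->v_lock);
}
```

because v_lock sits at offset 0, the lp in the panic message is also the address of the vnode itself, which is why ::whatis on lp lands at offset +0 of a vn_cache buffer.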


we died because the vnode we're trying to access has already
been freed:
---8<---
> ::offsetof vnode_t v_lock
offsetof (vnode_t, v_lock) = 0
> $c1 ! grep mutex_vector_enter
mutex_vector_enter+0x536(ffffffffabdf2800)
> ffffffffabdf2800::whatis
ffffffffabdf2800 is ffffffffabdf2800+0, bufctl ffffffffabb8a148 freed from vn_cache
> ffffffffabb8a148::bufctl -v
            ADDR          BUFADDR        TIMESTAMP           THREAD
                            CACHE          LASTLOG         CONTENTS
ffffffffabb8a148 ffffffffabdf2800    122bb12a04d26 fffffe80008fec80
                 ffffffff841d2008 ffffffff812c10c0 ffffffff81cd7320
                 kmem_cache_free_debug+0x131
                 kmem_cache_free+0x4e
                 vn_free+0x9f
                 zfs_znode_cache_destructor+0x74
                 kmem_cache_free_debug+0x1ee
                 kmem_cache_free+0x4e
                 zfs_znode_free+0x53
                 znode_pageout_func+0x60
                 dbuf_evict_user+0x60
                 dbuf_clear+0x57
                 dbuf_evict+0x74
                 dnode_destroy+0x97
                 dnode_buf_pageout+0xca
                 dbuf_evict_user+0x60
                 dbuf_clear+0x57
---8<---
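the owner=deadbeefdeadbee8 in the panic message points the same way.  a worked check, assuming kmem's debug free pattern is 0xdeadbeefdeadbeef and that the i86pc mutex code reports the owner by masking off the low three flag bits of the owner word (the MODEL_* names and model_* functions below are hypothetical stand-ins for the real kernel constants and macros): 0xdeadbeefdeadbeef & ~0x7 = 0xdeadbeefdeadbee8, which is exactly what was printed.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed: the pattern kmem debugging writes over freed buffers. */
#define	MODEL_KMEM_FREE_PATTERN	0xdeadbeefdeadbeefULL

/* Assumed: the low 3 bits of the owner word are flags, not address bits. */
#define	MODEL_MUTEX_THREAD	(~(uint64_t)0x7)

/* Decode the owner thread pointer the way the panic message reports it. */
static uint64_t
model_mutex_owner(uint64_t owner_word)
{
	uint64_t owner = owner_word & MODEL_MUTEX_THREAD;

	assert((owner & 0x7) == 0);	/* always 8-byte aligned after masking */
	return (owner);
}

/* Nonzero if a lock word decodes to the same owner freed memory would. */
static int
model_owner_is_free_pattern(uint64_t owner_word)
{
	return (model_mutex_owner(owner_word) ==
	    model_mutex_owner(MODEL_KMEM_FREE_PATTERN));
}
```

a live thread pointer such as the panicking thread ffffffffa8ca0e80 does not decode to the pattern, so seeing deadbeef in the owner field is a cheap early hint of a use-after-free before even running ::whatis.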


so which process caused the panic?
---8<---
> *panic_thread::print kthread_t t_procp | ::ps
S    PID   PPID   PGID    SID    UID      FLAGS             ADDR NAME
R  20640  20566  19393  19393      0 0x42004000 fffffea11810d3b8 fuser
---8<---


and since the access was made via linux /proc, which process was fuser
trying to look at?
---8<---
> $c3 ! grep lxpr_getnode
lxpr_getnode+0x2bd(fffffe80dd7c52c0, 4, fffffe8b7b8313b8)
> fffffe8b7b8313b8::ps
S    PID   PPID   PGID    SID    UID      FLAGS             ADDR NAME
Z   2450      1   2450   2450      2 0x52000d02 fffffe8b7b8313b8 atd
---8<---


ah.  a zombie process.  that means there won't be any threads
or a working directory associated with this process.
if we compare the behavior of /proc in a linux zone with that on a native
linux machine, we notice some differences.  for reference, here's how the proc
files for a zombie process behave on a linux machine:
---8<---
edp@lucea$ ls /proc/20418
cmdline  cwd@     exe@  maps  mounts  stat   status
cpu      environ  fd/   mem   root@   statm
edp@lucea$ ls /proc/20418/fd
ls: /proc/20418/fd: Permission denied
edp@lucea$ ls -l /proc/20418
ls: cannot read symbolic link /proc/20418/cwd: Permission denied
ls: cannot read symbolic link /proc/20418/root: Permission denied
ls: cannot read symbolic link /proc/20418/exe: Permission denied
total 0
-r--r--r--    1 root     root            0 Oct 27 20:43 cmdline
-r--r--r--    1 root     root            0 Oct 27 20:43 cpu
lrwxrwxrwx    1 root     root            0 Oct 27 20:43 cwd
-r--------    1 root     root            0 Oct 27 20:43 environ
lrwxrwxrwx    1 root     root            0 Oct 27 20:43 exe
dr-x------    2 root     root            0 Oct 27 20:43 fd/
-r--------    1 root     root            0 Oct 27 20:43 maps
-rw-------    1 root     root            0 Oct 27 20:43 mem
-r--r--r--    1 root     root            0 Oct 27 20:43 mounts
lrwxrwxrwx    1 root     root            0 Oct 27 20:43 root
-r--r--r--    1 root     root            0 Oct 27 20:43 stat
-r--r--r--    1 root     root            0 Oct 27 20:43 statm
-r--r--r--    1 root     root            0 Oct 27 20:43 status
edp@lucea$ cat /proc/20418/cmdline
edp@lucea$ cat /proc/20418/cpu
cpu  0 0
cpu0 0 0
cpu1 0 0
edp@lucea$ cat /proc/20418/cwd
cat: /proc/20418/cwd: Permission denied
edp@lucea$ cat /proc/20418/environ
cat: /proc/20418/environ: Permission denied
edp@lucea$ cat /proc/20418/maps
cat: /proc/20418/maps: Permission denied
edp@lucea$ cat /proc/20418/mem
cat: /proc/20418/mem: Permission denied
edp@lucea$ cat /proc/20418/mounts
cat: /proc/20418/mounts: Invalid argument
edp@lucea$ cat /proc/20418/root
cat: /proc/20418/root: Permission denied
edp@lucea$ cat /proc/20418/stat
20418 (zombie.Linux.i3) Z 20411 20411 20196 34817 20411 4194372 8 0 2 0 0 0 0 0 20 0 0 0 252881831 0 0 4294967295 0 0 0 0 0 0 0 0 0 3222461040 0 0 17 1 0 0 0 0 0 0
edp@lucea$ cat /proc/20418/statm
0 0 0 0 0 0 0
edp@lucea$ cat /proc/20418/status
Name:   zombie.Linux.i3
State:  Z (zombie)
Tgid:   20418
Pid:    20418
PPid:   20411
TracerPid:      0
Uid:    89769   89769   89769   89769
Gid:    10      10      10      10
FDSize: 0
Groups: 10
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000000000
SigCgt: 0000000000000000
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
---8<---
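the transcript above was collected by hand.  a scripted equivalent is sketched below; it mimics what fuser did here, a stat(2) through /proc/<pid>/cwd of a zombie.  it assumes a linux-compatible /proc and the XSI waitid() with WNOWAIT, which waits for the child to exit without reaping it; zombie_proc_stat_errno() is a hypothetical helper name.  on a build that still has this bug, running it inside an lx branded zone may panic the host, so only try it on a disposable test machine.  on native linux stat() should fail, though the exact errno depends on the kernel version and on who owns the zombie.

```c
#define	_XOPEN_SOURCE	700

#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that exits immediately, wait until it is a zombie
 * (WNOWAIT leaves it unreaped), then stat() /proc/<pid>/<link>.
 * Returns 0 if stat() succeeded, the errno it failed with otherwise,
 * or -1 if the zombie could not be set up.
 */
static int
zombie_proc_stat_errno(const char *link)
{
	char path[64];
	struct stat st;
	siginfo_t info;
	int err;
	pid_t pid;

	assert(link != NULL);

	if ((pid = fork()) == -1)
		return (-1);
	if (pid == 0)
		_exit(0);		/* child: exit and become a zombie */

	/* Block until the child has exited, but leave it unreaped. */
	if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) == -1) {
		(void) waitpid(pid, NULL, 0);
		return (-1);
	}

	(void) snprintf(path, sizeof (path), "/proc/%d/%s", (int)pid, link);
	err = (stat(path, &st) == 0) ? 0 : errno;

	/* Reap the zombie so the caller does not leak it. */
	(void) waitpid(pid, NULL, 0);
	return (err);
}
```

calling it once each for "cwd", "root" and "exe" exercises the three symlinks this fix is concerned with; passing the result to strerror() gives the same text that ls and cat print above.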

lx_proc behaves differently in a few cases, and not all of them will be
fixed by this bug.  the fix for this bug will address the panic seen when
accessing "cwd", and it will fix the error returned when accessing the
following symlinks: "cwd", "root", and "exe".
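to illustrate the general shape of such a fix, here is a userspace sketch of a zombie check placed ahead of the symlink lookups.  it is not the actual change: model_proc_t, model_resolve_link() and the other model_* names are hypothetical, and EACCES is an assumption chosen to match the "Permission denied" that native linux reports in the transcript above.  the idea is simply that once a process is a zombie, its working directory, root and executable pointers are never dereferenced.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical process states; only "zombie or not" matters here. */
typedef enum { MODEL_SRUN, MODEL_SZOMB } model_pstat_t;

/* Hypothetical per-process state the three symlinks resolve through. */
typedef struct model_proc {
	model_pstat_t	p_stat;
	void		*cwd;	/* in the real kernel, may be stale in a zombie */
	void		*root;	/* root directory vnode */
	void		*exe;	/* executable vnode */
} model_proc_t;

typedef enum {
	MODEL_PID_CURDIR,
	MODEL_PID_ROOTDIR,
	MODEL_PID_EXE
} model_link_t;

/*
 * Resolve one /proc/<pid> symlink.  Returns 0 and sets *vpp on success,
 * or an errno value with *vpp left NULL.  The zombie test comes first,
 * so no vnode pointer of an exited process is ever read.
 */
static int
model_resolve_link(const model_proc_t *p, model_link_t which, void **vpp)
{
	assert(p != NULL && vpp != NULL);
	*vpp = NULL;

	if (p->p_stat == MODEL_SZOMB)
		return (EACCES);	/* assumed to match native linux */

	switch (which) {
	case MODEL_PID_CURDIR:
		*vpp = p->cwd;
		break;
	case MODEL_PID_ROOTDIR:
		*vpp = p->root;
		break;
	case MODEL_PID_EXE:
		*vpp = p->exe;
		break;
	default:
		return (EINVAL);
	}
	return (*vpp != NULL ? 0 : ENOENT);
}
```

checking the process state once, before the switch, keeps the guard in one place instead of repeating it in every case that takes a hold on a vnode.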
Work Around
N/A
Comments
N/A