|
Description
|
When executing ::findleaks on 2 IBM eSeries machines running snv_35 in ON-PIT, we
got a few leaks but we got no info about the CALLER or stacktrace. This can also
be reproduced with previous Nevada builds and on s10 updates. Other x86 machines
do not seem to show this problem.
This is quite reproducible on these type of machines, I first thought it could
be a driver that was unloaded, so I set moddebug to 0x00020000 but did not get
any more info.
> kmem_flags/X
kmem_flags:
kmem_flags: f
> moddebug/X
moddebug:
moddebug: 20000
> ::findleaks -dv
findleaks: maximum buffers => 88848
findleaks: actual buffers => 86976
findleaks:
findleaks: potential pointers => 8953065
findleaks: dismissals => 8312974 (92.8%)
findleaks: misses => 127319 ( 1.4%)
findleaks: dups => 425799 ( 4.7%)
findleaks: follows => 86973 ( 0.9%)
findleaks:
findleaks: peak memory usage => 1097 kB
findleaks: elapsed CPU time => 0.2 seconds
findleaks: elapsed wall time => 0.3 seconds
findleaks:
CACHE LEAKED BUFCTL CALLER
ca832030 1 c4dcf7b0 ?
ca833030 1 c75ef1e8 ?
ca8336f0 1 c766e858 ?
------------------------------------------------------------------------
Total 3 buffers, 464 bytes
kmem_alloc_112 leak: 1 buffer, 112 bytes
ADDR BUFADDR TIMESTAMP THREAD
CACHE LASTLOG CONTENTS
c4dcf7b0 c4dad200 256556539a c6e85ee0
ca832030 0 cbf1c2a8
kmem_alloc_160 leak: 1 buffer, 160 bytes
ADDR BUFADDR TIMESTAMP THREAD
CACHE LASTLOG CONTENTS
c75ef1e8 c76558f8 25655685fd c6e85ee0
ca833030 0 cc2eb7c0
kmem_alloc_192 leak: 1 buffer, 192 bytes
ADDR BUFADDR TIMESTAMP THREAD
CACHE LASTLOG CONTENTS
c766e858 c7689900 256556640c c6e85ee0
ca8336f0 0 cb761e08
Looking at *.2.*, we see:
> ::findleaks -d
CACHE LEAKED BUFCTL CALLER
ca832030 1 c56eb3e0 ?
ca8336f0 1 c7be8400 ?
ca833030 1 c7c32f68 ?
cccbc6f0 1 c7c9f950 ?
------------------------------------------------------------------------
Total 4 buffers, 1732 bytes
kmem_alloc_112 leak: 1 buffer, 112 bytes
ADDR BUFADDR TIMESTAMP THREAD
CACHE LASTLOG CONTENTS
c56eb3e0 c56a1400 25381f85ec c73c8980
ca832030 0 cbc54040
kmem_alloc_192 leak: 1 buffer, 192 bytes
ADDR BUFADDR TIMESTAMP THREAD
CACHE LASTLOG CONTENTS
c7be8400 c7c15f00 25381f91fb c73c8980
ca8336f0 0 cbf5f248
kmem_alloc_160 leak: 1 buffer, 160 bytes
ADDR BUFADDR TIMESTAMP THREAD
CACHE LASTLOG CONTENTS c7c32f68 c7c72590 25381fb27e c73c8980
ca833030 0 cbf65f40
rootnex_dmahdl leak: 1 buffer, 1268 bytes
ADDR BUFADDR TIMESTAMP THREAD
CACHE LASTLOG CONTENTS
c7c9f950 c7c76a80 25381fa8ce c73c8980
cccbc6f0 0 cbf6b508
Lets take the first buffer:
> c7c15f00::whatis
c7c15f00 is c7c15f00+0, bufctl c7be8400 allocated from kmem_alloc_192
> c7be8400::bufctl -v
ADDR BUFADDR TIMESTAMP THREAD
CACHE LASTLOG CONTENTS
c7be8400 c7c15f00 25381f91fb c73c8980
ca8336f0 0 cbf5f248
> c7be8400::print kmem_bufctl_audit_t
{
bc_next = 0xc73fc428
bc_addr = 0xc7c15f00
bc_slab = 0xc7be9f68
bc_cache = 0xca8336f0
bc_timestamp = 0x25381f91fb
bc_thread = 0xc73c8980
bc_lastlog = 0
bc_contents = 0xcbf5f248
bc_depth = 0
bc_stack = [ 0xfe92a02b, 0xfe92a861, 0xfe92acb7, 0xfeb6141d, 0xfeb62d7d,
0xfeaff95b, 0xfeaffb2b, 0xfeb94bb1, 0xfeb96b85, 0xfeb98d28, 0xfeb62b09,
0xfeb6379b, 0xfeb93642, 0xfeb92760, 0xfebaa250 ]
}
>
So ::findleaks is just reporting what the data says; bc_depth is 0, which indicates
that no stack frames are valid. The stack trace in the bufctl is:
> c7be8400::print -at -l 1 kmem_bufctl_audit_t bc_stack
c7be8428 pc_t [15] bc_stack = [ 0xfe92a02b, ... ]
> c7be8428::array pc_t 0t15 | /pn
0xc7be8428: kmem_cache_free_debug+0xf6
kmem_cache_free+0x2f
kmem_free+0x116
adpu320_pktfree+0x3d
adpu320_tran_teardown_pkt+0x1f
scsi_free_cache_pkt+0x19
scsi_destroy_pkt+0x20
sd_destroypkt_for_buf+0x3b
sd_return_command+0x199
sdintr+0x5a3
adpu320_drain_completed_queue+0x80
adpu320_intr+0x53
sd_mapblockaddr_iostart+0x197
sd_xbuf_strategy+0x4e
xbuf_iostart+0x118
But that matches up with a previous allocation:
> ::walk kmem_log | ::bufctl -va c7c15f00
ADDR BUFADDR TIMESTAMP THREAD
CACHE LASTLOG CONTENTS
cb59b1b4 c7c15f00 24ddbfe1f4 c4635de0
ca8336f0 ca866c28 cbf5f248
kmem_cache_free_debug+0xf6
kmem_cache_free+0x2f
kmem_free+0x116
adpu320_pktfree+0x3d
adpu320_tran_teardown_pkt+0x1f
scsi_free_cache_pkt+0x19
scsi_destroy_pkt+0x20
sd_destroypkt_for_buf+0x3b
sd_return_command+0x199
sdintr+0x5a3
adpu320_drain_completed_queue+0x80
adpu320_intr+0x53
ca866c28 c7c15f00 24d20cb46c c73c8980
ca8336f0 0 0
kmem_cache_alloc_debug+0x256
kmem_cache_alloc+0x1ac
kmem_zalloc+0x4b
adpu320_allocIOB+0x18
adpu320_pktalloc+0x47
adpu320_tran_setup_pkt+0x1a
scsi_init_cache_pkt+0x158
scsi_init_pkt+0x73
sd_setup_rw_pkt+0xf7
sd_initpkt_for_buf+0xdc
sd_start_cmds+0x237
sd_core_iostart+0x1b0
sd_mapblockaddr_iostart+0x197
sd_xbuf_strategy+0x4e
xbuf_iostart+0x118
Looking at the log, it looks like the last entry was:
> ::kmem_log
CPU ADDR BUFADDR TIMESTAMP THREAD
0 cae3e7a4 c72fe820 24e84a3c99 c73c8980
0 cae3e740 c72fe820 24e838aaf7 c73c8980
0 cae3e6dc c7546040 24e838a194 c73c8980
0 cae3e678 c7546040 24e82746fe c73c8980
...
But there were allocations afterwards; as late as 25381f91fb. What's going
on? Well, panic_hrtime was:
> panic_hrtime::print
0x24e84ad7dd
All of the leaked buffers were allocated *after* panic time. (this
also explains the NULL lastlog entries; after panicking, we stop putting
things in the log)
The reason we don't get stacktraces is that getpcstack() doesn't handle the thread being on the panic_stack, and assumes the frame pointer is invalid, resulting in
an empty stack; ::findleaks is just reporting what's recorded in the dump.
Buffers allocated after panic are always a tad odd; they can end up being inconsistant, especially if they are allocated and freed during the dump.
The correct fix is to change getpcstack() to recognize that the thread is a
panicking, and have it recognize the panic stack, similar to the intr_stack
handling. The tricky bit is getting the transition back to the pre-panic()
stack correct.
The good news is that testing the fix is fairly easy, and doesn't require
post-panic leaks. Just generate a crash dump (reboot -d), pull open the
crash dump in mdb, and do:
> ::walk bufctl ! sort -k3r | less
the first one with a sane-looking timestamp should also have a filled-in
"caller" field; pass his "ADDR" field to '::bufctl -v', and verify that
the stack trace has panic() in it.
|