OpenSolaris

Printable Version Enter a New Search
Bug ID 6394382
Synopsis getpcstack() doesn't work post-panic on x86
State 1-Dispatched (Default State)
Category:Subcategory kernel:arch-x86
Keywords onnv_triage | oss-bite-size
Sponsor
Submitter
Reported Against snv_35
Duplicate Of
Introduced In
Commit to Fix
Fixed In
Release Fixed
Related Bugs
Submit Date 6-March-2006
Last Update Date 7-April-2009
Description
When executing ::findleaks on 2 IBM eSeries machines running snv_35 in ON-PIT, we
got a few leaks but we got no info about the CALLER or stacktrace. This can also
be reproduced with previous Nevada builds and on s10 updates. Other x86 machines
do not seem to show this problem.

This is quite reproducible on these type of machines, I first thought it could
be a driver that was unloaded, so I set moddebug to 0x00020000 but did not get
any more info.

  > kmem_flags/X
  kmem_flags:
  kmem_flags:     f

  > moddebug/X
  moddebug:
  moddebug:       20000

  > ::findleaks -dv
  findleaks:                maximum buffers => 88848
  findleaks:                 actual buffers => 86976
  findleaks:
  findleaks:             potential pointers => 8953065
  findleaks:                     dismissals => 8312974       (92.8%)
  findleaks:                         misses => 127319        ( 1.4%)
  findleaks:                           dups => 425799        ( 4.7%)
  findleaks:                        follows => 86973         ( 0.9%)
  findleaks:
  findleaks:              peak memory usage => 1097 kB
  findleaks:               elapsed CPU time => 0.2 seconds
  findleaks:              elapsed wall time => 0.3 seconds
  findleaks:
  CACHE     LEAKED   BUFCTL CALLER
  ca832030       1 c4dcf7b0 ?
  ca833030       1 c75ef1e8 ?
  ca8336f0       1 c766e858 ?
  ------------------------------------------------------------------------
     Total       3 buffers, 464 bytes

  kmem_alloc_112 leak: 1 buffer, 112 bytes
              ADDR          BUFADDR        TIMESTAMP           THREAD
                              CACHE          LASTLOG         CONTENTS
          c4dcf7b0         c4dad200       256556539a         c6e85ee0
                           ca832030                0         cbf1c2a8

  kmem_alloc_160 leak: 1 buffer, 160 bytes
              ADDR          BUFADDR        TIMESTAMP           THREAD
                              CACHE          LASTLOG         CONTENTS
          c75ef1e8         c76558f8       25655685fd         c6e85ee0
                           ca833030                0         cc2eb7c0

  kmem_alloc_192 leak: 1 buffer, 192 bytes
              ADDR          BUFADDR        TIMESTAMP           THREAD
                              CACHE          LASTLOG         CONTENTS
          c766e858         c7689900       256556640c         c6e85ee0
                           ca8336f0                0         cb761e08
Looking at *.2.*, we see:
> ::findleaks -d
CACHE     LEAKED   BUFCTL CALLER
ca832030       1 c56eb3e0 ?
ca8336f0       1 c7be8400 ?
ca833030       1 c7c32f68 ?
cccbc6f0       1 c7c9f950 ?
------------------------------------------------------------------------
   Total       4 buffers, 1732 bytes

kmem_alloc_112 leak: 1 buffer, 112 bytes
            ADDR          BUFADDR        TIMESTAMP           THREAD
                            CACHE          LASTLOG         CONTENTS
        c56eb3e0         c56a1400       25381f85ec         c73c8980
                         ca832030                0         cbc54040

kmem_alloc_192 leak: 1 buffer, 192 bytes
            ADDR          BUFADDR        TIMESTAMP           THREAD
                            CACHE          LASTLOG         CONTENTS
        c7be8400         c7c15f00       25381f91fb         c73c8980
                         ca8336f0                0         cbf5f248

kmem_alloc_160 leak: 1 buffer, 160 bytes
            ADDR          BUFADDR        TIMESTAMP           THREAD
                            CACHE          LASTLOG         CONTENTS                           c7c32f68         c7c72590       25381fb27e         c73c8980
                         ca833030                0         cbf65f40

rootnex_dmahdl leak: 1 buffer, 1268 bytes
            ADDR          BUFADDR        TIMESTAMP           THREAD
                            CACHE          LASTLOG         CONTENTS
        c7c9f950         c7c76a80       25381fa8ce         c73c8980
                         cccbc6f0                0         cbf6b508

Lets take the first buffer:

> c7c15f00::whatis
c7c15f00 is c7c15f00+0, bufctl c7be8400 allocated from kmem_alloc_192
> c7be8400::bufctl -v
            ADDR          BUFADDR        TIMESTAMP           THREAD
                            CACHE          LASTLOG         CONTENTS
        c7be8400         c7c15f00       25381f91fb         c73c8980
                         ca8336f0                0         cbf5f248

> c7be8400::print kmem_bufctl_audit_t
{
    bc_next = 0xc73fc428
    bc_addr = 0xc7c15f00
    bc_slab = 0xc7be9f68
    bc_cache = 0xca8336f0
    bc_timestamp = 0x25381f91fb
    bc_thread = 0xc73c8980
    bc_lastlog = 0
    bc_contents = 0xcbf5f248
    bc_depth = 0
    bc_stack = [ 0xfe92a02b, 0xfe92a861, 0xfe92acb7, 0xfeb6141d, 0xfeb62d7d, 
0xfeaff95b, 0xfeaffb2b, 0xfeb94bb1, 0xfeb96b85, 0xfeb98d28, 0xfeb62b09, 
0xfeb6379b, 0xfeb93642, 0xfeb92760, 0xfebaa250 ]
}
> 

So ::findleaks is just reporting what the data says;  bc_depth is 0, which indicates
that no stack frames are valid.  The stack trace in the bufctl is:

> c7be8400::print -at -l 1 kmem_bufctl_audit_t bc_stack
c7be8428 pc_t [15] bc_stack = [ 0xfe92a02b, ... ]
> c7be8428::array pc_t 0t15 | /pn
0xc7be8428:     kmem_cache_free_debug+0xf6
                kmem_cache_free+0x2f
                kmem_free+0x116 
                adpu320_pktfree+0x3d
                adpu320_tran_teardown_pkt+0x1f
                scsi_free_cache_pkt+0x19
                scsi_destroy_pkt+0x20
                sd_destroypkt_for_buf+0x3b
                sd_return_command+0x199
                sdintr+0x5a3    
                adpu320_drain_completed_queue+0x80
                adpu320_intr+0x53
                sd_mapblockaddr_iostart+0x197
                sd_xbuf_strategy+0x4e
                xbuf_iostart+0x118

But that matches up with a previous allocation:

> ::walk kmem_log | ::bufctl -va c7c15f00         
            ADDR          BUFADDR        TIMESTAMP           THREAD
                            CACHE          LASTLOG         CONTENTS
        cb59b1b4         c7c15f00       24ddbfe1f4         c4635de0
                         ca8336f0         ca866c28         cbf5f248
                 kmem_cache_free_debug+0xf6
                 kmem_cache_free+0x2f
                 kmem_free+0x116
                 adpu320_pktfree+0x3d
                 adpu320_tran_teardown_pkt+0x1f
                 scsi_free_cache_pkt+0x19
                 scsi_destroy_pkt+0x20
                 sd_destroypkt_for_buf+0x3b
                 sd_return_command+0x199
                 sdintr+0x5a3
                 adpu320_drain_completed_queue+0x80
                 adpu320_intr+0x53

        ca866c28         c7c15f00       24d20cb46c         c73c8980
                         ca8336f0                0                0
                 kmem_cache_alloc_debug+0x256
                 kmem_cache_alloc+0x1ac
                 kmem_zalloc+0x4b
                 adpu320_allocIOB+0x18
                 adpu320_pktalloc+0x47
                 adpu320_tran_setup_pkt+0x1a
                 scsi_init_cache_pkt+0x158
                 scsi_init_pkt+0x73
                 sd_setup_rw_pkt+0xf7
                 sd_initpkt_for_buf+0xdc
                 sd_start_cmds+0x237
                 sd_core_iostart+0x1b0
                 sd_mapblockaddr_iostart+0x197
                 sd_xbuf_strategy+0x4e
                 xbuf_iostart+0x118


Looking at the log, it looks like the last entry was:

> ::kmem_log
CPU ADDR     BUFADDR         TIMESTAMP THREAD  
  0 cae3e7a4 c72fe820       24e84a3c99 c73c8980
  0 cae3e740 c72fe820       24e838aaf7 c73c8980
  0 cae3e6dc c7546040       24e838a194 c73c8980
  0 cae3e678 c7546040       24e82746fe c73c8980
...

But there were allocations afterwards;  as late as 25381f91fb.  What's going
on?  Well, panic_hrtime was:

> panic_hrtime::print
0x24e84ad7dd

All of the leaked buffers were allocated *after* panic time.  (this 
also explains the NULL lastlog entries;  after panicking, we stop putting 
things in the log)

The reason we don't get stacktraces is that getpcstack() doesn't handle the thread being on the panic_stack, and assumes the frame pointer is invalid, resulting in
an empty stack;  ::findleaks is just reporting what's recorded in the dump.

Buffers allocated after panic are always a tad odd;  they can end up being inconsistant, especially if they are allocated and freed during the dump.

The correct fix is to change getpcstack() to recognize that the thread is a 
panicking, and have it recognize the panic stack, similar to the intr_stack
handling.  The tricky bit is getting the transition back to the pre-panic()
stack correct.

The good news is that testing the fix is fairly easy, and doesn't require
post-panic leaks.  Just generate a crash dump (reboot -d), pull open the
crash dump in mdb, and do:

   > ::walk bufctl ! sort -k3r | less

the first one with a sane-looking timestamp should also have a filled-in 
"caller" field;  pass his "ADDR" field to '::bufctl -v', and verify that
the stack trace has panic() in it.
Work Around
N/A
Comments
N/A