OpenSolaris

Bug ID 6336202
Synopsis d4fc7824::typegraph made mdb crash
State 10-Fix Delivered (Fix available in build)
Category:Subcategory utility:mdb
Keywords onnv_triage
Responsible Engineer Jonathan Adams
Reported Against
Duplicate Of
Introduced In solaris_8
Commit to Fix snv_36
Fixed In snv_36
Release Fixed solaris_nevada(snv_36)
Related Bugs
Submit Date 12-October-2005
Last Update Date 21-February-2007
Description
While analyzing a crash dump, I ran ::typegraph and then I meant to
run ::whattype but accidentally ran d4fc7824::typegraph instead.
mdb said

> d4fc7824::typegraph
mdb: cache '' has invalid magtype pointer (0)

*** mdb: received signal FPE at:
    [1] genunix.so`kmem_read_magazines+0x68b()
    [2] genunix.so`kmem_walk_init+0x64()
    [3] mdb`mdb_get_dot+0xba()
    [4] mdb`mdb_pwalk+0x2f()
    [5] genunix.so`typegraph+0x316()
    [6] mdb`mdb_dcmd_usage+0x106()
    [7] mdb`mdb_call_idcmd+0x117()
    [8] mdb`mdb_call+0x2be()
    [9] mdb`yyparse+0xc80()
    [10] mdb`mdb_run+0x245()
    [11] mdb`main+0xea1()
    [12] mdb`_start+0x7a()

mdb: (c)ore dump, (q)uit, (r)ecover, or (s)top for debugger [cqrs]?

The core is attached.  I verified that the same thing happens on
snv_23.

---

addr::typegraph is a "summary" mode, in which typegraph walks a kmem cache and
gives a summary of how well-covered it is.  To do this, it passes the
address on to the "kmem" walker.

The kmem walker reads in a kmem_cache_t at that address.  If the address is
readable, it drives on, and then ends up dividing by zero in kmem_walk_init_common().
(The stack trace shows kmem_read_magazines() instead because kmem_walk_init_common()
is static, and therefore not in the dynsym, which is what mdb(1) uses to report
stack traces.)

The code that dies is:

        chunksize = cp->cache_chunksize;
        slabsize = cp->cache_slabsize;

        kmw->kmw_ubase = mdb_alloc(slabsize +
            sizeof (kmem_bufctl_t), UM_SLEEP);

        if (type & KM_ALLOCATED)
                kmw->kmw_valid =
                    mdb_alloc(slabsize / chunksize, UM_SLEEP);
                                       ^

If chunksize is zero, this division faults, which is the FPE signal mdb reported.

This can be reproduced more quickly by passing the invalid address to ::walk kmem.

The simplest fix is to validate chunksize before dividing.  It is better,
though, to validate that the cache as a whole looks sane; here is the core
of the suggested fix:

+       /*
+        * It's easy for someone to hand us an invalid cache address.
+        * Unfortunately, it is hard for this walker to survive an
+        * invalid cache cleanly.  So we make sure that:
+        *
+        *      1. the vmem arena for the cache is readable,
+        *      2. the vmem arena's quantum is a power of 2,
+        *      3. our slabsize is a multiple of the quantum, and
+        *      4. our chunksize is >0 and less than our slabsize.
+        */
+       if (mdb_vread(&vm_quantum, sizeof (vm_quantum),
+           (uintptr_t)&cp->cache_arena->vm_quantum) == -1 ||
+           vm_quantum == 0 ||
+           (vm_quantum & (vm_quantum - 1)) != 0 ||
+           cp->cache_slabsize < vm_quantum ||
+           P2PHASE(cp->cache_slabsize, vm_quantum) != 0 ||
+           cp->cache_chunksize == 0 ||
+           cp->cache_chunksize > cp->cache_slabsize) {
+               mdb_warn("%p is not a valid kmem_cache_t\n", addr);
+               goto out2;
+       }
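
The checks in the fix can be exercised outside mdb.  What follows is a minimal,
self-contained sketch, not the mdb code: cache_looks_sane() and
chunks_per_slab() are hypothetical helpers, their plain-integer arguments stand
in for the kmem_cache_t fields and the vm_quantum value that the real walker
reads with mdb_vread(), and P2PHASE is redefined locally on the assumption that
it is the usual power-of-two remainder macro.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Local stand-in for P2PHASE: the remainder of x modulo align, valid only
 * when align is a power of 2 (assumed definition).
 */
#define	P2PHASE(x, align)	((x) & ((align) - 1))

/*
 * Hypothetical helper mirroring the checks in the suggested fix.  The
 * arguments stand in for cp->cache_slabsize, cp->cache_chunksize and the
 * arena's vm_quantum; reading them from the target is not modeled here.
 */
static bool
cache_looks_sane(size_t slabsize, size_t chunksize, size_t vm_quantum)
{
	if (vm_quantum == 0 ||
	    (vm_quantum & (vm_quantum - 1)) != 0)	/* not a power of 2 */
		return (false);
	if (slabsize < vm_quantum ||
	    P2PHASE(slabsize, vm_quantum) != 0)		/* not a quantum multiple */
		return (false);
	if (chunksize == 0 ||				/* would divide by zero */
	    chunksize > slabsize)
		return (false);
	return (true);
}

/*
 * Number of buffers per slab, or 0 if the cache fails the sanity checks.
 * The division is reached only once chunksize is known to be nonzero.
 */
static size_t
chunks_per_slab(size_t slabsize, size_t chunksize, size_t vm_quantum)
{
	if (!cache_looks_sane(slabsize, chunksize, vm_quantum))
		return (0);
	return (slabsize / chunksize);
}
```

With a chunksize of 0, as in this report, chunks_per_slab() returns 0 rather
than reaching the division.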
Work Around
N/A
Comments
N/A