OpenSolaris

Bug ID 6598604
Synopsis BAD TRAP while set property to multiple filesystems
State 10-Fix Delivered (Fix available in build)
Category:Subcategory kernel:zfs
Keywords
Responsible Engineer Matthew Ahrens
Reported Against
Duplicate Of
Introduced In solaris_nevada
Commit to Fix snv_85
Fixed In snv_85
Release Fixed solaris_nevada(snv_85) , solaris_10u6(s10u6_01) (Bug ID:2160887)
Related Bugs
Submit Date 29-August-2007
Last Update Date 29-April-2008
Description
While testing Eric's libzfs/history, I hit this panic when the command exceeded the maximum size. It looks like a stack overflow. It was later verified to be a generic issue that, so far, only happens on x86.

The crash dump is saved at
/net/zion.eng/export/dumps/robin/<bug id>/*.2

panic[cpu1]/thread=c8287de0: 
BAD TRAP: type=8 (#df Double fault) rp=fec24678 addr=0


sched: 
#df Double fault
pid=0, pc=0xf8611e70, sp=0xc8286000, eflags=0x10282
cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 6d8<xmme,fxsr,pge,mce,pse,de>
cr2: c8285ffc cr3: 22e9000
         gs:      1b0  fs:        0  es:      160  ds:      160
        edi:        0 esi:        0 ebp:        0 esp: fec246b0
        ebx:       6c edx:        0 ecx:        0 eax: c8286030
        trp:        8 err:        0 eip: f8611e70  cs:      158
        efl:    10282 usp: c8286000  ss:      160
tss.tss_link:   0x0                   
tss.tss_esp0:   0xc8287e34
tss.tss_ss0:    0x160
tss.tss_esp1:   0xc7b8c000
tss.tss_ss1:    0x160
tss.tss_esp2:   0xc7b8c000
tss.tss_ss2:    0x160
tss.tss_cr3:    0x0
tss.tss_eip:    0xf8611e70
tss.tss_eflags: 0x10282
tss.tss_eax:    0xc8286030
tss.tss_ebx:    0x6c
tss.tss_ecx:    0xe
tss.tss_edx:    0x0
tss.tss_esp:    0xc8286000

> c8287de0::threadlist -v
    ADDR     PROC      LWP CLS PRI    WCHAN
c8287de0 fec1eb50        0   0  60        0
  PC: panicsys+0x4a    THREAD: txg_sync_thread()
  stack pointer for thread c8287de0: c8286034
    dnode_hold_impl+0xa3()
    dnode_hold+0x1c()
    dmu_bonus_hold+0x25()
    dsl_dir_open_obj+0x55()
    dsl_prop_changed_notify+0x45()
    dsl_prop_changed_notify+0x149()
    dsl_prop_changed_notify+0x149()
    dsl_prop_changed_notify+0x149()
    dsl_prop_changed_notify+0x149()
    dsl_prop_changed_notify+0x149()
    dsl_prop_changed_notify+0x149()
    dsl_prop_changed_notify+0x149()
    dsl_prop_changed_notify+0x149()
    dsl_prop_changed_notify+0x149()
    dsl_prop_changed_notify+0x149()
    dsl_prop_changed_notify+0x149()
    dsl_prop_changed_notify+0x149()
    dsl_prop_changed_notify+0x149()
    dsl_prop_changed_notify+0x149()
    dsl_prop_changed_notify+0x149()
    dsl_prop_changed_notify+0x149()
    dsl_prop_set_sync+0xf3()
    dsl_sync_task_group_sync+0xe8()
    dsl_pool_sync+0xf9()
    spa_sync+0x212()
    txg_sync_thread+0x25a()
    thread_start+8()


panic[cpu1]/thread=c8287de0: 
BAD TRAP: type=e (#pf Page fault) rp=fec422cc addr=0 occurred in module "<unknown>" due to a NULL pointer dereference


#!/bin/ksh -p
a="123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123"

b="123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/124"

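# Back the test pool with a 100 MB file.
mkfile 100m /tmp/file
pool=t
zpool create $pool  /tmp/file
# Create the deeply nested filesystem chain, then rename the leaf.
zfs create -p $pool/$a
zfs rename $pool/$a $pool/$b

# Build a list holding the leaf filesystem and every ancestor up to the pool.
fs=$pool/$b
fss=$pool/$b
while [[ $fs != $pool ]]; do
        fs=${fs%/*}
        fss="$fss $fs"
done
# Set the property on all of them in one command (the step named in the synopsis).
zfs set readonly=off $fss
zpool history $pool
zpool destroy $pool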

Yeah, each dsl_prop_changed_notify() call takes 0t400 bytes on the stack (largely due to
zap_attribute_t).  This is also on a 32-bit machine.
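
As a rough check of those numbers: the backtrace shows 17 dsl_prop_changed_notify() frames, so at 400 bytes each the recursion alone accounts for 6800 bytes. The sketch below compares that with the distance between the thread address and the faulting stack pointer in the panic output. It assumes the thread structure sits at the top of the stack, which this report does not state; the macro and function names are made up for illustration.

```c
#include <stdint.h>

/* Values copied from the panic output in this report. */
#define THREAD_ADDR   0xc8287de0u  /* thread= in the panic message */
#define FAULT_SP      0xc8286000u  /* sp= at the double fault */
#define NOTIFY_FRAMES 17u          /* dsl_prop_changed_notify() frames in the backtrace */
#define FRAME_BYTES   400u         /* 0t400 bytes per call */

/*
 * ASSUMPTION: the stack grows down from just below the thread structure,
 * so this difference approximates the usable stack.
 */
static uint32_t
stack_span(void)
{
        return (THREAD_ADDR - FAULT_SP);
}

/* Stack consumed by the recursive frames alone. */
static uint32_t
notify_bytes(void)
{
        return (NOTIFY_FRAMES * FRAME_BYTES);
}
```

Under that assumption the recursion uses 6800 of roughly 7648 bytes, leaving under 1 KB for every other frame in the backtrace.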

Funnily enough, within dsl_prop_changed_notify(), there's this comment at the bottom:
"
        for (zap_cursor_init(&zc, mos,
            dd->dd_phys->dd_child_dir_zapobj);
            zap_cursor_retrieve(&zc, &za) == 0;
            zap_cursor_advance(&zc)) {
                /* XXX recursion could blow stack; esp. za! */
                dsl_prop_changed_notify(dp, za.za_first_integer,
                    propname, value, FALSE);
        }
"

A zap_attribute_t is defined as:
typedef struct {
        int za_integer_length;
        uint64_t za_num_integers;
        uint64_t za_first_integer;      /* no sign extension for <8byte ints */
        char za_name[MAXNAMELEN];
} zap_attribute_t;

Of the 400 bytes, 256 are in za_name alone.  We should allocate the zap_attribute_t from the heap instead of keeping it on the stack.
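
A minimal userland sketch of that idea, not the actual fix: each recursive frame holds only a pointer, and the large buffer lives on the heap for the duration of the call. The types attr_t and node_t and the function walk() are hypothetical stand-ins; malloc()/free() stand in for what kernel code would do with kmem_alloc()/kmem_free().

```c
#include <stdlib.h>
#include <string.h>

#define NAME_LEN 256    /* stand-in for MAXNAMELEN */

/* Hypothetical mirror of zap_attribute_t. */
typedef struct {
        int integer_length;
        unsigned long long num_integers;
        unsigned long long first_integer;
        char name[NAME_LEN];
} attr_t;

/* Hypothetical single-child chain, like the nested filesystems. */
typedef struct node {
        struct node *child;
} node_t;

/*
 * Recursive walk. Only the 'za' pointer sits in each stack frame; the
 * buffer itself is heap-allocated and freed before returning.
 * Returns the deepest level reached, or -1 if an allocation fails.
 */
static int
walk(const node_t *n, int depth)
{
        attr_t *za = malloc(sizeof (*za));
        int deepest = depth;

        if (za == NULL)
                return (-1);
        memset(za, 0, sizeof (*za));

        if (n->child != NULL)
                deepest = walk(n->child, depth + 1);

        free(za);
        return (deepest);
}
```

The recursion depth is unchanged, so this only shrinks each frame; it does not bound the total stack use.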

We could also see if perhaps dsl_prop_changed_notify() doesn't need to be
recursive.
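
One way the non-recursive option could look, sketched in userland C under stated assumptions: the child directories are visited from an explicit heap-allocated worklist, so stack use stays constant regardless of nesting depth. dir_t and notify_all() are hypothetical, and the sketch ignores whatever per-directory logic the real dsl_prop_changed_notify() performs.

```c
#include <stdlib.h>

/* Hypothetical directory node with any number of children. */
typedef struct dir {
        struct dir **children;
        size_t nchildren;
        int notified;
} dir_t;

/*
 * Visit root and every descendant without recursion.
 * Returns the number of directories visited, or -1 on allocation failure.
 */
static long
notify_all(dir_t *root)
{
        size_t cap = 16, top = 0;
        long visited = 0;
        dir_t **work = malloc(cap * sizeof (*work));

        if (work == NULL)
                return (-1);
        work[top++] = root;

        while (top > 0) {
                dir_t *d = work[--top];

                d->notified = 1;        /* stand-in for the per-directory work */
                visited++;

                for (size_t i = 0; i < d->nchildren; i++) {
                        if (top == cap) {
                                /* Grow the worklist when it fills up. */
                                dir_t **bigger = realloc(work,
                                    2 * cap * sizeof (*bigger));
                                if (bigger == NULL) {
                                        free(work);
                                        return (-1);
                                }
                                work = bigger;
                                cap *= 2;
                        }
                        work[top++] = d->children[i];
                }
        }

        free(work);
        return (visited);
}
```

The visiting order differs from the recursive version (children are popped last-in, first-out), which only matters if the per-directory work depends on order.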
Work Around
N/A
Comments
N/A