Description
While testing Eric's libzfs/history, I hit this panic when the command exceeded the maximum size. It looks like a stack overflow. I later verified that it is a generic issue and, so far, only happens on x86.
The crash dump is saved at
/net/zion.eng/export/dumps/robin/<bug id>/*.2
panic[cpu1]/thread=c8287de0:
BAD TRAP: type=8 (#df Double fault) rp=fec24678 addr=0
sched:
#df Double fault
pid=0, pc=0xf8611e70, sp=0xc8286000, eflags=0x10282
cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 6d8<xmme,fxsr,pge,mce,pse,de>
cr2: c8285ffc cr3: 22e9000
gs: 1b0 fs: 0 es: 160 ds: 160
edi: 0 esi: 0 ebp: 0 esp: fec246b0
ebx: 6c edx: 0 ecx: 0 eax: c8286030
trp: 8 err: 0 eip: f8611e70 cs: 158
efl: 10282 usp: c8286000 ss: 160
tss.tss_link: 0x0
tss.tss_esp0: 0xc8287e34
tss.tss_ss0: 0x160
tss.tss_esp1: 0xc7b8c000
tss.tss_ss1: 0x160
tss.tss_esp2: 0xc7b8c000
tss.tss_ss2: 0x160
tss.tss_cr3: 0x0
tss.tss_eip: 0xf8611e70
tss.tss_eflags: 0x10282
tss.tss_eax: 0xc8286030
tss.tss_ebx: 0x6c
tss.tss_ecx: 0xe
tss.tss_edx: 0x0
tss.tss_esp: 0xc8286000
> c8287de0::threadlist -v
ADDR PROC LWP CLS PRI WCHAN
c8287de0 fec1eb50 0 0 60 0
PC: panicsys+0x4a THREAD: txg_sync_thread()
stack pointer for thread c8287de0: c8286034
dnode_hold_impl+0xa3()
dnode_hold+0x1c()
dmu_bonus_hold+0x25()
dsl_dir_open_obj+0x55()
dsl_prop_changed_notify+0x45()
dsl_prop_changed_notify+0x149()
dsl_prop_changed_notify+0x149()
dsl_prop_changed_notify+0x149()
dsl_prop_changed_notify+0x149()
dsl_prop_changed_notify+0x149()
dsl_prop_changed_notify+0x149()
dsl_prop_changed_notify+0x149()
dsl_prop_changed_notify+0x149()
dsl_prop_changed_notify+0x149()
dsl_prop_changed_notify+0x149()
dsl_prop_changed_notify+0x149()
dsl_prop_changed_notify+0x149()
dsl_prop_changed_notify+0x149()
dsl_prop_changed_notify+0x149()
dsl_prop_changed_notify+0x149()
dsl_prop_changed_notify+0x149()
dsl_prop_set_sync+0xf3()
dsl_sync_task_group_sync+0xe8()
dsl_pool_sync+0xf9()
spa_sync+0x212()
txg_sync_thread+0x25a()
thread_start+8()
panic[cpu1]/thread=c8287de0:
BAD TRAP: type=e (#pf Page fault) rp=fec422cc addr=0 occurred in module "<unknown>" due to a NULL pointer dereference
#!/bin/ksh -p
a="123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123"
b="123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/124"
mkfile 100m /tmp/file
pool=t
zpool create $pool /tmp/file
zfs create -p $pool/$a
zfs rename $pool/$a $pool/$b
fs=$pool/$b
fss=$pool/$b
while [[ $fs != $pool ]]; do
fs=${fs%/*}
fss="$fss $fs"
done
zfs set readonly=off $fss
zpool history $pool
zpool destroy $pool
Yeah, each dsl_prop_changed_notify() call takes 0t400 bytes of stack (largely due to
zap_attribute_t), and this is on a 32-bit machine.
Funnily enough, within dsl_prop_changed_notify(), there's this comment at the bottom:
"
for (zap_cursor_init(&zc, mos,
dd->dd_phys->dd_child_dir_zapobj);
zap_cursor_retrieve(&zc, &za) == 0;
zap_cursor_advance(&zc)) {
/* XXX recursion could blow stack; esp. za! */
dsl_prop_changed_notify(dp, za.za_first_integer,
propname, value, FALSE);
}
"
A zap_attribute_t is defined as:
typedef struct {
int za_integer_length;
uint64_t za_num_integers;
uint64_t za_first_integer; /* no sign extension for <8byte ints */
char za_name[MAXNAMELEN];
} zap_attribute_t;
Of those 400 bytes, 256 are taken by za_name alone. We should allocate the
zap_attribute_t dynamically instead of keeping it on the stack.
We could also check whether dsl_prop_changed_notify() needs to be
recursive at all.