|
Description
|
We saw a panic with the following stack:
> $C
ffffff000ffeebd0 nvlist_free+0x25(ffffff82b39114f0)
ffffff000ffeec20 zfs_ioc_pool_get_props+0x89(ffffffaaa969d000)
ffffff000ffeeca0 zfsdev_ioctl+0x12e(b600000000, 5a29, fd9cddd0, 100003, ffffff02ee344aa0, ffffff000ffeee8c)
ffffff000ffeece0 cdev_ioctl+0x48(b600000000, 5a29, fd9cddd0, 100003, ffffff02ee344aa0, ffffff000ffeee8c)
ffffff000ffeed20 spec_ioctl+0x86(ffffff02eacfea80, 5a29, fd9cddd0, 100003, ffffff02ee344aa0, ffffff000ffeee8c, 0)
ffffff000ffeeda0 fop_ioctl+0x7b(ffffff02eacfea80, 5a29, fd9cddd0, 100003, ffffff02ee344aa0, ffffff000ffeee8c, 0)
ffffff000ffeeeb0 ioctl+0x174(17, 5a29, fd9cddd0)
ffffff000ffeef00 sys_syscall32+0x1fc()
> ffffff82b39114f0::nvlist
mdb: failed to read nvpriv at deadbeefdeadbeef: no mapping for address
The problem comes from the fact that zfs_ioc_pool_get_props()
has the following code:
         if ((error = spa_open(zc->zc_name, &spa, FTAG)) != 0)
                 return (error);
-------> error = spa_prop_get(spa, &nvp);
         if (error == 0 && zc->zc_nvlist_dst != NULL)
                 error = put_nvlist(zc, nvp);
         else
                 error = EFAULT;
         spa_close(spa, FTAG);
         if (nvp)
------->         nvlist_free(nvp);
However, spa_prop_get() has bad error semantics. In particular, it
has the following bit of code at the end:
out:
         if (err && err != ENOENT) {
                 nvlist_free(*nvp);
                 return (err);
         }
So on error spa_prop_get() frees the nvlist it stored through the
caller's pointer and returns, but leaves that pointer set, and the
caller then tries to free the same nvlist again. spa_prop_get() is
in error here, as no function should set a parameter's value
unless it intends to return success.
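The rule in the last sentence can be illustrated with a small standalone
sketch. This is not the actual fix and does not use the real nvlist API:
mock_list_t, mock_alloc(), mock_free(), mock_prop_get() and mock_get_props()
are hypothetical stand-ins, and live_lists is a counter added only so that a
double free would trip an assertion. The producer builds its result in a
local variable and writes the out-parameter only when it returns 0, so the
caller's unconditional free is safe on every path.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for nvlist_t; not the real libnvpair type. */
typedef struct mock_list {
	int payload;
} mock_list_t;

/* Lists allocated minus lists freed; a double free would underflow it. */
static int live_lists;

static mock_list_t *
mock_alloc(void)
{
	mock_list_t *l = calloc(1, sizeof (*l));

	if (l != NULL)
		live_lists++;
	return (l);
}

static void
mock_free(mock_list_t *l)
{
	if (l == NULL)
		return;
	assert(live_lists > 0);	/* trips if more frees than allocations */
	live_lists--;
	free(l);
}

/*
 * Producer (plays the role of spa_prop_get() in this sketch).
 * Builds into a local and stores to *outp only on success, so the
 * caller never sees a pointer to memory that was already freed.
 */
static int
mock_prop_get(int simulate_err, mock_list_t **outp)
{
	mock_list_t *l = mock_alloc();

	if (l == NULL)
		return (ENOMEM);
	if (simulate_err != 0) {
		mock_free(l);	/* clean up our own allocation ... */
		return (simulate_err);	/* ... and leave *outp untouched */
	}
	*outp = l;
	return (0);
}

/* Caller (plays the role of the ioctl handler in this sketch). */
static int
mock_get_props(int simulate_err)
{
	mock_list_t *l = NULL;
	int error = mock_prop_get(simulate_err, &l);

	mock_free(l);	/* l is still NULL unless the producer succeeded */
	return (error);
}
```

Calling mock_get_props(0) and mock_get_props(EIO) each leaves live_lists at
zero, i.e. every list is freed exactly once on both the success and the
error path.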
How to reproduce:
-----------------
1. Create five NFS shares on Audrey (proto toro connected with Riverwalk)
2. Mount these filesystems on one of the data hosts
3. Run vdbench... I used the following option file:
[root@sam-v880a0 vdbench]$ more sseqa.dsk
sd=sd1,lun=/s1/file_1g,threads=8,size=1g
sd=sd2,lun=/s2/file_2g,threads=12,size=2g
sd=sd3,lun=/s3/file_3g,threads=16,size=3g
sd=sd4,lun=/s4/file_4g,threads=16,size=4g
sd=sd5,lun=/s5/file_5g,threads=16,size=5g
#sd=sd2,lun=/dev/rdsk/c6t010000144F8D43A600002A0047829BCDd0s6
wd=wd1,sd=(sd1-sd5),xfersize=4k,readpct=70
#wd=wd1,sd=sd1,xfersize=4k,readpct=70
rd=run1,wd=wd1,iorate=max,elapsed=999999999,interval=60
4. Keep it running for at least 12 hrs. In my case it ran well for 6 hrs, and started giving errors after that time.
|