Description
Category
kernel
Sub-Category
driver-sd-fixed
Description
I tried to create a ZFS file system on the DVD-RAM drive using:
zpool create -f dvdram c1t1d0
This panicked the machine:
panic[cpu0]/thread=dc99ede0:
BAD TRAP: type=e (#pf Page fault) rp=dc99e6ec addr=150 occurred in module "unix" due to a NULL pointer dereference
sched:
#pf Page fault
Bad kernel fault at addr=0x150
pid=0, pc=0xfe82b18d, sp=0xfe8e1447, eflags=0x10246
cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 6d8<xmme,fxsr,pge,mce,pse,de>
cr2: 150 cr3: 6699000
gs: fea401b0 fs: d0460000 es: d6e10160 ds: 10160
edi: 150 esi: 0 ebp: dc99e764 esp: dc99e724
ebx: 0 edx: dc99ede0 ecx: 150 eax: 0
trp: e err: 2 eip: fe82b18d cs: 158
efl: 10246 usp: fe8e1447 ss: 150
dc99e63c unix:die+98 (e, dc99e6ec, 150, 0)
dc99e6d8 unix:trap+11b9 (dc99e6ec, 150, 0)
dc99e6ec unix:_cmntrap+9f (fea401b0, d0460000,)
dc99e764 unix:mutex_enter+d (0, 1)
dc99e79c sd:sd_range_lock+5e (d3f28000, 4800, 480)
dc99e7e0 sd:sd_mapblocksize_iostart+106 (6, d3f28000, e8506b)
dc99e820 sd:sd_mapblockaddr_iostart+125 (5, d3f28000, e8506b)
dc99e840 sd:sd_xbuf_strategy+30 (e8506b58, d3b458d0,)
dc99e870 sd:xbuf_iostart+de (d17893a0)
dc99e888 sd:ddi_xbuf_qstrategy+4b (e8506b58, d17893a0)
dc99e8ac sd:sdstrategy+d3 (e8506b58)
dc99e8c0 genunix:bdev_strategy+4d (e8506b58)
dc99e8d4 genunix:ldi_strategy+40 (eac6ab80, e8506b58)
dc99e904 zfs:vdev_disk_io_start+1c7 (d0a42000, 0, dc99e9)
dc99e914 zfs:vdev_io_start+18 (d0a42000, 0, dc99e9)
dc99e92c zfs:zio_vdev_io_start+e (d0a42000)
dc99e940 zfs:zio_next_stage+73 (d0a42000)
dc99e95c zfs:zio_vdev_io_setup+75 (d0a42000)
dc99e974 zfs:zio_next_stage+73 (d0a42000)
dc99e99c zfs:zio_dva_translate+86 (d0a42000)
dc99e9b8 zfs:zio_next_stage+73 (d0a42000)
dc99e9dc zfs:zio_ready+3a (d0a42000)
dc99e9f4 zfs:zio_next_stage+73 (d0a42000)
dc99ea1c zfs:zio_dva_allocate+a6 (d0a42000)
dc99ea38 zfs:zio_next_stage+73 (d0a42000)
dc99ea5c zfs:zio_checksum_generate+5e (d0a42000)
dc99ea74 zfs:zio_next_stage+73 (d0a42000)
dc99eacc zfs:zio_write_compress+236 (d0a42000)
dc99eaec zfs:zio_next_stage+73 (d0a42000)
dc99eb0c zfs:zio_wait_for_children+58 (d0a42000, 1, d0a421)
dc99eb20 zfs:zio_wait_children_ready+18 (d0a42000)
dc99eb34 zfs:zio_next_stage_async+b9 (d0a42000, 200, 0, f)
dc99eb4c zfs:zio_nowait+e (d0a42000)
dc99eb60 zfs:arc_write+6d (ec49e600, f106b380,)
dc99ebec zfs:dbuf_sync+555 (e4962c18, ec49e600,)
dc99ec4c zfs:dnode_sync+350 (e841ab10, 0, ec49e6)
dc99ec80 zfs:dmu_objset_sync_dnodes+7e (e5818980, e5818a4c,)
dc99ecb8 zfs:dmu_objset_sync+5d (e5818980, d8808eb8)
dc99ecd0 zfs:dsl_dataset_sync+17 (e1e08500, d8808eb8)
dc99ed1c zfs:dsl_pool_sync+82 (e1e7ab80, 5, 0)
dc99ed6c zfs:spa_sync+ef (f106b380, 5, 0)
dc99edc8 zfs:txg_sync_thread+1df (e1e7ab80, 0)
dc99edd8 unix:thread_start+8 ()
syncing file systems...
done
dumping to /dev/dsk/c0d0s1, offset 429391872, content: kernel
Problem: the un_wm_cache member of struct sd_lun is not allocated (it is NULL),
but otherwise the sd_lun structure looks OK:
>> d3f28000::print struct sd_lun un_wm_cache
un_wm_cache = 0
>> d3f28000::print struct sd_lun un_f_non_devbsize_supported
un_f_non_devbsize_supported = 0x1
>> d3f28000::print struct sd_lun un_tgt_blocksize
un_tgt_blocksize = 0x800
>> d3f28000::print struct sd_lun un_sys_blocksize
un_sys_blocksize = 0x200
>> d3f28000::print struct sd_lun un_f_has_removable_media
un_f_has_removable_media = 0x1
>> d3f28000::print struct sd_lun un_f_geometry_is_valid
un_f_geometry_is_valid = 0x1
>> d3f28000::print struct sd_lun un_state
un_state = 0
>> d3f28000::print struct sd_lun un_f_doorlock_supported
un_f_doorlock_supported = 0x1
>> d3f28000::print struct sd_lun un_errstats
un_errstats = 0xd3f31c30
>> d3f28000::print struct sd_lun un_errstats[0]
{
un_errstats->ks_crtime = 0x227660572c
un_errstats->ks_next = 0
un_errstats->ks_kid = 0x2e5
un_errstats->ks_module = [ "sderr" ]
un_errstats->ks_resv = 0
un_errstats->ks_instance = 0x4
un_errstats->ks_name = [ "sd4,err" ]
un_errstats->ks_type = 0x1
un_errstats->ks_class = [ "device_error" ]
un_errstats->ks_flags = 0x8
un_errstats->ks_data = 0xd3f31cfc
un_errstats->ks_ndata = 0xe
un_errstats->ks_data_size = 0x2a0
un_errstats->ks_snaptime = 0x8034951b1ef
un_errstats->ks_update = nulldev
un_errstats->ks_private = 0xd3f28000
un_errstats->ks_snapshot = default_kstat_snapshot
un_errstats->ks_lock = 0
}
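kmem_cache_alloc() is called from sd_range_lock() (the call instruction is at
sd_range_lock+0x59; sd_range_lock+5e is its return address), apparently with a
NULL kmem_cache_t * argument, presumably the NULL un_wm_cache. That would explain
the panic: the faulting address 0x150 is a small offset from NULL.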
>> sd_range_lock+5e-5?i
sd_range_lock+0x59: call +0x694453e <kmem_cache_alloc>
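The stack goes through sd_mapblocksize_iostart(), presumably because the media's block size (un_tgt_blocksize = 0x800, 2048 bytes) is larger than the system block size (un_sys_blocksize = 0x200, 512 bytes). The C sketch below illustrates the alignment arithmetic behind that path. It assumes a request is described by a starting 512-byte system block plus a byte count; the macro and function names are hypothetical and this is not the sd driver's code. A request it reports as unaligned only partially covers a 2048-byte target block, so it needs a read-modify-write, which is the case that relies on un_wm_cache.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical names; values taken from the dump above:
 *   un_sys_blocksize = 0x200 (512 bytes)
 *   un_tgt_blocksize = 0x800 (2048 bytes)
 */
#define SYS_BSIZE 0x200u
#define TGT_BSIZE 0x800u

/*
 * A request starting at system block `blkno` and spanning `nbytes` bytes
 * is aligned only if both its start and its length fall on target-block
 * boundaries. Anything else only partially covers some target block and
 * would need a read-modify-write.
 */
static bool
req_is_tgt_aligned(uint64_t blkno, uint64_t nbytes)
{
	uint64_t first_byte = blkno * SYS_BSIZE;

	assert(TGT_BSIZE % SYS_BSIZE == 0);
	return (first_byte % TGT_BSIZE == 0 && nbytes % TGT_BSIZE == 0);
}

/* Target blocks touched by the request: [*startp, *endp], inclusive. */
static void
req_tgt_range(uint64_t blkno, uint64_t nbytes, uint64_t *startp,
    uint64_t *endp)
{
	uint64_t first_byte = blkno * SYS_BSIZE;

	assert(nbytes > 0);
	*startp = first_byte / TGT_BSIZE;
	*endp = (first_byte + nbytes - 1) / TGT_BSIZE;
}
```

For example, a 512-byte transfer starting at system block 1 touches only part of target block 0, so it is unaligned; a 2048-byte transfer starting at system block 4 covers target block 1 exactly and is aligned.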
========================================================================
I was also able to reproduce this panic with a different sequence of commands:
1. Stop the volfs service (no one else should be using the DVD-RAM device):
svcadm disable -t volfs
2. dd if=/dev/rdsk/c1t1d0p0 of=/dev/rdsk/c1t1d0p0
This reads and writes the DVD-RAM media in dd's default 512-byte blocks, which
are smaller than the media's 2048-byte blocks, so the transfers are unaligned
and need un_wm_cache.
Suspend the dd command after a while (Ctrl-Z, or pkill -STOP dd).
In sd_lun we see an active OTYP_CHR open of partition 16 (p0), and
un_wm_cache is allocated, as expected:
>> d014c920::print struct dev_info devi_driver_data|::print struct scsi_device sd_private|::print struct sd_lun un_wm_cache
un_wm_cache = 0
>> d014c920::print struct dev_info devi_driver_data|::print struct scsi_device sd_private|::print struct sd_lun un_ocmap
{
un_ocmap.chkd = [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... ]
un_ocmap.rinfo = {
lyr_open = [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ]
reg_open = [ 0, 0, 0x10000, 0 ]
}
}
>> d014c920::print struct dev_info devi_driver_data|::print struct scsi_device sd_private|::print struct sd_lun un_wm_cache
un_wm_cache = 0xdf9986f0
>>
3. In another window, open another partition on the DVD-RAM device:
sleep 10000000 < /dev/rdsk/c1t1d0s0
There are now two OTYP_CHR partitions/slices open; un_wm_cache
remains allocated:
>> d014c920::print struct dev_info devi_driver_data|::print struct scsi_device sd_private|::print struct sd_lun un_ocmap
{
un_ocmap.chkd = [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... ]
un_ocmap.rinfo = {
lyr_open = [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ]
reg_open = [ 0, 0, 0x10001, 0 ]
}
}
>> d014c920::print struct dev_info devi_driver_data|::print struct scsi_device sd_private|::print struct sd_lun un_wm_cache
un_wm_cache = 0xdf9986f0
>>
4. Now kill the sleep command, which had slice s0 open (p0 remains open).
We still see one active open, but un_wm_cache is gone!
It should not have been freed, because the dd command still has the device
open and needs it.
>> d014c920::print struct dev_info devi_driver_data|::print struct scsi_device sd_private|::print struct sd_lun un_ocmap
{
un_ocmap.chkd = [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... ]
un_ocmap.rinfo = {
lyr_open = [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ]
reg_open = [ 0, 0, 0x10000, 0 ]
}
}
>> d014c920::print struct dev_info devi_driver_data|::print struct scsi_device sd_private|::print struct sd_lun un_wm_cache
un_wm_cache = 0 <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
>>
5. The next unaligned read or write to p0 should now panic the machine:
resume the suspended dd process (fg, or pkill -CONT dd).
=> panic
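The reg_open arrays in the mdb output above are per-open-type bitmasks with one bit per partition: the third entry (index 2, OTYP_CHR) goes from 0x10000 (partition 16, p0) to 0x10001 (partitions 0 and 16, i.e. s0 and p0) and back to 0x10000. A small decoder makes that explicit; the helper name is hypothetical and it assumes the one-bit-per-partition layout that the values above suggest.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: store into parts[] the partition numbers whose bit
 * is set in one reg_open[] word (bit N set => partition N has a regular
 * open). Returns how many partitions were stored, at most maxparts.
 * In this report partition 16 is p0 and partition 0 is s0.
 */
static int
open_partitions(uint32_t mask, int parts[], int maxparts)
{
	int n = 0;

	assert(parts != 0 || maxparts == 0);
	for (int part = 0; part < 32 && n < maxparts; part++) {
		if (mask & (UINT32_C(1) << part))
			parts[n++] = part;
	}
	return (n);
}
```

Decoding the three snapshots gives {16}, then {0, 16}, then {16} again: after the sleep process exits, p0 is still open, yet un_wm_cache has been freed.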
-------
The problem is in sdclose(): un_wm_cache is destroyed on *every* sdclose()
call, not only on the last close:
if (cp == &un->un_ocmap.chkd[OCSIZE]) {
	SD_TRACE(SD_LOG_OPEN_CLOSE, un, "sdclose: last close\n");
	/* cleanup code on last close */
}	/* <- the last-close branch ends here; the code below runs on every close */

/*
 * Destroy the cache (if it exists) which was
 * allocated for the write maps since this is
 * the last close for this media.
 */
if (un->un_wm_cache) {
	...
	kmem_cache_destroy(un->un_wm_cache);
	un->un_wm_cache = NULL;
	...
}
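A plausible fix is to move the cache-destroy block inside the `if (cp == &un->un_ocmap.chkd[OCSIZE])` last-close branch. The user-space model below sketches the difference under simplifying assumptions: all names are hypothetical, only OTYP_CHR opens are tracked, and malloc()/free() stand in for kmem_cache_create()/kmem_cache_destroy(). It also assumes, as a simplification, that the cache is created on open.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-in for struct sd_lun. */
struct lun_model {
	uint32_t reg_open_chr;	/* bit N set => partition N open (OTYP_CHR) */
	void	*wm_cache;	/* stands in for un_wm_cache */
};

static void
model_open(struct lun_model *un, int part)
{
	assert(part >= 0 && part < 32);
	un->reg_open_chr |= UINT32_C(1) << part;
	if (un->wm_cache == NULL)
		un->wm_cache = malloc(1);	/* "create the cache" */
}

/* Mirrors the reported behaviour: the cache is destroyed on every close. */
static void
close_buggy(struct lun_model *un, int part)
{
	assert(part >= 0 && part < 32);
	un->reg_open_chr &= ~(UINT32_C(1) << part);
	if (un->wm_cache != NULL) {
		free(un->wm_cache);
		un->wm_cache = NULL;
	}
}

/* Sketch of the fix: destroy the cache only when nothing is left open. */
static void
close_fixed(struct lun_model *un, int part)
{
	assert(part >= 0 && part < 32);
	un->reg_open_chr &= ~(UINT32_C(1) << part);
	if (un->reg_open_chr == 0 && un->wm_cache != NULL) {
		free(un->wm_cache);
		un->wm_cache = NULL;
	}
}
```

Replaying steps 2 to 4 (open p0, open s0, close s0) against close_buggy() leaves p0 open with a NULL cache, matching the mdb output; with close_fixed() the cache survives until p0 is also closed.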
Frequency
Always
Regression
No
Steps to Reproduce
Reproduce with:
1. Stop the volfs service (no one else should be using the DVD-RAM device):
svcadm disable -t volfs
2. dd if=/dev/rdsk/c1t1d0p0 of=/dev/rdsk/c1t1d0p0
This reads and writes the DVD-RAM media in dd's default 512-byte blocks, which
are smaller than the media's 2048-byte blocks, so the transfers are unaligned
and need un_wm_cache.
Suspend the dd command after a while (Ctrl-Z, or pkill -STOP dd).
3. In another window, open another partition on the DVD-RAM device:
sleep 10000000 < /dev/rdsk/c1t1d0s0
4. Now kill the sleep command, which had slice s0 open (p0 remains open).
5. The next unaligned read or write to p0 should now panic the machine:
resume the suspended dd process (fg, or pkill -CONT dd).
=> panic
Expected Result
No panic
Actual Result
Kernel panic due to a NULL pointer dereference
Error Message(s)
BAD TRAP: type=e (#pf Page fault) rp=dc99e6ec addr=150 occurred in module "unix" due to a NULL pointer dereference
Test Case
Submitter wants to work on bug
Yes
Additional configuration information
- snv_28, bfu'ed to snv_34
- x86 32-bit kernel
- LG "HL-DT-ST DVDRAM GMA-4020B" DVD writer
  (the device supports DVD-RAM)