OpenSolaris

Bug ID 6388096
Synopsis NULL pointer dereference panic in sd_range_lock()
State 10-Fix Delivered (Fix available in build)
Category:Subcategory kernel:driver-sd-rmedia
Keywords opensolaris | oss-request | oss-sponsor
Sponsor
Submitter jk
Responsible Engineer Minskey Guo
Reported Against
Duplicate Of
Introduced In solaris_nevada
Commit to Fix snv_36
Fixed In snv_36
Release Fixed solaris_nevada(snv_36) , solaris_10u2(s10u2_09) (Bug ID:2134951)
Related Bugs
Submit Date 21-February-2006
Last Update Date 6-November-2008
Description
Category
   kernel
Sub-Category
   driver-sd-fixed
Description
   I tried to create a zfs filesystem on the dvd-ram drive, using:
   zpool create -f dvdram c1t1d0
This panicked the machine.
panic[cpu0]/thread=dc99ede0:
BAD TRAP: type=e (#pf Page fault) rp=dc99e6ec addr=150 occurred in module "unix" due to a NULL pointer dereference
sched:
#pf Page fault
Bad kernel fault at addr=0x150
pid=0, pc=0xfe82b18d, sp=0xfe8e1447, eflags=0x10246
cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 6d8<xmme,fxsr,pge,mce,pse,de>
cr2: 150 cr3: 6699000
         gs: fea401b0  fs: d0460000  es: d6e10160  ds:    10160
        edi:      150 esi:        0 ebp: dc99e764 esp: dc99e724
        ebx:        0 edx: dc99ede0 ecx:      150 eax:        0
        trp:        e err:        2 eip: fe82b18d  cs:      158
        efl:    10246 usp: fe8e1447  ss:      150
dc99e63c unix:die+98 (e, dc99e6ec, 150, 0)
dc99e6d8 unix:trap+11b9 (dc99e6ec, 150, 0)
dc99e6ec unix:_cmntrap+9f (fea401b0, d0460000,)
dc99e764 unix:mutex_enter+d (0, 1)
dc99e79c sd:sd_range_lock+5e (d3f28000, 4800, 480)
dc99e7e0 sd:sd_mapblocksize_iostart+106 (6, d3f28000, e8506b)
dc99e820 sd:sd_mapblockaddr_iostart+125 (5, d3f28000, e8506b)
dc99e840 sd:sd_xbuf_strategy+30 (e8506b58, d3b458d0,)
dc99e870 sd:xbuf_iostart+de (d17893a0)
dc99e888 sd:ddi_xbuf_qstrategy+4b (e8506b58, d17893a0)
dc99e8ac sd:sdstrategy+d3 (e8506b58)
dc99e8c0 genunix:bdev_strategy+4d (e8506b58)
dc99e8d4 genunix:ldi_strategy+40 (eac6ab80, e8506b58)
dc99e904 zfs:vdev_disk_io_start+1c7 (d0a42000, 0, dc99e9)
dc99e914 zfs:vdev_io_start+18 (d0a42000, 0, dc99e9)
dc99e92c zfs:zio_vdev_io_start+e (d0a42000)
dc99e940 zfs:zio_next_stage+73 (d0a42000)
dc99e95c zfs:zio_vdev_io_setup+75 (d0a42000)
dc99e974 zfs:zio_next_stage+73 (d0a42000)
dc99e99c zfs:zio_dva_translate+86 (d0a42000)
dc99e9b8 zfs:zio_next_stage+73 (d0a42000)
dc99e9dc zfs:zio_ready+3a (d0a42000)
dc99e9f4 zfs:zio_next_stage+73 (d0a42000)
dc99ea1c zfs:zio_dva_allocate+a6 (d0a42000)
dc99ea38 zfs:zio_next_stage+73 (d0a42000)
dc99ea5c zfs:zio_checksum_generate+5e (d0a42000)
dc99ea74 zfs:zio_next_stage+73 (d0a42000)
dc99eacc zfs:zio_write_compress+236 (d0a42000)
dc99eaec zfs:zio_next_stage+73 (d0a42000)
dc99eb0c zfs:zio_wait_for_children+58 (d0a42000, 1, d0a421)
dc99eb20 zfs:zio_wait_children_ready+18 (d0a42000)
dc99eb34 zfs:zio_next_stage_async+b9 (d0a42000, 200, 0, f)
dc99eb4c zfs:zio_nowait+e (d0a42000)
dc99eb60 zfs:arc_write+6d (ec49e600, f106b380,)
dc99ebec zfs:dbuf_sync+555 (e4962c18, ec49e600,)
dc99ec4c zfs:dnode_sync+350 (e841ab10, 0, ec49e6)
dc99ec80 zfs:dmu_objset_sync_dnodes+7e (e5818980, e5818a4c,)
dc99ecb8 zfs:dmu_objset_sync+5d (e5818980, d8808eb8)
dc99ecd0 zfs:dsl_dataset_sync+17 (e1e08500, d8808eb8)
dc99ed1c zfs:dsl_pool_sync+82 (e1e7ab80, 5, 0)
dc99ed6c zfs:spa_sync+ef (f106b380, 5, 0)
dc99edc8 zfs:txg_sync_thread+1df (e1e7ab80, 0)
dc99edd8 unix:thread_start+8 ()
syncing file systems...
 done
dumping to /dev/dsk/c0d0s1, offset 429391872, content: kernel
Problem: the "un_wm_cache" in struct sd_lun isn't allocated,
but otherwise the sd_lun structure looks ok:

>> d3f28000::print struct sd_lun un_wm_cache

un_wm_cache = 0

>> d3f28000::print struct sd_lun un_f_non_devbsize_supported

un_f_non_devbsize_supported = 0x1

>> d3f28000::print struct sd_lun un_tgt_blocksize

un_tgt_blocksize = 0x800

>> d3f28000::print struct sd_lun un_sys_blocksize

un_sys_blocksize = 0x200

>> d3f28000::print struct sd_lun un_f_has_removable_media

un_f_has_removable_media = 0x1

>> d3f28000::print struct sd_lun un_f_geometry_is_valid

un_f_geometry_is_valid = 0x1

>> d3f28000::print struct sd_lun un_state

un_state = 0

>> d3f28000::print struct sd_lun un_f_doorlock_supported

un_f_doorlock_supported = 0x1

>> d3f28000::print struct sd_lun un_errstats

un_errstats = 0xd3f31c30

>> d3f28000::print struct sd_lun un_errstats[0]

{
    un_errstats->ks_crtime = 0x227660572c
    un_errstats->ks_next = 0
    un_errstats->ks_kid = 0x2e5
    un_errstats->ks_module = [ "sderr" ]
    un_errstats->ks_resv = 0
    un_errstats->ks_instance = 0x4
    un_errstats->ks_name = [ "sd4,err" ]
    un_errstats->ks_type = 0x1
    un_errstats->ks_class = [ "device_error" ]
    un_errstats->ks_flags = 0x8
    un_errstats->ks_data = 0xd3f31cfc
    un_errstats->ks_ndata = 0xe
    un_errstats->ks_data_size = 0x2a0
    un_errstats->ks_snaptime = 0x8034951b1ef
    un_errstats->ks_update = nulldev
    un_errstats->ks_private = 0xd3f28000
    un_errstats->ks_snapshot = default_kstat_snapshot
    un_errstats->ks_lock = 0
}
kmem_cache_alloc is called from sd:sd_range_lock+5e, apparently
with a NULL kmem_cache_t* argument, and that should explain the panic.

>> sd_range_lock+5e-5?i

sd_range_lock+0x59:             call   +0x694453e       <kmem_cache_alloc>
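
The two block sizes printed above (un_sys_blocksize = 0x200, un_tgt_blocksize = 0x800) are why this device needs the write-map cache at all: a request expressed in 512-byte system blocks does not always start and end on a 2048-byte target block boundary. The following is a minimal sketch of that alignment check, not code from sd.c. The helper name request_is_target_aligned() and the two macros are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values copied from the sd_lun dump in this report. */
#define SYS_BLOCKSIZE 0x200u /* un_sys_blocksize */
#define TGT_BLOCKSIZE 0x800u /* un_tgt_blocksize */

/*
 * Hypothetical helper (not a function in sd.c): returns 1 when a request
 * that starts at system block sys_blkno and spans nbytes bytes both starts
 * and ends on a target-block boundary, and 0 otherwise.
 */
int
request_is_target_aligned(uint64_t sys_blkno, size_t nbytes)
{
	uint64_t first_byte;

	assert(nbytes > 0);
	first_byte = sys_blkno * SYS_BLOCKSIZE;

	return (first_byte % TGT_BLOCKSIZE == 0 &&
	    nbytes % TGT_BLOCKSIZE == 0);
}
```

With these sizes, a 512-byte transfer is never aligned, while a 2048-byte transfer is aligned only when it starts at a system block number that is a multiple of 4.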
========================================================================
I was also able to reproduce this panic with a different sequence of commands:
1. Stop the volfs service (no one should be using the dvd-ram device)
   svcadm disable -t volfs
2. dd if=/dev/rdsk/c1t1d0p0 of=/dev/rdsk/c1t1d0p0
   read & write dvd-ram media, using unaligned blocks (this needs
   the un_wm_cache).
   Suspend the "dd" command after a while (Ctrl-Z, or pkill -STOP dd).
   In sd_lun, we see an active OTYP_CHR open of partition 16, and the
   un_wm_cache is allocated, as expected:

>> d014c920::print struct dev_info devi_driver_data|::print struct scsi_device sd_private|::print struct sd_lun un_wm_cache

un_wm_cache = 0

>> d014c920::print struct dev_info devi_driver_data|::print struct scsi_device sd_private|::print struct sd_lun un_ocmap

{
    un_ocmap.chkd = [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... ]
    un_ocmap.rinfo = {
        lyr_open = [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ]
        reg_open = [ 0, 0, 0x10000, 0 ]
    }
}

>> d014c920::print struct dev_info devi_driver_data|::print struct scsi_device sd_private|::print struct sd_lun un_wm_cache

un_wm_cache = 0xdf9986f0

>>

3. In another window, open another partition on the dvd-ram device:
   sleep 10000000 < /dev/rdsk/c1t1d0s0
There are now two OTYP_CHR partitions/slices open; un_wm_cache
remains allocated:

>> d014c920::print struct dev_info devi_driver_data|::print struct scsi_device sd_private|::print struct sd_lun un_ocmap

{
    un_ocmap.chkd = [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... ]
    un_ocmap.rinfo = {
        lyr_open = [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ]
        reg_open = [ 0, 0, 0x10001, 0 ]
    }
}

>> d014c920::print struct dev_info devi_driver_data|::print struct scsi_device sd_private|::print struct sd_lun un_wm_cache

un_wm_cache = 0xdf9986f0

>>

4. Now kill the sleep command, which had the s0 partition open
   (the p0 slice remains open)
We still see one active open, but the un_wm_cache is gone!
The un_wm_cache should not have been freed because it is needed
for the dd command which has the device open.

>> d014c920::print struct dev_info devi_driver_data|::print struct scsi_device sd_private|::print struct sd_lun un_ocmap

{
    un_ocmap.chkd = [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... ]
    un_ocmap.rinfo = {
        lyr_open = [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ]
        reg_open = [ 0, 0, 0x10000, 0 ]
    }
}

>> d014c920::print struct dev_info devi_driver_data|::print struct scsi_device sd_private|::print struct sd_lun un_wm_cache

un_wm_cache = 0             <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

>>

5. Now the next unaligned read/write to the p0 slice should panic the machine:
   put the suspended "dd" process into the foreground, or  pkill -CONT dd
   => panic
      -------
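
The reg_open values in the dumps above can be read as one bitmask per open type, with one bit per partition. The sketch below decodes them under two assumptions that are not confirmed by the report: that each entry fits in 32 bits, and that index 2 is the OTYP_CHR slot (matching the value of OTYP_CHR in <sys/open.h>). The names NUM_REG_OTYP and partition_is_open() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define OTYP_CHR     2 /* character-device open type, as in <sys/open.h> */
#define NUM_REG_OTYP 4 /* the dumps show four reg_open entries */

/*
 * Hypothetical helper: returns 1 if partition 'part' is marked open for
 * open type 'otyp' in a reg_open-style array of bitmasks, else 0.
 */
int
partition_is_open(const uint32_t reg_open[NUM_REG_OTYP], int otyp, int part)
{
	assert(otyp >= 0 && otyp < NUM_REG_OTYP);
	assert(part >= 0 && part < 32);

	return ((int)((reg_open[otyp] >> part) & 1u));
}
```

Read this way, 0x10000 has only bit 16 set (the open held by dd), and 0x10001 additionally has bit 0 set (the open held by the sleep command on s0). After step 4 the mask is back to 0x10000, so the device still has an opener even though un_wm_cache is already 0.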
The problem is in sdclose(): the un_wm_cache is destroyed on
*every* sdclose() call, not only on the last close:
        if (cp == &un->un_ocmap.chkd[OCSIZE]) {
                SD_TRACE(SD_LOG_OPEN_CLOSE, un, "sdclose: last close\n");
                /* cleanup code on last close */
        }
        /*
         * Destroy the cache (if it exists) which was
         * allocated for the write maps since this is
         * the last close for this media.
         */
        if (un->un_wm_cache) {
                ...
                        kmem_cache_destroy(
                            un->un_wm_cache);
                        un->un_wm_cache = NULL;
                ...
        }
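
The effect of destroying the cache outside the last-close check can be modelled in a few lines of user-space C. This is only a sketch of the failure mode and of one plausible shape for a fix (tearing the cache down only when no opens remain). It is not the change that was delivered in snv_36, and every name in it (struct lun_model, model_open(), model_alloc_cache(), model_close_buggy(), model_close_guarded()) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for the relevant parts of struct sd_lun. */
struct lun_model {
	uint32_t chr_open; /* models the OTYP_CHR open bitmask */
	void *wm_cache;    /* models un_wm_cache */
};

static int cache_token; /* dummy object standing in for a kmem cache */

void
model_open(struct lun_model *un, int part)
{
	assert(part >= 0 && part < 32);
	un->chr_open |= (uint32_t)1 << part;
}

/* Models the cache being created at some point while the device is open. */
void
model_alloc_cache(struct lun_model *un)
{
	if (un->wm_cache == NULL)
		un->wm_cache = &cache_token;
}

/* Models the reported behaviour: the cache is dropped on every close. */
void
model_close_buggy(struct lun_model *un, int part)
{
	assert(part >= 0 && part < 32);
	un->chr_open &= ~((uint32_t)1 << part);
	un->wm_cache = NULL;
}

/* Possible fix: drop the cache only when no partition remains open. */
void
model_close_guarded(struct lun_model *un, int part)
{
	assert(part >= 0 && part < 32);
	un->chr_open &= ~((uint32_t)1 << part);
	if (un->chr_open == 0)
		un->wm_cache = NULL;
}
```

Replaying steps 2 to 4 of the reproduction (open partition 16, allocate the cache, open partition 0, close partition 0) leaves wm_cache NULL with the buggy close while the open mask is still 0x10000, which matches the mdb output above. With the guarded close the cache survives until partition 16 is closed as well.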
Frequency
   Always
Regression
   No
Steps to Reproduce
   Reproduce with:
1. Stop the volfs service (no one should be using the dvd-ram device)
   svcadm disable -t volfs
2. dd if=/dev/rdsk/c1t1d0p0 of=/dev/rdsk/c1t1d0p0
   read & write dvd-ram media, using unaligned blocks (this needs
   the un_wm_cache).
   Suspend the "dd" command after a while (Ctrl-Z, or pkill -STOP dd)
3. In another window, open another partition on the dvd-ram device:
   sleep 10000000 < /dev/rdsk/c1t1d0s0
4. Now kill the sleep command, which had the s0 partition open
   (the p0 slice remains open)
5. Now the next unaligned read/write to the p0 slice should panic the machine:
   put the suspended "dd" process into the foreground, or  pkill -CONT dd
   => panic
Expected Result
   No panic
Actual Result
   kernel panic, due to a NULL pointer dereference
Error Message(s)
   BAD TRAP: type=e (#pf Page fault) rp=dc99e6ec addr=150 occurred in module "unix" due to a NULL pointer dereference
Test Case
   Submitter wants to work on bug
   Yes
Additional configuration information
   - snv_28, bfu'ed to snv_34
   - x86 32-bit kernel
   - LG "HL-DT-ST DVDRAM GMA-4020B" DVD writer device
     (device includes dvd-ram support)
Work Around
N/A
Comments
N/A