OpenSolaris

Bug ID 6213273
Synopsis Hang in i_devi_enter from fcip detach ddi_remove_minor_node call
State 10-Fix Delivered (Fix available in build)
Category:Subcategory kernel:ddi
Keywords
Responsible Engineer Chris Horne
Reported Against snv_21 , 3.1u4_27 , s10_70l1 , 3.1u4_fcs , s10_b74l2a
Duplicate Of
Introduced In solaris_10
Commit to Fix snv_24
Fixed In snv_24
Release Fixed solaris_nevada(snv_24) , solaris_10u3(s10u3_03) (Bug ID:2138619)
Related Bugs 6276452 , 6317900 , 6334926 , 6354116 , 6456352
Submit Date 1-January-2005
Last Update Date 17-October-2005
Description
I had a machine hang while running tests (modbash/devicesbash).
The hang appears to have occurred because the following
thread blocked while detaching fcip, holding its parent devinfo
node /devices/pseudo busy.

[1]> 2a100f75cc0::findstack -v
stack pointer for thread 2a100f75cc0: 2a100f74ba1
[ 000002a100f74ba1 cv_wait+0x38() ]
  000002a100f74c51 i_devi_enter+0x30(300007db448, 400000, 400000, 1, 300007db59c, 0)
  000002a100f74d01 ddi_remove_minor_node+0x20(300007db4b0, 0, 2, 300007db448, 7b3619a8, 0)
  000002a100f74db1 detach_node+0x12c(18a8000, 8000000, 0, 10420000, 300007db4b0, 300007db448)
  000002a100f74e61 i_ndi_unconfig_node+0x110(300007db448, 11c, 8000000, 10ac08c, 14, 10ac000)
  000002a100f74f11 i_ddi_detachchild+0x20(300007db448, 8000000, 1834340, 300003e2938, 1000, 2a100f75cc6)
  000002a100f74fd1 devi_detach_node+0x6c(300007db448, 8000000, 0, 300000779c8, 80000, 8000000)
  000002a100f75091 unconfig_immediate_children+0x98(300003e2938, 0, 300007db448, f4, 2000, 8000000)
  000002a100f75151 devi_unconfig_common+0x1a8(300003e2938, 0, 6, 0, 0, f4)
  000002a100f75211 mt_config_thread+0xac(3001cb29bc0, 0, 1834340, 1834340, 300003e2938, 30000f0b600)
  000002a100f752d1 thread_start+4(3001cb29bc0, 0, ca5a202092100008, d00da030a401401b, e4726020d02a6030, c85da02086210017)
[1]> 300007db448::devinfo -s
DEVINFO            MAJ           REFCNT NODENAME             NODESTATE
                  INST         CIRCULAR BINDNAME             STATE
                                 THREAD                      FLAGS          
00000300007db448   244                0 fcip@0               DS_ATTACH
                     0                0 fcip                 <S_DETACHING,S_MD_UPDATE,S_EVADD>
                                      0                      <>
[1]> 00000300007db448::print struct dev_info devi_state
devi_state = 0x10420000		<<< DEVI_S_MD_UPDATE set
[1]> 
[1]> 300007db448::prtconf
DEVINFO          NAME                                              
300003ebd08      SUNW,Ultra-4
    300003e2938      pseudo, instance #0
        300007db448      fcip, instance #0
[1]>
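The trace and state dump fit a flag-based "enter/exit" protocol: the detaching thread is asleep in cv_wait inside i_devi_enter, and devi_state (0x10420000) still has the 0x400000 bit set, which is the value passed as the second and third arguments. The sketch below is a user-space analog, assuming (not confirmed by this report) that i_devi_enter waits under devi_lock on a condition variable until the wait bits are clear and then sets the requested bits. The names struct node, node_enter_locked and node_exit_locked are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Bit value taken from the trace; the report identifies it as
 * DEVI_S_MD_UPDATE.  All other names here are hypothetical. */
#define S_MD_UPDATE 0x400000u

/* User-space stand-in for devi_lock, a per-node condition variable
 * (assumed) and devi_state. */
struct node {
    pthread_mutex_t lock;
    pthread_cond_t  cv;
    unsigned int    state;
};

static void node_init(struct node *n)
{
    pthread_mutex_init(&n->lock, NULL);
    pthread_cond_init(&n->cv, NULL);
    n->state = 0;
}

/* Caller holds n->lock.  Sleep while any bit in w_mask is set, then claim
 * s_mask.  The loop re-checks the state after every wakeup. */
static void node_enter_locked(struct node *n, unsigned int s_mask,
                              unsigned int w_mask)
{
    while (n->state & w_mask)
        pthread_cond_wait(&n->cv, &n->lock);   /* cf. cv_wait in the trace */
    n->state |= s_mask;
}

/* Caller holds n->lock.  Release s_mask and wake all waiters.  If this
 * clear never happens, or is later undone, the next enter on the same
 * bit sleeps forever: the symptom seen in the hung thread. */
static void node_exit_locked(struct node *n, unsigned int s_mask)
{
    assert((n->state & s_mask) == s_mask);     /* must have been entered */
    n->state &= ~s_mask;
    pthread_cond_broadcast(&n->cv);
}
```

Under this model, a hang only needs S_MD_UPDATE to stay set with no thread left that will clear it.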

I have attached a core taken by breaking into the
debugger and forcing a core dump while the machine was hung.
A possible culprit is manipulation of devi_state
without holding devi_lock.
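To show how an unlocked update could leave DEVI_S_MD_UPDATE set with nobody left to clear it, the sketch below replays one hypothetical interleaving step by step. It is not reconstructed from the core; S_OTHER stands in for any unrelated devi_state bit that some code path sets without taking devi_lock.

```c
#include <assert.h>

#define S_MD_UPDATE 0x400000u   /* bit value from the trace */
#define S_OTHER     0x000001u   /* hypothetical unrelated flag */

/* Replays, step by step, thread A doing "state |= S_OTHER" WITHOUT
 * devi_lock while thread B, holding devi_lock, clears S_MD_UPDATE. */
static unsigned int replay_unlocked(unsigned int state)
{
    assert(state & S_MD_UPDATE);   /* scenario: bit is held at start    */
    unsigned int a_copy = state;   /* A: load state (no lock)           */
    state &= ~S_MD_UPDATE;         /* B: clear bit under lock, wake     */
    state = a_copy | S_OTHER;      /* A: store stale copy | S_OTHER     */
    return state;                  /* S_MD_UPDATE has been resurrected  */
}

/* Same two updates when A also holds devi_lock: each read-modify-write
 * completes before the other starts, so neither is lost. */
static unsigned int replay_locked(unsigned int state)
{
    assert(state & S_MD_UPDATE);
    state |= S_OTHER;              /* A: whole update under the lock    */
    state &= ~S_MD_UPDATE;         /* B: whole update under the lock    */
    return state;
}
```

With the lock held by both writers the result is S_OTHER in either order; without it, the stale store brings S_MD_UPDATE back after waiters were already woken, and the next thread to wait on that bit sleeps indefinitely.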

This failure occurred on springfield.central:
   xxxxx  Enterprise 450 (2 X UltraSPARC-II 248MHz), No Keyboard
  OpenBoot 3.26, 256 MB memory installed, Serial #12876880.

It seems that all DEVI_SET_* and DEVI_CLR_* calls need to be
audited: it is not apparent (to me) that devi_lock is consistently
held, as checked by mutex_owned(&(DEVI(dip)->devi_lock)),
whenever devi_state is manipulated.
Also, some USB code does not take devi_lock prior to SET/CLR.
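One way to make such an audit self-enforcing is to put the ownership check inside the SET/CLR macros themselves, so any caller that skips devi_lock fails immediately in a debug build. In the kernel that would presumably be an ASSERT(mutex_owned(...)) in each macro; the sketch below is a user-space analog, and struct node, node_lock, node_lock_held, NODE_SET and NODE_CLR are all hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical user-space model of a devinfo node's lock and state. */
struct node {
    pthread_mutex_t lock;
    unsigned int    state;
};

/* pthreads has no mutex_owned(), so each thread records which node lock
 * it holds.  Simplification: at most one node lock per thread. */
static _Thread_local const struct node *held_node = NULL;

static void node_init(struct node *n)
{
    pthread_mutex_init(&n->lock, NULL);
    n->state = 0;
}

static void node_lock(struct node *n)
{
    pthread_mutex_lock(&n->lock);
    held_node = n;
}

static void node_unlock(struct node *n)
{
    assert(held_node == n);
    held_node = NULL;
    pthread_mutex_unlock(&n->lock);
}

/* True only if the calling thread holds n's lock. */
static int node_lock_held(const struct node *n)
{
    return held_node == n;
}

/* Analogues of DEVI_SET_* / DEVI_CLR_*: every state change checks lock
 * ownership, so an unprotected caller trips the assert in a debug build. */
#define NODE_SET(n, bits) \
    do { assert(node_lock_held(n)); (n)->state |= (bits); } while (0)
#define NODE_CLR(n, bits) \
    do { assert(node_lock_held(n)); (n)->state &= ~(bits); } while (0)
```

Tracking ownership in a thread-local variable, rather than in a shared owner field, means the check never reads data another thread may be writing concurrently.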



 xxxxx@xxxxx.com 2005-1-01 00:15:15 GMT
I have had this hang occur two more times (after ~8 hours of testing)
on fatboy.central (16-way sun4u) while running modbash/devicesbash.
Work Around
N/A
Comments
N/A