OpenSolaris

Bug ID 6233064
Synopsis svc.startd is wedged trying to talk to the system console
State 10-Fix Delivered (Fix available in build)
Category:Subcategory kernel:streams
Keywords onnv_triage | s10u1-req | smf
Responsible Engineer Mike Cheng
Reported Against s10u1_01 , s10_b74l2a
Duplicate Of
Introduced In solaris_2.0
Commit to Fix snv_21
Fixed In snv_21
Release Fixed solaris_nevada(snv_21) , solaris_10u2(s10u2_03) (Bug ID:2131161)
Related Bugs 6260325 , 6266921 , 6270710
Submit Date 25-February-2005
Last Update Date 3-April-2007
Description
The boot hangs during repeated shutdown testing.
After the hang, we were unable to log in or telnet to the system.
 xxxxx@xxxxx.com 2005-2-25 08:27:13 GMT

According to the crash dump file, the boot process stopped
when svc.startd launched SMF services.
>> ::ptree
0000000001841940  sched
     0000030035edf818  fsflush
     0000030035ee0420  pageout
     0000030035ee1028  init
          00000300480b5878  scfdrvrcvd
          00000300383ab070  evhandsd
               000003003751d038  evhandsd
                    00000300376de030  evmond
          000003003839a470  fjsvdrd
          00000300383a8c58  inetd
          00000300383aa468  statd
          00000300376f1048  nfsmapid
          00000300383b0048  rpcbind
          00000300383b0c50  syswarnd
          00000300376dec38  cron
          00000300376d5058  limstrerr
          00000300383b3068  efdaemon
          00000300383bb060  pwrctrld
          00000300383ba458  in.routed
          000003003751b828  devfsadm
          00000300374e7830  nscd
          00000300376d3848  picld
          00000300376d4450  syseventd
          00000300376eec30  kcfd
          0000030035edec10  svc.configd
          0000030035ede008  svc.startd
               00000300383b8040  fjsvdr
               00000300376e0448  inetd
               0000030035ec4428  sac
                    00000300383a8050  ttymon
>> ::ps
S    PID   PPID   PGID    SID    UID      FLAGS             ADDR NAME
R      0      0      0      0      0 0x00000001 0000000001841940 sched
R      3      0      0      0      0 0x00020001 0000030035edf818 fsflush
R      2      0      0      0      0 0x00020001 0000030035ee0420 pageout
R      1      0      0      0      0 0x42004000 0000030035ee1028 init
R   1946      1   1946   1946      0 0x42000000 00000300480b5878 scfdrvrcvd
R   1679      1   1679   1679      0 0x42010000 00000300383ab070 evhandsd
R   1681   1679   1679   1679      0 0x42010000 000003003751d038 evhandsd
R   1684   1681   1679   1679      0 0x42004000 00000300376de030 evmond
R    226      1    225    225      0 0x42030000 000003003839a470 fjsvdrd
R    224      1    224    224      0 0x42000000 00000300383a8c58 inetd
R    214      1    214    214      1 0x42000000 00000300383aa468 statd
R    213      1    213    213      1 0x52000000 00000300376f1048 nfsmapid
R    209      1    209    209      1 0x42000000 00000300383b0048 rpcbind
R    202      1    202    202      0 0x42000000 00000300383b0c50 syswarnd
R    200      1    200    200      0 0x42010000 00000300376dec38 cron
R    198      1    198    198      0 0x42000000 00000300376d5058 limstrerr
R    182      1    182    182      0 0x42000000 00000300383b3068 efdaemon
R    179      1    179    179      0 0x42000000 00000300383bb060 pwrctrld
R    133      1    132    132      0 0x42000000 00000300383ba458 in.routed
R    119      1    119    119      0 0x42000000 000003003751b828 devfsadm
R    113      1    113    113      0 0x42000000 00000300374e7830 nscd
R    103      1    103    103      0 0x42000000 00000300376d3848 picld
R     97      1     97     97      0 0x42000000 00000300376d4450 syseventd
R     89      1     89     89      1 0x42000000 00000300376eec30 kcfd
R      9      1      9      9      0 0x42000000 0000030035edec10 svc.configd
R      7      1      7      7      0 0x42000000 0000030035ede008 svc.startd
Z    217      7    217    217      0 0x42004000 00000300383b8040 fjsvdr
Z    216      7      7      7      0 0x42004000 00000300376e0448 inetd
R    215      7    215    215      0 0x42014000 0000030035ec4428 sac
R    221    215    215    215      0 0x42014000 00000300383a8050 ttymon

fjsvdr (pid=217) and inetd (pid=216) are zombies, and they have
not been reaped by svc.startd.
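
The zombie observation can be cross-checked mechanically from the ::ps listing. The Python sketch below is only an illustration: it assumes the nine-column layout shown above, uses a few rows copied from that output, and the helper names parse_ps and unreaped_zombies are hypothetical, not part of mdb or any Solaris tool.

```python
# Rows copied from the ::ps output in this report
# (columns: S PID PPID PGID SID UID FLAGS ADDR NAME).
PS_ROWS = """\
R      7      1      7      7      0 0x42000000 0000030035ede008 svc.startd
Z    217      7    217    217      0 0x42004000 00000300383b8040 fjsvdr
Z    216      7      7      7      0 0x42004000 00000300376e0448 inetd
R    215      7    215    215      0 0x42014000 0000030035ec4428 sac
R    221    215    215    215      0 0x42014000 00000300383a8050 ttymon
"""


def parse_ps(text):
    """Parse ::ps rows into dicts; lines without nine fields are skipped."""
    procs = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 9:
            continue
        procs.append({
            "state": fields[0],
            "pid": int(fields[1]),
            "ppid": int(fields[2]),
            "name": fields[8],
        })
    return procs


def unreaped_zombies(procs):
    """Return (pid, name, ppid, parent name) for every process in state Z."""
    names = {p["pid"]: p["name"] for p in procs}
    return [
        (p["pid"], p["name"], p["ppid"], names.get(p["ppid"], "?"))
        for p in procs
        if p["state"] == "Z"
    ]


ZOMBIES = unreaped_zombies(parse_ps(PS_ROWS))
for pid, name, ppid, parent in ZOMBIES:
    print(f"zombie {name} (pid {pid}) is waiting on parent {parent} (pid {ppid})")
```

Both zombies list pid 7 (svc.startd) as their parent, which is consistent with svc.startd no longer collecting the exit status of its children.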

A system normally runs two ttymon processes,
but only one ttymon had been launched
on this system, so no login prompt was displayed.

The threads of inetd(1M) are as follows.

 thread_id=30037669500(TS_SLEEP), Sleeping for 50735[sec](50735420[m])
 proc=300383a8c58(/usr/lib/inet/inetd start) slot=4 pid=224
 wchan get failed !!
 wchan; 0 ->        0.        0
 door_return + 1e4 <- syscall_trap32 + cc <-

 thread_id=3004401a560(TS_SLEEP), Sleeping for 50735[sec](50735390[m])
 proc=300383a8c58(/usr/lib/inet/inetd start) slot=4 pid=224
 last lbolt on proc 50736420[msec] before
 wchan get failed !!
 wchan; 0 ->        0.        0
 door_return + 1bc <- syscall_trap32 + cc <-

 thread_id=30037669b80(TS_SLEEP), Sleeping for 5[sec](5720[m])
 proc=300383a8c58(/usr/lib/inet/inetd start) slot=4 pid=224
 wchan; 30037669d26 ->    10000.        0
 cv_timedwait_sig + 16c <- cv_waituntil_sig + 8c <-
 lwp_park + 130 <- syslwp_park + 30 <- syscall_trap32 + cc <-

 thread_id=300376c2840(TS_SLEEP), Sleeping for 50735[sec](50735390[m])
 proc=300383a8c58(/usr/lib/inet/inetd start) slot=4 pid=224
 last lbolt on proc 50735400[msec] before
 wchan; 30044cc5922 ->    10000.        0
 cv_wait_sig_swap_core + 130 <- poll_common + 4e8 <-
 pollsys + f8 <- syscall_trap32 + cc <-

inetd(1M) did not receive the telnet request
because it had been sleeping in poll(2)
for 50735 seconds (about 14 hours).
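
As a quick check of the 14-hour figure, the sleep time reported for the poll(2) thread can be converted by hand. This is a minimal Python sketch; the function name hms is hypothetical and the only input is the 50735-second value from the thread listing above.

```python
def hms(seconds):
    """Split a duration in whole seconds into (hours, minutes, seconds)."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return hours, minutes, secs


# Sleep time reported for the inetd thread blocked in pollsys.
SLEEP = hms(50735)
print("%d h %d min %d s" % SLEEP)
```

The result is 14 hours, 5 minutes, and 35 seconds, so the thread had been idle in poll(2) for essentially the whole time since boot.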

This system has 120 CPUs and 480 GB of memory.
Is it possible that such a large configuration
is triggering a problem in SMF?
Core files and library files have been uploaded to:
/net/heavenly.japan/export7/dts/18300/18300-1.tar.gz
/net/heavenly.japan/export7/dts/18300
-rwxrwxrwx   1 am151520 staff    24160313 Feb 25 15:48 18300-1.tar.gz
 xxxxx@xxxxx.com 2005-2-25 08:27:13 GMT
sudheer.abdul- xxxxx@xxxxx.com 2005-04-01 15:46:18 GMT
Work Around
N/A
Comments
N/A