Description
The "ifconfig qfe0 inet6 up" command hung for several hours, for no apparent reason,
during execution of the ipmp test suite. I eventually forced a kernel crash dump so
that the cause can be investigated.
The ifconfig hang happened after more than 30 executions of ipmp on s10_52. The
test was run on two separate machines (Ultra2 and SunBlade-1000), and only the
Ultra2 showed the failure. We have never seen an "ifconfig up" failure during
ipmp on previous builds of s10, but given the low probability of occurrence on
s10_52, this may be due to chance.
For the core dump and additional details, see below.
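The "low probability" argument can be made a little more concrete. This is a minimal sketch, assuming each ipmp run hangs independently with a fixed per-run probability p; under that assumption the chance that n clean runs on an earlier build would have missed the hang entirely is (1 - p)^n. Taking p as roughly 1/30 (one hang in a little over 30 runs on s10_52) is only a rough estimate, and the run counts used below are hypothetical, since the number of ipmp runs on previous builds is not recorded in this report.

```python
def miss_probability(rate: float, runs: int) -> float:
    """Probability that a failure occurring independently with the given
    per-run rate is never observed in `runs` executions: (1 - rate) ** runs."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be in [0, 1]")
    if runs < 0:
        raise ValueError("runs must be non-negative")
    return (1.0 - rate) ** runs


if __name__ == "__main__":
    # Assumption: roughly one hang per ~30 runs, the only data point (s10_52).
    rate = 1 / 30
    # Hypothetical run counts for earlier builds; the report does not give them.
    for runs in (10, 20, 30, 50):
        p_miss = miss_probability(rate, runs)
        print(f"{runs:3d} clean runs: P(no hang seen) = {p_miss:.2f}")
```

Under these assumptions, about 20 clean runs would miss the hang roughly half the time, so a clean history on earlier builds does not by itself show that the problem is new in s10_52.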
Symptoms:
03:21:36: ti2standby6: PASS
03:21:36: ti2standby6: Clean up
03:21:36: ti2standby6: Bring down and unplumb interface qfe0 (v6)
03:21:36: ti2standby6: Bring down and unplumb interface qfe1 (v6)
03:21:37: ti2standby6: Kill in.mpathd.
03:21:37: ti2standby6: ends (returning 0)
03:21:38: ti3bothfover6: begins
03:21:38: ti3bothfover6: Kill in.mpathd.
03:21:38: ti3bothfover6: Set up 3 interfaces: 1 2 3
03:21:38: ti3bothfover6: Plumb and configure qfe0 inet6
<hung here for more than 5 hours, after which the machine was brought down>
In the machine:
# ps -ef | grep ifconfig
root 116954 116924 0 03:21:39 console 0:00 ifconfig qfe0 inet6 up
root 117036 117032 0 08:59:47 pts/1 0:00 grep ifconfig
# ptree 116954
100408 -sh
100444 /bin/ksh ./wrapper
100656 /bin/ksh /earth/domain_scripts/bin/runit ipmp 2 64+64+-/64+64+-/32+32
103118 /bin/sh /net/diablo.ireland/gates/Production/S10/links/ipmp/1.10.X/
103519 make results
104049 sh -ce cd stc_files; make --e execute
104050 make --e execute
104054 /bin/ksh -p ./stc_runtest
116924 /bin/ksh -p ./ti3bothfover6
116954 ifconfig qfe0 inet6 up
# truss -p 116954
ioctl(5, SIOCSLIFFLAGS, 0xFFBFE4C8) (sleeping...)
^C# truss -p 116924
read(65, 0xFFBFBE18, 1024) (sleeping...)
^C# ifconfig -a
lo0: flags=1000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4> mtu 8232 index 1
inet 127.0.0.1 netmask ff000000
hme0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
inet 129.156.229.19 netmask ffffff00 broadcast 129.156.229.255
ether 8:0:20:9b:e9:20
^C#
(ifconfig -a did not complete and had to be interrupted with Ctrl-C)
There were no console messages from ifconfig (the ifconfig process started at 03:21:39). The last console messages were:
Jan 30 03:21:11 delenn in.ndpd[103464]: Interface qfe0 has been removed from kernel. in.ndpd will no longer use it
Jan 30 03:21:11 delenn in.ndpd[103464]: Interface qfe1 has been removed from kernel. in.ndpd will no longer use it
Jan 30 03:21:36 delenn in.ndpd[103464]: Interface qfe0 has been removed from kernel. in.ndpd will no longer use it
Jan 30 03:21:36 delenn in.ndpd[103464]: phyint_init_from_k: ioctl (get flags) (interface qfe1): No such device or address
Jan 30 03:21:36 delenn in.ndpd[103464]: Interface qfe1 has been removed from kernel. in.ndpd will no longer use it
<Console messages stopped here>
The routing setup is as follows and looks normal:
# netstat -nr
Routing Table: IPv4
Destination          Gateway              Flags Ref   Use    Interface
-------------------- -------------------- ----- ----- ------ ---------
129.156.229.0        129.156.229.19       U         1    109 hme0
224.0.0.0            129.156.229.19       U         1      0 hme0
default              129.156.229.1        UG        1      0 hme0
127.0.0.1            127.0.0.1            UH        5    441 lo0
I took the machine down to the ok prompt and dumped the kernel. The dump is available in
****
/net/diablo.ireland/export/crash/s10_52/ipmp
****
mdb shows that ifconfig was still running at the time of the dump, so it should be
possible to find the reason for the hang:
# mdb unix.0 vmcore.0
Loading modules: [ unix krtld genunix sd ip nca nfs random logindmux ptm cpc ]
> ::ptree
000000000185d1d0 sched
00000300010bc008 fsflush
00000300010bca88 pageout
00000300010bd508 init
0000030003d2e070 in.ndpd
00000300023a8ac8 sshd
0000030003d48068 snmpd
0000030003d48ae8 sh
0000030003a14ad8 wrapper
0000030003d49568 runit
00000300023a8048 doipmp.client
00000300042c8080 make
00000300041ceaf8 sh
00000300042c8b00 make
000003000b970a88 stc_runtest
00000300041ce078 tee
00000300042c1500 egrep
00000300042c0000 tee
0000030003d2eaf0 ti3bothfover6
000003000b971508 ifconfig