OpenSolaris

Bug ID 6790310
Synopsis in.mpathd may core with "phyint_inst_timer: invalid state 4"
State 10-Fix Delivered (Fix available in build)
Category:Subcategory network:ipmp
Keywords
Responsible Engineer Peter Memishian
Reported Against
Duplicate Of
Introduced In solaris_9
Commit to Fix snv_107
Fixed In snv_107
Release Fixed solaris_nevada(snv_107)
Related Bugs 4347223
Submit Date 6-January-2009
Last Update Date 28-January-2009
Description
During IPMP stress testing, we'd occasionally see in.mpathd crash with the 
following message: 
 
   in.mpathd[100391]: phyint_inst_timer: invalid state 4 
 
This means that we're attempting to send probes through a phyint in the 
PI_OFFLINE (4) state, which should never happen.  After instrumenting 
in.mpathd to provide CTF data and digging through the source, the issue 
became clear: when select_test_ifs() is called (to find a test address to 
use for probing), it's possible that IFF_OFFLINE has been cleared and a 
test address has been brought IFF_UP, but that the phyint itself is still 
PI_OFFLINE. This could happen because an external program changed
IFF_OFFLINE, or because setting the flags via SIOCSLIFFLAGS itself is not
atomic and the IFF_OFFLINE flag got lost in the process.
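
The inconsistency described above can be modeled in a few lines of C. This is only an illustrative sketch, not in.mpathd source: the struct, the MODEL_PI_RUNNING placeholder, and both helper functions are hypothetical names invented here. PI_OFFLINE = 4 comes from the error message, and the flag values are assumed to match the ifconfig output in this report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag bits, assumed to match the ifconfig output in this report. */
#define IFF_UP      0x00000001ULL
#define IFF_OFFLINE 0x80000000ULL

/*
 * PI_OFFLINE is 4 per the "invalid state 4" message. MODEL_PI_RUNNING is
 * a placeholder for "any other state"; its value is an assumption.
 */
enum model_pi_state { MODEL_PI_RUNNING = 2, PI_OFFLINE = 4 };

/* Hypothetical, minimal stand-in for the daemon's per-interface record. */
struct model_phyint {
    enum model_pi_state state; /* the daemon's own view of the phyint */
    uint64_t test_addr_flags;  /* kernel flags on the test address */
};

/* A flags-only check: the address looks usable for probing. */
static bool flags_allow_probing(uint64_t flags)
{
    return (flags & IFF_UP) != 0 && (flags & IFF_OFFLINE) == 0;
}

/*
 * True when the kernel flags say "probe through this address" while the
 * daemon still considers the phyint offline: the combination that leads
 * to probing in the PI_OFFLINE state.
 */
static bool model_is_inconsistent(const struct model_phyint *pi)
{
    assert(pi);
    return pi->state == PI_OFFLINE &&
        flags_allow_probing(pi->test_addr_flags);
}
```

In this model, a selection routine that consults only flags_allow_probing() would pick the test address even though the state is still PI_OFFLINE. A check along the lines of model_is_inconsistent() is one way such a case could be detected.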
 
Indeed, it's easy to confirm this theory by writing a small program that 
clears the IFF_OFFLINE flag but does not tell in.mpathd to bring the 
interface online.  For instance, on Nevada we configure a two-interface 
group, assign a test address to under1 and take it offline, resulting in: 
 
# ifconfig -a 
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 
        inet 127.0.0.1 netmask ff000000  
under0: flags=201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS> mtu 1500 
        inet 10.8.57.34 netmask ffffff00 broadcast 10.8.57.255 
        groupname a 
        ether 0:3:ba:94:3b:74  
under1: flags=289000842<BROADCAST,RUNNING,MULTICAST,IPv4,NOFAILOVER,OFFLINE,CoS> 
        inet 10.8.57.202 netmask ffffff00 broadcast 10.8.57.255 
        groupname a 
        ether 0:3:ba:94:3b:75  
 
We then clear offline and bring under1 back up: 
 
# /tmp/clear-offline under1 
# ifconfig under1 up 
Jan  5 22:14:04 purple-198 in.mpathd[100391]: phyint_inst_timer: invalid state 4 
#
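
The source of /tmp/clear-offline is not included in this report. As a rough sketch of the flag arithmetic behind the two commands above, the C below replays them on the under1 flags word from the ifconfig output. The function names are hypothetical, and the flag values are assumed to match that output. A real program would presumably read and write the flags with the SIOCGLIFFLAGS and SIOCSLIFFLAGS ioctls, which are deliberately left out here.

```c
#include <assert.h>
#include <stdint.h>

/* Flag bits, assumed to match the ifconfig output in this report. */
#define IFF_UP      0x00000001ULL
#define IFF_OFFLINE 0x80000000ULL

/* What a clear-offline style tool would do: drop IFF_OFFLINE only. */
static uint64_t clear_offline(uint64_t flags)
{
    return flags & ~IFF_OFFLINE;
}

/* What "ifconfig under1 up" does to the flags word: set IFF_UP. */
static uint64_t bring_up(uint64_t flags)
{
    return flags | IFF_UP;
}

/* Replay the two commands on the under1 flags shown in the report. */
static uint64_t replay_report_sequence(void)
{
    uint64_t flags = 0x289000842ULL; /* under1 before the commands */

    assert(flags & IFF_OFFLINE);     /* the interface starts offline */
    flags = clear_offline(flags);
    flags = bring_up(flags);
    return flags;
}
```

Under these assumptions replay_report_sequence() returns 0x209000843: UP is set and OFFLINE is clear, yet nothing in the sequence has asked in.mpathd to leave the PI_OFFLINE state.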
Work Around
N/A
Comments
N/A