OpenSolaris

Bug ID 6340735
Synopsis write() may hang on loopback tcp connections
State 10-Fix Delivered (Fix available in build)
Category:Subcategory kernel:tcp-ip
Keywords onnv_triage
Responsible Engineer Adi Masputra
Reported Against snv_26
Duplicate Of
Introduced In solaris_nevada
Commit to Fix snv_32
Fixed In snv_32
Release Fixed solaris_nevada(snv_32) , solaris_10u2(s10u2_05) (Bug ID:2132987)
Related Bugs 6281836 , 6749208
Submit Date 24-October-2005
Last Update Date 28-April-2007
Description
This bug has been logged by :  xxxxx@xxxxx.COM. Please contact me if further information is needed.

During snv_26 testing in PIT, 17 test cases of the cgtp.dup test suite failed
consistently when the functional option was selected. These failures did not
occur during snv_25 testing. They occurred in both IPv4 and IPv6 modes. All
of these test cases failed because of a timeout while setting up the IPsec
configuration.

Note that CR 6326834 went into snv_26 and may be responsible for this
behavior:
6326834 ipsecconf -l displays IPV4 outbound policies with no remote address twice

cgtp.dup was run on UltraSPARC-IIi-Netract, UltraSPARC-IIi-cEngine and
UltraAX-i2 hardware and always produced the same failures.

An example extracted from a log file follows:

INFO - 1 - Starting IPSEC on machine whitsmk-04
spawn /usr/bin/rsh -l root whitsmk-04 /autohome/cgtp/1.3.1/6432functional/ws/usr/ontest/net/cgtp/utils/cgtp_RunIPSEC

	WARNING : New policy entries that are being added may
 	affect the existing connections. Existing connections
	that are not subjected to policy constraints, may be
	subjected to policy constraints because of the new
	policy. This can disrupt the communication of the
	existing connections.
ERROR - 0 - Timeout received in initialising IPSEC
ERROR - 0 - Cannot start IPSEC on machine whitsmk-04
ERROR - 0 - Problem in configuration setup
RESULT - 0 - cgtp_func_DuplicationUDP_03: TEST UNRESOLVED

The cgtp_RunIPSEC script contains the commands below. However, running them
on their own on a machine that ran cgtp.dup does not reproduce the hang:

/usr/sbin/ipsecconf -f
/usr/sbin/ipseckey flush
/usr/sbin/ipseckey -f /etc/inet/ipseckey.cgtp
/usr/sbin/ipsecconf -a /etc/inet/ipsecinit.conf.cgtp
echo ""
echo "*********  ipseckey.cgtp *******"
/usr/sbin/ipseckey dump
echo ""
echo "*********  ipsecconf.cgtp *******"
/usr/sbin/ipsecconf
exit 0

Looking at the code of one test case, I was able to reproduce the hang by
executing the following scripts:

bash-3.00# uname -a
SunOS whitsmk-04 5.11 snv_26 sun4u sparc SUNW,UltraSPARC-IIi-Netract
bash-3.00# more launchTest.sh
/usr/bin/rsh -l root whitsmk-04 /etc/inet/cgtp_DelIPSEC
/usr/bin/rsh -l root whitsmk-04 /etc/inet/cgtp_RunIPSEC

bash-3.00# more cgtp_DelIPSEC
#!/usr/bin/sh
/usr/sbin/ipsecconf -f

bash-3.00# more cgtp_RunIPSEC
#!/usr/bin/sh
/usr/sbin/ipsecconf -a /etc/inet/ipsecinit.conf.cgtp
/usr/sbin/ipseckey -v dump

bash-3.00# ./launchTest.sh
        WARNING : New policy entries that are being added may
        affect the existing connections. Existing connections
        that are not subjected to policy constraints, may be
        subjected to policy constraints because of the new
        policy. This can disrupt the communication of the
        existing connections.

VERBOSE ON:  Message to kernel looks like:
==========================================
Base message (version 2) type DUMP, SA type <unspecified/all>.
Message length 16 bytes, seq=1, pid=170041.
        ---> it hangs here <---

By testing different configurations I could see that this test does not hang if any one of the following is true:
- cgtp_DelIPSEC is not run before cgtp_RunIPSEC (i.e. only cgtp_RunIPSEC runs)
- the cgtp_RunIPSEC script is not run with rsh (cgtp_DelIPSEC, however, does not have to be run with rsh)
- the cgtp_DelIPSEC and cgtp_RunIPSEC scripts are merged into one script
- the -v option of the ipseckey command is removed

However, replacing the -v option of the ipseckey command with an echo "" before or after the ipseckey command still produces the hang:

bash-3.00# more cgtp_RunIPSEC
#!/usr/bin/sh
/usr/sbin/ipsecconf -a /etc/inet/ipsecinit.conf.cgtp
/usr/sbin/ipseckey dump
echo ""
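
Because the failure shows up as a hang rather than an error, each of the variants above has to be watched by hand. The following is a minimal sketch, not part of the cgtp test suite, of a helper that runs one command under a time limit and reports a hang through its exit status. The function name run_with_timeout, the exit status 124 for a timeout, and the flag file under ${TMPDIR:-/tmp} are assumptions made for illustration only.

```shell
#!/bin/sh
# run_with_timeout SECONDS COMMAND [ARGS...]   (hypothetical helper)
# Runs COMMAND in the background. If it is still running after SECONDS,
# it is killed and the function returns 124; otherwise the function
# returns the exit status of COMMAND.
run_with_timeout() {
    limit=$1
    shift
    flag="${TMPDIR:-/tmp}/run_with_timeout.$$"
    rm -f "$flag"

    "$@" &
    cmd_pid=$!

    # Watchdog: after the limit, record the timeout, then kill the command.
    # Its output is discarded so that it never holds the caller's stdout open.
    (
        sleep "$limit"
        : > "$flag"
        kill -9 "$cmd_pid"
    ) >/dev/null 2>&1 &
    dog_pid=$!

    status=0
    wait "$cmd_pid" 2>/dev/null || status=$?

    if [ -f "$flag" ]; then
        # The watchdog fired, so the command was treated as hung.
        rm -f "$flag"
        return 124
    fi

    # The command finished on its own: stop the watchdog and pass on the status.
    kill "$dog_pid" 2>/dev/null
    wait "$dog_pid" 2>/dev/null || :
    return "$status"
}
```

With this helper loaded, the reproduction from launchTest.sh could be checked as shown below. The 60 second limit is an arbitrary choice.

    run_with_timeout 60 /usr/bin/rsh -l root whitsmk-04 /etc/inet/cgtp_DelIPSEC
    run_with_timeout 60 /usr/bin/rsh -l root whitsmk-04 /etc/inet/cgtp_RunIPSEC || echo "hang or failure, status $?"

Note that the helper only kills the local rsh process. A remote ipseckey that is blocked may have to be cleaned up separately.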

------------------------------------------------------------
The following configuration was used for my tests:

bash-3.00# uname -a
SunOS whitsmk-04 5.11 snv_26 sun4u sparc SUNW,UltraSPARC-IIi-Netract
bash-3.00# ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
hme0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
        inet 129.156.229.158 netmask ffffff00 broadcast 129.156.229.255
        ether 8:0:20:f9:da:52
hme0: flags=2000841<UP,RUNNING,MULTICAST,IPv6> mtu 1500 index 2
        inet6 fe80::a00:20ff:fef9:da52/10
        ether 8:0:20:f9:da:52
hme0:1: flags=2080841<UP,RUNNING,MULTICAST,ADDRCONF,IPv6> mtu 1500 index 2
        inet6 2020:229::a00:20ff:fef9:da52/64
hme0:2: flags=2000841<UP,RUNNING,MULTICAST,IPv6> mtu 1500 index 2
        inet6 fe35:0:0:1::1/64
hme0:3: flags=2000841<UP,RUNNING,MULTICAST,IPv6> mtu 1500 index 2
        inet6 fe35:0:0:2::2/64
hme0:4: flags=2000841<UP,RUNNING,MULTICAST,IPv6> mtu 1500 index 2
        inet6 fe35:0:0:3::3/64
--------------------------------
bash-3.00# uname -a
SunOS whitsmk-04 5.11 snv_26 sun4u sparc SUNW,UltraSPARC-IIi-Netract
bash-3.00# more /etc/inet/ipsecinit.conf.cgtp
#
#ident  "@(#)ipsecinit.sample   1.10    05/06/08 SMI"
#
{daddr fe35:0:0:3::6}
apply {auth_algs md5 sa shared}
{daddr fe35:0:0:3::3}
permit {auth_algs md5}
--------------------------------
bash-3.00# uname -a
SunOS whitsmk-04 5.11 snv_26 sun4u sparc SUNW,UltraSPARC-IIi-Netract
bash-3.00# more /etc/inet/ipseckey.cgtp
add ah spi 0x2112 dst fe35:0:0:3::6 auth_alg md5 authkey 1234567890abcdef1234567890abcdef
add ah spi 0x2112 dst fe35:0:0:3::3 auth_alg md5 authkey 1234567890abcdef1234567890abcdef
--------------------------------
Results can be found here :
http://diablo.ireland/cgi-bin/electron/report.cgi?src=/electron/data/reports/03741/593501/report@593501
http://diablo.ireland/cgi-bin/electron/report.cgi?src=/electron/data/reports/03741/593338/report@593338
http://diablo.ireland/cgi-bin/electron/report.cgi?src=/electron/data/reports/03741/593337/report@593337
http://diablo.ireland/cgi-bin/electron/report.cgi?src=/electron/data/reports/03741/593083/report@593083
http://diablo.ireland/cgi-bin/electron/report.cgi?src=/electron/data/reports/03741/593084/report@593084
http://diablo.ireland/cgi-bin/electron/report.cgi?src=/electron/data/reports/03741/591951/report@591951

Machines will be made available for further testing on request.
Work Around
Turn off TCP fusion by adding the following line to /etc/system and rebooting:

set ip:do_tcp_fusion=0
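
When the /etc/system change has to be applied to several test machines, it helps to make the edit repeatable. The following is a minimal sketch that appends the line only if no do_tcp_fusion setting is present yet, and keeps a backup copy first. The function name disable_tcp_fusion and the .bak suffix are assumptions for illustration. The check only recognizes the exact spelling "set ip:do_tcp_fusion" at the start of a line, so commented-out lines (which begin with "*" in /etc/system) are ignored.

```shell
#!/bin/sh
# disable_tcp_fusion [FILE]   (hypothetical helper)
# Appends "set ip:do_tcp_fusion=0" to FILE (default: /etc/system) unless
# FILE already contains an active do_tcp_fusion setting.
disable_tcp_fusion() {
    sysfile=${1:-/etc/system}
    line='set ip:do_tcp_fusion=0'

    # An existing active setting is left alone, whatever its value is.
    if grep '^set ip:do_tcp_fusion' "$sysfile" >/dev/null 2>&1; then
        echo "$sysfile already sets do_tcp_fusion; file left unchanged"
        return 0
    fi

    # Keep a copy of the original file before modifying it.
    cp -p "$sysfile" "$sysfile.bak" || return 1
    echo "$line" >> "$sysfile" || return 1
    echo "added '$line' to $sysfile (backup: $sysfile.bak); reboot to apply"
}
```

Running disable_tcp_fusion as root with no argument edits /etc/system. Passing a path to a scratch copy allows the result to be inspected first.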

or set the variable in the running kernel with mdb:

root@vha-3500a> mdb -kw
Loading modules: [ unix krtld genunix specfs dtrace ufs ssd fcp fctl ip sctp nca md zfs random ipc nfs sd fcip cpc sppp ]
> do_tcp_fusion/W 0
do_tcp_fusion:  0x1             =       0x0
>
Comments
N/A