OpenSolaris

Bug ID 6709590
Synopsis race between tcp_fuse_output and ifconfig down panics with NULL conn_ire_cache
State 10-Fix Delivered (Fix available in build)
Category:Subcategory kernel:tcp-ip
Keywords rtiq_regression
Responsible Engineer Brian Ruthven
Reported Against s10u4_fcs
Duplicate Of
Introduced In solaris_nevada
Commit to Fix snv_102
Fixed In snv_102
Release Fixed solaris_nevada(snv_102) , solaris_10u7(s10u7_03) (Bug ID:2169421)
Related Bugs 6418698 , 6522934 , 6782285 , 6833299
Submit Date 2-June-2008
Last Update Date 6-July-2009
Description
Customer's system (S10U4 + 127111-10, 9 non-global zones, using ipfilter
also for the loopback interface) panicked in routine ip:tcp_fuse_output().

==== panic thread: 0x31daf8f71a0 ==== CPU: 17 ====
==== panic user (LWP_SYS) thread: 0x31daf8f71a0  PID: 5567  on CPU: 17 ====
cmd: java -server -Xms256M -Xmx512M -Dsun.rmi.dgc.client.gcInterval=3600000 -Dsun.rm
t_procp: 0x60003bde708
  p_as: 0x60002285d18  size: 787472384  RSS: 150241280
  hat: 0x30099c61900  cnum: 0x84  cpusran: 0,1,2,3,16,17,18,19
  zone: kina
t_stk: 0x2a108715ae0  sp: 0x1843581  t_stkbase: 0x2a108710000
t_pri: 29(TS)  pctcpu: 0.008886
t_lwp: 0x31d4e6ed5f0  machpcb: 0x2a108715ae0
  mstate: LMS_SYSTEM  ms_prev: LMS_KFAULT
  ms_state_start: 0.0000942 seconds earlier
  ms_start: 5 days 21 hours 41 minutes 16.8409132 seconds earlier
psrset: 0  last CPU: 17
idle: 10001 ticks (10.001 seconds)
start: Tue Apr 15 08:42:20 2008
age: 510057 seconds (5 days 21 hours 40 minutes 57 seconds)
syscall: #240 send(, 0xd097ed98) (sysent: unix:send32+0x0)
tstate: TS_ONPROC - thread is being run on a processor
tflg:   T_PANIC - thread initiated a system panic
        T_DFLTSTK - stack is default size
tpflg:  TP_MSACCT - collect micro-state accounting information
tsched: TS_LOAD - thread is in memory
        TS_DONT_SWAP - thread/LWP should not be swapped
pflag:  SMSACCT - process is keeping micro-state accounting
        SMSFORK - child inherits micro-state accounting

pc:      0x10608ec      unix:panicsys+0x48:   call      unix:setjmp

unix:panicsys+0x48(0x107c890, 0x2a1087151a8, 0x1843f50, 0x1, , , 0x9900001601, , , , , , , , 0x107c890, 0x2a1087151a8)
unix:vpanic_common+0x78(0x107c890, 0x2a1087151a8, 0xb, 0x80000000, 0x7ffd183d, 0x80000000)
unix:panic+0x1c(0x107c890, 0x31, 0x2a108715400, 0x78, 0x0, 0x600000e6c40, 0x181a588)
unix:die+0x78(0x31, 0x2a108715400, 0x78, 0x0)
unix:trap+0x9d4(0x2a108715400, 0x78)
unix:ktl0+0x48()
-- trap data  type: 0x31 (data access MMU miss)  rp: 0x2a108715400  --
  addr: 0x78
pc:  0x7bedf78c ip:tcp_fuse_output+0x1b8:   ldx [%o3 + 0x78], %g1
npc: 0x7bedf790 ip:tcp_fuse_output+0x1bc:   ldx   [%g1 + 0x8], %i2
  global:                       %g1           0x22cdac
        %g2      0x30072809b80  %g3               0x22
        %g4          0x18bf800  %g5         0x20059200
        %g6                  0  %g7      0x31daf8f71a0
  out:  %o0      0x364cb623460  %o1                  0
        %o2                  0  %o3                  0
        %o4             0xfc00  %o5      0x364cb623460
        %sp      0x2a108714ca1  %o7      0x364cb623460
  loc:  %l0             0x1800  %l1                0x1
        %l2      0x300558ee000  %l3      0x3005567c000
        %l4             0x1ff4  %l5                0x8
        %l6                  0  %l7                  0
  in:   %i0                  0  %i1      0x364cb623c40
        %i2                0x1  %i3                0x4
        %i4      0x35af97f7fc0  %i5      0x30072809d80
        %fp      0x2a108714e11  %i7         0x7becf958
<trap>ip:tcp_fuse_output+0x1b8(0x35af97f7fc0, 0x364cb623c40, 0x1)
ip:tcp_output+0x74(0x35af97f7dc0, 0x364cb623c40, 0x6000014bf00)
ip:squeue_enter+0x74()
ip:tcp_wput(0x600030343b0, 0x364cb623c40) - frame recycled
sockfs:sostream_direct+0x190(0x600035da018, 0x2a108715aa0, 0x0, 0x30065f6f738?)
sockfs:sotpi_sendmsg+0x4e8(0x600035da018, 0x2a108715a70, 0x2a108715aa0)
sockfs:sendit+0x134(, 0x2a108715a70, 0x2a108715aa0, 0x8)
sockfs:send+0x60(, 0xd097ee58)
unix:syscall_trap32+0xcc()
-- switch to user thread's user stack --

Analysis of the issue suggests that this panic was triggered by an
IPMP failover that occurred while local tcp connections were
established using one of the to-be-failed-over IP addresses.

In this case, while the IP address is being moved to another
physical interface, the conn_t structures of all tcp connections using
that IP address are cleaned up so that they no longer use the respective
cached IRE (conn_ire_cache).

However, the packet filter hooks seem to expect that conn_ire_cache
is always valid (i.e. non-NULL) for loopback connections and thus
the system panics when hitting the case where the conn_t structure
has element conn_ire_cache == NULL.
This is probably a bug introduced by the Packet Filtering Hooks
project.  The TCP fusion code always assumes that it does not
need to grab the conn_lock since it does not touch anything in
conn_t which can be changed.  The filter hooks break this
assumption.

Also note that 6522934 introduced a check on conn_ire_cache.
But that is not enough to fix this issue.  Refer to the comments
in tcp_loopback_needs_ip().
I'm tempted to suggest not clearing conn_ire_cache in tcp_closei_local(), but maybe that is just moving the deck chairs around and not fixing the problem.

Otherwise, it would seem unavoidable to take conn_lock in tcp_fuse_output() so that we can do ire_to_ill() on the cached ire and get the ill_index, dropping conn_lock after doing so and before calling pfhooks.
The current issue is not about tcp_closei_local() clearing
conn_ire_cache.  This should not be a problem since the
fusion should be already torn apart at that time.  The issue
is that outside TCP, the conn_ire_cache can be cleared,
as in the case in this CR.
So one way to approach this would be to lock the conn_t early in tcp_fuse_output(), store conn_ire_cache in a local pointer, increment the reference count on the ire (if it is non-NULL) and then release the conn_t lock.

The catch is what to do if the ire is NULL when we get to this point - it seems like the most obvious condition that causes this race is one end closing up shop.

Ignore the comments I made earlier. The obvious thing to do in that case is to punt the processing back to the unfused path.

Note that it isn't sufficient to "just add another check" of the conn_ire_cache pointer; the pointer must be copied and held using IRE_REFHOLD.

e.g. maybe this is the last ire_t associated with an ill_t that is being removed...
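
The approach discussed in the comments above (take the conn_t lock, copy conn_ire_cache, hold a reference, drop the lock, and fall back to the unfused path if the pointer is NULL) can be sketched in user-space C as follows. This is only an illustration and not the delivered fix: the ire_t and conn_t layouts, the helper names (ire_refhold, ire_refrele, conn_hold_cached_ire, conn_clear_ire_cache, fused_output_sketch) and the use of pthread mutexes are all assumptions standing in for the kernel structures, the IRE_REFHOLD macro and conn_lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel's ire_t and conn_t. */
typedef struct ire {
	pthread_mutex_t	ire_lock;
	unsigned int	ire_refcnt;
	int		ire_ill_index;	/* value the hook path needs */
} ire_t;

typedef struct conn {
	pthread_mutex_t	conn_lock;
	ire_t		*conn_ire_cache;	/* may be cleared by another thread */
} conn_t;

static void
ire_refhold(ire_t *ire)
{
	pthread_mutex_lock(&ire->ire_lock);
	ire->ire_refcnt++;
	pthread_mutex_unlock(&ire->ire_lock);
}

static void
ire_refrele(ire_t *ire)
{
	pthread_mutex_lock(&ire->ire_lock);
	assert(ire->ire_refcnt > 0);
	ire->ire_refcnt--;	/* a real implementation frees the ire at zero */
	pthread_mutex_unlock(&ire->ire_lock);
}

static void
ire_init(ire_t *ire, int ill_index)
{
	pthread_mutex_init(&ire->ire_lock, NULL);
	ire->ire_refcnt = 0;
	ire->ire_ill_index = ill_index;
}

static void
conn_init(conn_t *connp, ire_t *ire)
{
	pthread_mutex_init(&connp->conn_lock, NULL);
	connp->conn_ire_cache = ire;
	if (ire != NULL)
		ire_refhold(ire);	/* the cache owns one reference */
}

/* Copy the cached pointer under conn_lock and return it held, or NULL. */
static ire_t *
conn_hold_cached_ire(conn_t *connp)
{
	ire_t *ire;

	pthread_mutex_lock(&connp->conn_lock);
	ire = connp->conn_ire_cache;
	if (ire != NULL)
		ire_refhold(ire);
	pthread_mutex_unlock(&connp->conn_lock);
	return (ire);
}

/* Models the code outside TCP (e.g. an address move) clearing the cache. */
static void
conn_clear_ire_cache(conn_t *connp)
{
	ire_t *ire;

	pthread_mutex_lock(&connp->conn_lock);
	ire = connp->conn_ire_cache;
	connp->conn_ire_cache = NULL;
	pthread_mutex_unlock(&connp->conn_lock);
	if (ire != NULL)
		ire_refrele(ire);
}

enum fuse_result { FUSE_OK, FUSE_FALLBACK_UNFUSED };

/* Fused send path: use only the local, held copy of the ire. */
static enum fuse_result
fused_output_sketch(conn_t *connp, int *ill_index_out)
{
	ire_t *ire = conn_hold_cached_ire(connp);

	if (ire == NULL)
		return (FUSE_FALLBACK_UNFUSED);	/* punt to the unfused path */
	*ill_index_out = ire->ire_ill_index;	/* safe: we hold a reference */
	ire_refrele(ire);
	return (FUSE_OK);
}
```

The point of the local held copy is that a concurrent conn_clear_ire_cache() can set conn_ire_cache to NULL at any time without invalidating the pointer the send path is using; a plain NULL check on conn_ire_cache without the lock and the hold would still leave a window between the check and the dereference.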
Work Around
Disable ipfilter, at least for loopback connections
(set intercept_loopback to false in ipf.conf).
Alternatively, disable TCP fusion by setting do_tcp_fusion to 0
in /etc/system and rebooting the machine.
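
As a rough illustration of the two workarounds, the settings would typically look like the fragments below. The exact syntax is given from memory of the usual Solaris 10 conventions and should be checked against the ipf.conf and system(4) documentation for the installed release; the ip: module prefix in particular is an assumption based on the panic stack showing tcp_fuse_output in the ip module.

```
# ipf.conf: stop ipfilter from intercepting loopback traffic
# (the set directive must appear before any filter rules)
set intercept_loopback false;
```

```
* /etc/system: disable TCP fusion (takes effect after reboot)
set ip:do_tcp_fusion = 0x0
```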
Comments
N/A