|
Description
|
Customer's system (S10U4 + 127111-10, 9 non-global zones, using ipfilter
also for loopback interface) panicked in routine ip:tcp_fuse_output().
==== panic thread: 0x31daf8f71a0 ==== CPU: 17 ====
==== panic user (LWP_SYS) thread: 0x31daf8f71a0 PID: 5567 on CPU: 17 ====
cmd: java -server -Xms256M -Xmx512M -Dsun.rmi.dgc.client.gcInterval=3600000 -Dsun.rm
t_procp: 0x60003bde708
p_as: 0x60002285d18 size: 787472384 RSS: 150241280
hat: 0x30099c61900 cnum: 0x84 cpusran: 0,1,2,3,16,17,18,19
zone: kina
t_stk: 0x2a108715ae0 sp: 0x1843581 t_stkbase: 0x2a108710000
t_pri: 29(TS) pctcpu: 0.008886
t_lwp: 0x31d4e6ed5f0 machpcb: 0x2a108715ae0
mstate: LMS_SYSTEM ms_prev: LMS_KFAULT
ms_state_start: 0.0000942 seconds earlier
ms_start: 5 days 21 hours 41 minutes 16.8409132 seconds earlier
psrset: 0 last CPU: 17
idle: 10001 ticks (10.001 seconds)
start: Tue Apr 15 08:42:20 2008
age: 510057 seconds (5 days 21 hours 40 minutes 57 seconds)
syscall: #240 send(, 0xd097ed98) (sysent: unix:send32+0x0)
tstate: TS_ONPROC - thread is being run on a processor
tflg: T_PANIC - thread initiated a system panic
T_DFLTSTK - stack is default size
tpflg: TP_MSACCT - collect micro-state accounting information
tsched: TS_LOAD - thread is in memory
TS_DONT_SWAP - thread/LWP should not be swapped
pflag: SMSACCT - process is keeping micro-state accounting
SMSFORK - child inherits micro-state accounting
pc: 0x10608ec unix:panicsys+0x48: call unix:setjmp
unix:panicsys+0x48(0x107c890, 0x2a1087151a8, 0x1843f50, 0x1, , , 0x9900001601, , , , , , , , 0x107c890, 0x2a1087151a8)
unix:vpanic_common+0x78(0x107c890, 0x2a1087151a8, 0xb, 0x80000000, 0x7ffd183d, 0x80000000)
unix:panic+0x1c(0x107c890, 0x31, 0x2a108715400, 0x78, 0x0, 0x600000e6c40, 0x181a588)
unix:die+0x78(0x31, 0x2a108715400, 0x78, 0x0)
unix:trap+0x9d4(0x2a108715400, 0x78)
unix:ktl0+0x48()
-- trap data type: 0x31 (data access MMU miss) rp: 0x2a108715400 --
addr: 0x78
pc: 0x7bedf78c ip:tcp_fuse_output+0x1b8: ldx [%o3 + 0x78], %g1
npc: 0x7bedf790 ip:tcp_fuse_output+0x1bc: ldx [%g1 + 0x8], %i2
global: %g1 0x22cdac
%g2 0x30072809b80 %g3 0x22
%g4 0x18bf800 %g5 0x20059200
%g6 0 %g7 0x31daf8f71a0
out: %o0 0x364cb623460 %o1 0
%o2 0 %o3 0
%o4 0xfc00 %o5 0x364cb623460
%sp 0x2a108714ca1 %o7 0x364cb623460
loc: %l0 0x1800 %l1 0x1
%l2 0x300558ee000 %l3 0x3005567c000
%l4 0x1ff4 %l5 0x8
%l6 0 %l7 0
in: %i0 0 %i1 0x364cb623c40
%i2 0x1 %i3 0x4
%i4 0x35af97f7fc0 %i5 0x30072809d80
%fp 0x2a108714e11 %i7 0x7becf958
<trap>ip:tcp_fuse_output+0x1b8(0x35af97f7fc0, 0x364cb623c40, 0x1)
ip:tcp_output+0x74(0x35af97f7dc0, 0x364cb623c40, 0x6000014bf00)
ip:squeue_enter+0x74()
ip:tcp_wput(0x600030343b0, 0x364cb623c40) - frame recycled
sockfs:sostream_direct+0x190(0x600035da018, 0x2a108715aa0, 0x0, 0x30065f6f738?)
sockfs:sotpi_sendmsg+0x4e8(0x600035da018, 0x2a108715a70, 0x2a108715aa0)
sockfs:sendit+0x134(, 0x2a108715a70, 0x2a108715aa0, 0x8)
sockfs:send+0x60(, 0xd097ee58)
unix:syscall_trap32+0xcc()
-- switch to user thread's user stack --
Analysis of the issue suggests that this panic was triggered by an
IPMP failover that occurred while at the same time there were local
tcp connections established using one of the to-be-failed-over IP
addresses.
In this case - during the operation of moving the IP address to another
physical interface - the conn_t structures of all tcp connections using
that IP address are being cleaned up to no longer use the respective
cached IRE (conn_ire_cache).
However, the packet filter hooks seem to expect that conn_ire_cache
is always valid (i.e. non-NULL) for loopback connections and thus
the system panics when hitting the case where the conn_t structure
has element conn_ire_cache == NULL.
This is probably a bug introduced by the Packet Filtering Hooks
project. The TCP fusion code always assumes that it does not
need to grab the conn_lock since it does not touch anything in
conn_t which can be changed. The filter hooks break this
assumption.
Also note that 6522934 introduced a check on conn_ire_cache.
But that is not enough to fix this issue. Refer to the comments
in tcp_loopback_needs_ip().
I'm tempted to suggest that clearing out conn_ire_cache is not done in tcp_closei_local(), but maybe that is just moving around the deck chairs and not fixing the problem.
Otherwise, it would seem unavoidable to need to get conn_lock in tcp_fuse_output() so that we can do ire_to_ill() on the cached ire and get the ill_index, dropping conn_lock after doing so and before calling pfhooks.
The current issue is not about tcp_closei_local() clearing
conn_ire_cache. This should not be a problem since the
fusion should be already torn apart at that time. The issue
is that outside TCP, the conn_ire_cache can be cleared,
as in the case in this CR.
So one way to approach this would be to lock the conn_t early in tcp_fuse_output(), store conn_ire_cache in a local pointer, increment the reference count on the ire (if it is non-NULL) and then release the conn_t lock.
The catch is what to do if the ire is NULL when we get to this point - it seems like the most obvious condition that causes this race is one end closing up shop.
Ignore the comments I made earlier; the obvious thing to do in that case is to punt the processing back to the unfused path.
Note that it isn't sufficient to "just add another check" of the conn_ire_cache pointer, it must be copied and held using IRE_REFHOLD.
e.g. maybe this is the last ire_t associated with an ill_t that is being removed...
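The hold-then-use pattern discussed above can be sketched in user-space C. This is only an illustrative model, not the actual fix: the model_* types and functions are hypothetical stand-ins, pthread mutexes stand in for conn_lock and the ire lock, model_ire_refhold()/model_ire_refrele() stand in for IRE_REFHOLD/IRE_REFRELE, and the ire_ill_index field is an assumed simplification of what ire_to_ill() would yield.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Simplified stand-ins; NOT the real Solaris ire_t / conn_t layouts. */
typedef struct model_ire {
	pthread_mutex_t	ire_lock;
	unsigned int	ire_refcnt;
	unsigned int	ire_ill_index;	/* assumed: index the hooks need */
} model_ire_t;

typedef struct model_conn {
	pthread_mutex_t	conn_lock;
	model_ire_t	*conn_ire_cache;	/* may be cleared by another thread */
} model_conn_t;

/* Stand-in for IRE_REFHOLD: take a reference on the ire. */
static void
model_ire_refhold(model_ire_t *ire)
{
	pthread_mutex_lock(&ire->ire_lock);
	ire->ire_refcnt++;
	pthread_mutex_unlock(&ire->ire_lock);
}

/* Stand-in for IRE_REFRELE: drop a reference on the ire. */
static void
model_ire_refrele(model_ire_t *ire)
{
	pthread_mutex_lock(&ire->ire_lock);
	assert(ire->ire_refcnt > 0);
	ire->ire_refcnt--;
	pthread_mutex_unlock(&ire->ire_lock);
}

/*
 * Copy conn_ire_cache into a local pointer under conn_lock and hold it
 * before dropping the lock. Returns the held ire, or NULL if the cache
 * was already cleared.
 */
static model_ire_t *
model_conn_hold_ire(model_conn_t *connp)
{
	model_ire_t *ire;

	pthread_mutex_lock(&connp->conn_lock);
	ire = connp->conn_ire_cache;
	if (ire != NULL)
		model_ire_refhold(ire);
	pthread_mutex_unlock(&connp->conn_lock);
	return (ire);
}

/*
 * Returns 1 and stores the index if the fused path may proceed;
 * returns 0 if the caller must fall back to the unfused path.
 * conn_lock is not held while the index is used by the caller.
 */
static int
model_fuse_get_ill_index(model_conn_t *connp, unsigned int *indexp)
{
	model_ire_t *ire = model_conn_hold_ire(connp);

	if (ire == NULL)
		return (0);	/* cache cleared: punt to unfused */
	*indexp = ire->ire_ill_index;
	model_ire_refrele(ire);
	return (1);
}
```

The point of the sketch is that the pointer is read exactly once, under the lock, and the reference keeps the ire alive after the lock is dropped; a second unlocked NULL check on conn_ire_cache would not give either guarantee.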
|