Description
We hit an interesting panic today when running the IPMP test suite:
> ::status
debugging crash dump vmcore.3 (64-bit) from earthnews
operating system: 5.11 clearview-ipmp-build:12/16/08 (sun4u)
panic message:
assertion failed: connp->conn_ilg == NULL, file:
../../common/inet/ip/ipclassifier.c, line: 2286
dump content: kernel pages only
Indeed, we have a conn_t with a valid conn_ilg:
> $c
vpanic(12f4da0, 7bb6a090, 7bb69400, 8ee, 600303231e0, 600303231e0)
assfail+0x74(7bb6a090, 7bb69400, 8ee, 1854c00, 12f4c00, 0)
ipcl_conn_cleanup+0x128(300f65cb700, 3, 0, 300f65cb700, 3004f5cda10,
ipcl_conn_destroy+0x460(300f65cb700, 300f65cb700, 300f65cbb80, 7bb69400,
rawip_do_close+0xe0(300f65cb700, 1, 80001150, 1000, 0, 1)
rawip_close+4(300f65cb700, 3, 60030cd3e70, 7bf8c000, 0, 0)
so_close+0x1c(3007dd65a58, 3, 60030cd3e70, 3007dd65a78, 3007dd65a58,
fop_close+0x48(306b9541100, 3, 1, 0, 60030cd3e70, 0)
closef+0xa4(600305ef060, 30058025880, 6003006e2a0, 1800c1515b9, 12eb3d0,
closeandsetf+0x41c(8, 0, 600305ef060, 180c000, 60030df24d0, 200)
close+8(8, 0, ffbfa198, 68bc0, 1010101, 0)
syscall_trap32+0x1e8(8, 0, ffbfa198, 68bc0, 1010101, 0)
> 300f65cb700::print conn_t conn_ilg conn_ilg_inuse
conn_ilg = 0x6003090ef08
conn_ilg_inuse = 0x1
This ilg was likely very recently allocated, as the thread that did the
allocation is still busy draining its IPSQ via ip_rput_process_notdata()
-> ipsq_exit():
> 0x6003090ef08::whatis -b
6003090ef08 is 6003090ef00+8, bufctl 3008ebaa298 allocated from
kmem_alloc_896
> 3008ebaa298$<bufctl_audit
ADDR BUFADDR TIMESTAMP THREAD
CACHE LASTLOG CONTENTS
3008ebaa298 6003090ef00 39df4701c27c 2a101295ca0
3000004f3a0 3000db85f00 300434bfb40
kmem_cache_alloc+0x90
kmem_alloc+0x2c
mi_alloc+0xc
mi_zalloc+8
conn_ilg_alloc+0x58
ilg_add_v6+0x2b0
ip_opt_add_group_v6+0x2bc
ip_opt_set+0x1344
svr4_optcom_req+0x660
ip_restart_optmgmt+0x84
ipsq_drain+0x11c
ipsq_exit+0x80
ip_rput_process_notdata+0x1b4
ip_input+0x220
putnext+0x390
> 2a101295ca0::findstack
stack pointer for thread 2a101295ca0: 2a101293461
[ 000002a101293461 panic_idle+0x1c() ]
000002a101293511 ktl0+0x48()
000002a101293661 mac_flow_lookup+0xd0()
000002a101293771 mac_tx_classify+0x14()
000002a101293831 mac_tx_send+0x368()
000002a101293931 mac_tx_single_ring_mode+0x108()
000002a101293a01 mac_tx+0x364()
000002a101293ac1 str_mdata_fastpath_put+0xb0()
000002a101293b71 dld_wput+0xe0()
000002a101293c21 putnext+0x390()
000002a101293cd1 hbwputmod+0x90()
000002a101293d81 putnext+0x390()
000002a101293e31 ip_xmit_v4+0x3dc()
000002a101293ef1 ip_wput_ire+0x2cc8()
000002a101294341 igmpv3_sendrpt+0x410()
000002a101294461 igmp_joingroup+0x174()
000002a101294511 ip_addmulti+0x214()
000002a1012945d1 ilg_add+0x3b4()
000002a1012946a1 ip_opt_add_group+0x154()
000002a101294781 ip_opt_set+0x9e0()
000002a1012948c1 svr4_optcom_req+0x660()
000002a1012949a1 ip_restart_optmgmt+0x84()
000002a101294a51 ipsq_drain+0x11c()
000002a101294b01 ipsq_exit+0x80()
000002a101294bb1 ip_rput_process_notdata+0x1b4()
000002a101294c61 ip_input+0x220()
000002a101294e41 putnext+0x390()
000002a101294ef1 hbrput+0x164()
000002a101294fa1 putnext+0x390()
000002a101295051 proto_disabmulti_req+0x11c()
000002a101295111 dld_wput_nondata_task+0x74()
000002a1012951c1 taskq_d_thread+0xbc()
000002a101295291 thread_start+4()
Looking at the code, there seems to be an issue with CONN_CLOSING.
In particular, the comment above the point where ip_quiesce_conn() sets
CONN_CLOSING states that conn_ilg cannot be set once the flag is set:
/*
* Mark the conn as closing, and this conn must not be
* inserted in future into any list. Eg. conn_drain_insert(),
* won't insert this conn into the conn_drain_list.
* Similarly ill_pending_mp_add() will not add any mp to
* the pending mp list, after this conn has started closing.
*
--> * conn_idl, conn_pending_ill, conn_down_pending_ill, conn_ilg
* cannot get set henceforth.
*/
However, I don't see any code to enforce this in the conn_ilg_alloc()
path. Without such a check, conn_ilg can still be set by another thread
(e.g., a taskq thread as shown above) after ip_quiesce_conn() has set
ilg_cleanup_reqd but before the refcounts drop at the end of
ip_quiesce_conn(). The close path then trips the conn_ilg == NULL
assertion in ipcl_conn_cleanup(), as shown above.
Crash dump is at /net/mdb.eng/cores/meem/ilg/*.3
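The kind of check that appears to be missing can be sketched as a small user-space model. This is not the kernel code and not a proposed fix: model_conn_t, MODEL_CONN_CLOSING, model_conn_ilg_alloc() and model_quiesce_conn() are hypothetical stand-ins, and a pthread mutex plays the role of conn_lock. The only point is that the closing flag is tested under the same lock that the quiesce path holds when it sets the flag, so an allocation cannot land after the conn has started closing.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's CONN_CLOSING state flag. */
#define	MODEL_CONN_CLOSING	0x01

/* Hypothetical, heavily reduced stand-in for conn_t. */
typedef struct model_conn {
	pthread_mutex_t	conn_lock;	/* protects the fields below */
	unsigned int	conn_state_flags;
	void		*conn_ilg;	/* NULL when no ilg is attached */
} model_conn_t;

void
model_conn_init(model_conn_t *connp)
{
	assert(connp != NULL);
	pthread_mutex_init(&connp->conn_lock, NULL);
	connp->conn_state_flags = 0;
	connp->conn_ilg = NULL;
}

/*
 * Attach an ilg buffer to the conn unless the conn is closing.
 * Returns 0 on success (or if one is already attached), EINVAL if the
 * conn has started closing, ENOMEM if the allocation fails.
 */
int
model_conn_ilg_alloc(model_conn_t *connp, size_t size)
{
	int err = 0;

	assert(connp != NULL);
	pthread_mutex_lock(&connp->conn_lock);
	if (connp->conn_state_flags & MODEL_CONN_CLOSING) {
		/* The check under discussion: refuse once closing. */
		err = EINVAL;
	} else if (connp->conn_ilg == NULL) {
		connp->conn_ilg = calloc(1, size);
		if (connp->conn_ilg == NULL)
			err = ENOMEM;
	}
	pthread_mutex_unlock(&connp->conn_lock);
	return (err);
}

/*
 * Mark the conn as closing and release any ilg, both under conn_lock,
 * so a concurrent allocator sees either "not closing" or "closing".
 */
void
model_quiesce_conn(model_conn_t *connp)
{
	void *ilg;

	assert(connp != NULL);
	pthread_mutex_lock(&connp->conn_lock);
	connp->conn_state_flags |= MODEL_CONN_CLOSING;
	ilg = connp->conn_ilg;
	connp->conn_ilg = NULL;
	pthread_mutex_unlock(&connp->conn_lock);
	free(ilg);
}
```

In this model, a call to model_conn_ilg_alloc() that races with model_quiesce_conn() either completes first, in which case its buffer is freed by the quiesce, or fails with EINVAL. Whether the real conn_ilg_alloc() path can take conn_lock at that point, and what error it should return, would need to be checked against the actual code.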