OpenSolaris

Bug ID 6637163
Synopsis ip_rput_fragment[_v6]() spuriously prunes valid frags due to unbounded inaccuracy of ill_frag_count
State 10-Fix Delivered (Fix available in build)
Category:Subcategory kernel:tcp-ip
Keywords rtiq_reviewed | spbc_s10uX
Responsible Engineer George Shepherd
Reported Against s10u4_fcs , solaris_10u3 , solaris_10u4 , solaris_10u5
Duplicate Of
Introduced In solaris_10
Commit to Fix s10u6_05
Fixed In s10u6_05
Release Fixed solaris_10u6(s10u6_05) , solaris_nevada(snv_92) (Bug ID:2156724)
Related Bugs 6534479 , 6694819
Submit Date 4-December-2007
Last Update Date 16-July-2008
Description
Although all fragments of each packet arrive in a timely fashion across the cluster private
interconnect, IP reassembly fails: the fragment list is pruned erroneously when
ill->ill_frag_count underruns because of read-modify-write races between the threads
that update it across multiple ill_frag_hash_tbl buckets.
The underlying syndrome is described in CR 6534479.
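
The race can be illustrated with a minimal user-space sketch. It is not the kernel code:
frag_counter_t, racy_update() and atomic_update() are hypothetical names, and C11 atomics
stand in for whatever primitive the real fix uses. racy_update() replays by hand one
interleaving in which two updaters load the counter before either stores, so one update is
lost; atomic_update() applies the same two updates as indivisible read-modify-write operations.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for ill->ill_frag_count (bytes held for reassembly). */
typedef struct {
    uint32_t         plain_count;   /* updated with separate load and store */
    _Atomic uint32_t atomic_count;  /* updated with atomic read-modify-write */
} frag_counter_t;

/*
 * Replay one bad interleaving: updater A adds a_bytes while updater B
 * removes b_bytes, and both load the counter before either one stores.
 */
uint32_t
racy_update(frag_counter_t *fc, uint32_t a_bytes, uint32_t b_bytes)
{
    uint32_t a_seen = fc->plain_count;   /* A loads */
    uint32_t b_seen = fc->plain_count;   /* B loads the same, soon stale, value */

    fc->plain_count = a_seen + a_bytes;  /* A stores */
    fc->plain_count = b_seen - b_bytes;  /* B stores and erases A's update */
    return fc->plain_count;
}

/* The same two updates, each performed as one indivisible operation. */
uint32_t
atomic_update(frag_counter_t *fc, uint32_t a_bytes, uint32_t b_bytes)
{
    atomic_fetch_add(&fc->atomic_count, a_bytes);
    atomic_fetch_sub(&fc->atomic_count, b_bytes);
    return atomic_load(&fc->atomic_count);
}
```

Starting from 1500 bytes, adding 1500 and removing 1500 should leave 1500. The racy path
leaves 0, so the counter is already 1500 bytes short; when A's fragments are later released,
the subtraction underruns.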

At the moment ill->ill_frag_count is a best-effort approximation, but an underrun must not
be allowed to trigger ill_frag_prune(), because pruning costs us one or more packets whose
fragments are all available and ready for reassembly. On a busy Oracle RAC cluster
interconnect these underruns are EXTREMELY frequent, and Oracle detects "lost blocks" which it
must try to recover. Oracle RAC uses UDP and performs timer-based recovery, so this severely
impacts transaction performance on SunCluster. The need to recover the occasional packet lost
to a checksum error is understood, and is rare enough on the interconnect not to be
a significant performance penalty (modulo bad network hardware).
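
Why an underrun leads to pruning can be sketched as follows, under two assumptions that are
not stated in this report: that the counter is an unsigned 32-bit byte count, and that pruning
is triggered when the count exceeds a byte limit. FRAG_LIMIT_BYTES, count_release(),
count_release_clamped() and should_prune() are hypothetical names with illustrative values.
The clamped variant only shows one way to keep an underrun from looking like an overfull
queue; it is not presented as the delivered fix.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limit on bytes queued for reassembly (illustrative value). */
#define FRAG_LIMIT_BYTES 1000000u

/* Plain unsigned subtraction: an underrun wraps past zero to a huge value. */
uint32_t
count_release(uint32_t count, uint32_t bytes)
{
    return count - bytes;
}

/* Saturating subtraction: an underrun clamps to zero instead of wrapping. */
uint32_t
count_release_clamped(uint32_t count, uint32_t bytes)
{
    return (bytes > count) ? 0 : count - bytes;
}

/* Assumed prune decision: prune when the count exceeds the limit. */
bool
should_prune(uint32_t count)
{
    return count > FRAG_LIMIT_BYTES;
}
```

With a counter that has drifted down to 0, releasing a 1500-byte fragment through
count_release() yields a value close to 2^32, so should_prune() reports an overfull queue
even though almost nothing is held. The clamped variant returns 0 and no prune is requested.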
Work Around
None.
Comments
N/A