OpenSolaris

Bug ID 6586787
Synopsis TCP can end up passing an LSO packet to a non-LSO driver; panic ensues
State 10-Fix Delivered (Fix available in build)
Category:Subcategory kernel:tcp-ip
Keywords solx86-nic-nicdrv-xge
Responsible Engineer Roamer Lu
Reported Against fw_55 , latest , snv_69 , snv_94 , snv_106
Duplicate Of
Introduced In solaris_nevada
Commit to Fix snv_111
Fixed In snv_111
Release Fixed solaris_nevada(snv_111) , solaris_10u8(s10u8_03) (Bug ID:2174729)
Related Bugs 6394197 , 6575487 , 6588872 , 6759679 , 6816228 , 6826384
Submit Date 30-July-2007
Last Update Date 4-April-2009
Description
A Solaris 10 Update 4 build 12 system panicked while running the load_unload test against the xge interface on a pair of V40z systems.
 
-bash-3.00# uname -a
SunOS waxe 5.10 Generic_120012-13 i86pc i386 i86pc
-bash-3.00# cat /etc/release
                        Solaris 10 8/07 s10x_u4wos_12 X86
           Copyright 2007 Sun Microsystems, Inc.  All Rights Reserved.
                        Use is subject to license terms.
                             Assembled 24 July 2007
-bash-3.00#
panic[cpu0]/thread=fffffe80001a9c80:
BAD TRAP: type=e (#pf Page fault) rp=fffffe80001a94e0 addr=ffffffff900dd000


sched:
#pf Page fault
Bad kernel fault at addr=0xffffffff900dd000
pid=0, pc=0xfffffffffb829b1a, sp=0xfffffe80001a95d8, eflags=0x10213
cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 6f0<xmme,fxsr,pge,mce,pae,pse>
cr2: ffffffff900dd000 cr3: 12c51000 cr8: c
        rdi: ffffffff900dcffa rsi: fffffe8f4b594c68 rdx:              5b4
        rcx:               46  r8:              5a8  r9:               26
        rax: fffffe8f4b594e9c rbx:              5b4 rbp: fffffe80001a9610
        r10:                0 r11:                1 r12: fffffe86073223a0
        r13:              5b4 r14: ffffffff900dd22e r15: ffffffff908f8b78
        fsb: ffffffff80000000 gsb: fffffffffbc25460  ds:               43
         es:               43  fs:                0  gs:              1c3
        trp:                e err:                2 rip: fffffffffb829b1a
         cs:               28 rfl:            10213 rsp: fffffe80001a95d8
         ss:               30

fffffe80001a93f0 unix:real_mode_end+7051 ()
fffffe80001a94d0 unix:trap+d86 ()
fffffe80001a94e0 unix:cmntrap+13f ()
fffffe80001a9610 unix:bcopy+a ()
fffffe80001a9670 bge:bge_send+4e ()
fffffe80001a96a0 bge:bge_m_tx+8d ()
fffffe80001a96b0 dls:dls_tx+e ()
fffffe80001a96d0 dld:dld_tx_single+1f ()
fffffe80001a96f0 dld:str_mdata_fastpath_put+40 ()
fffffe80001a9780 ip:tcp_lsosend_data+350 ()
fffffe80001a9840 ip:tcp_send+5f5 ()
fffffe80001a9900 ip:tcp_wput_data+471 ()
fffffe80001a9a90 ip:tcp_rput_data+133e ()
fffffe80001a9ad0 ip:squeue_enter_chain+16e ()
fffffe80001a9bd0 ip:ip_input+b20 ()
fffffe80001a9c10 dls:soft_ring_drain+98 ()
fffffe80001a9c60 dls:soft_ring_worker+db ()
fffffe80001a9c70 unix:thread_start+8 ()


panic[cpu0]/thread=fffffe80001a9c80:
BAD TRAP: type=e (#pf Page fault) rp=fffffffffbc4bde0 addr=0 occurred in module
"<unknown>" due to a NULL pointer dereference

syncing file systems...
 done
dumping to /dev/dsk/c1t0d0s1, offset 859111424, content: kernel
> $c
bcopy+0xa()
bge_send+0x4e()
bge_m_tx+0x8d()
dls_tx+0xe()
dld_tx_single+0x1f()
str_mdata_fastpath_put+0x40()
tcp_lsosend_data+0x350()
tcp_send+0x5f5()
tcp_wput_data+0x471()
tcp_rput_data+0x133e()
squeue_enter_chain+0x16e()
ip_input+0xb20()
soft_ring_drain+0x98()
soft_ring_worker+0xdb()
thread_start+8()
 
> ::system
set ip_squeue_soft_ring=0x1 [0t1]
set kmem_flags=0xf [0t15]
 
I did not run the same test on Nevada because no test environment was available. Notably, tcp_lsosend_data() was called while sending packets through the bge interface, even though bge does not support LSO.
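
The mismatch can be illustrated with a minimal user-space sketch. This is not the fix delivered in snv_111 and it is not Solaris kernel code: the nic_caps_t record and the choose_send_size() helper are hypothetical names invented for illustration. The sketch only assumes that the transmit path should consult the outgoing interface's LSO capability on every send and fall back to MSS-sized segments when that capability is absent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-interface capability record (not a Solaris structure). */
typedef struct nic_caps {
    bool   lso_enabled;  /* driver advertises large segment offload */
    size_t lso_max;      /* largest payload the driver accepts per LSO send */
} nic_caps_t;

/*
 * Hypothetical helper: return how many payload bytes may be handed to the
 * driver in a single packet. An oversized (LSO) packet is only allowed when
 * the outgoing interface advertises LSO; otherwise the size is capped at one
 * MSS so that a non-LSO driver never sees more than it can copy.
 */
size_t
choose_send_size(const nic_caps_t *caps, size_t pending, size_t mss)
{
    size_t limit;

    assert(caps != NULL && mss > 0);

    limit = mss;
    if (caps->lso_enabled && caps->lso_max > mss)
        limit = caps->lso_max;

    return (pending < limit ? pending : limit);
}
```

With a capability record that has lso_enabled set to false, as would be expected for bge in this report, the helper never returns more than one MSS, so the driver's copy routine is not asked to handle an LSO-sized payload.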
 
Please see the coredumps at /net/greatwall.prc/users/xw161283/coredump/CR6586787.
Please run the same test on Nevada and file a bug if the problem is also present there. If the bug exists in both gates, it needs to be fixed in Nevada first.
Work Around
N/A
Comments
N/A