Category
kernel
Sub-Category
network-driver
Description
During a large file transfer, a card using the rge driver drops off
the network. It's not related to the hwchecksum bug (I've tried with
and without that option in /etc/system). On build 106 it happens after
25-30 GB; on build 101 (2008.11) it happened after 10-15 GB had been
transferred. Snoop shows only ARP requests being sent, with no reply.
I can bring the card back online with unplumb/plumb, but the transfer
then becomes significantly slower than it was originally. If I unplug
and replug the network cable, I see the link down/link up messages in
dmesg, but it has no effect on traffic flow. Other network cards in
the machine continue to work fine when this occurs.
Frequency
Always
Regression
Solaris 10
Steps to Reproduce
1) Copy 30 GB of data to an NFS-shared, ZFS-backed share over an rge
card.
2) Wait for the machine to lose its connection.
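Step 1 can be driven from the NFS client with a small loop. This is a hedged sketch, not part of the original report: `nfs_stress` and the mount point `/mnt/share` are hypothetical, the share is assumed to be already mounted at the directory passed as the first argument, and the shell is assumed to be ksh93 or bash. It writes zeros with dd; NFS still sends every byte over the rge link, although ZFS compression on the server (if enabled) would reduce disk I/O. When the card drops off, the dd in progress simply stops making progress.

```shell
# Hypothetical reproduction helper; not part of the original report.
# Writes SIZE_MB MiB to a file on an already-mounted NFS share, PASSES times.
# Usage: nfs_stress DIR SIZE_MB PASSES
#   e.g. nfs_stress /mnt/share 30720 1   (about 30 GB, the volume in step 1)
nfs_stress() {
    dir=$1
    size_mb=$2
    passes=$3
    f="$dir/rge_stress.$$"

    i=1
    while [ "$i" -le "$passes" ]; do
        # Stream zeros over the rge link in 1 MiB blocks; a hang here is the drop-off.
        dd if=/dev/zero of="$f" bs=1048576 count="$size_mb" 2>/dev/null || return 1
        # Remove the file so repeated passes do not fill the pool.
        rm -f "$f"
        echo "pass $i ok"
        i=$((i + 1))
    done
}
```

Only the write direction is exercised, because reading the file back right after writing it may be served from the client's page cache rather than over the network.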
Expected Result
the network card shouldn't drop off the network
Actual Result
the network card drops off the network
Error Message(s)
Test Case
Workaround
Additional configuration information
sum /kernel/drv/amd64/rge
4707 200 /kernel/drv/amd64/rge
modinfo | grep rge
163 fffffffff7ea9000 a9d8 110 1 rge (Realtek 1Gb Ethernet)
Asus P5QL motherboard with built-in Realtek card
pci bus 0x0002 cardnum 0x00 function 0x00: vendor 0x10ec device 0x8168
Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit
Ethernet controller
Work Around
N/A
Comments
From Masayuki Murayama:
The latest rge is available below; I built it to test the fix for 6892693.
Would you try it first? Please also check that performance is not degraded.
http://homepage2.nifty.com/mrym3/taiyodo/rge.mcast.3.tar.gz
To load the new rge driver into the kernel:
(1) Unload the existing rge.
Unplumb the rge port:
# ifconfig rge0 unplumb
Find the module ID of rge:
# modinfo | grep rge
If the result is:
200 fffffffff889e000 d420 320 1 rge (Realtek 1Gb Ethernet)
then,
# modunload -i 200
(2) Load the new rge.
If you use the 64-bit kernel:
# modload ./amd64/rge
If you use the 32-bit kernel:
# modload ./i386/rge
Ensure the new rge is loaded and running:
# modinfo | grep rge
200 fffffffff889e000 d420 320 1 rge (Realtek 1Gb Ethernet mcast.3)
(3) Plumb the new rge:
# ifconfig rge0 plumb .......
(4) Then test your applications.
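The reload steps above can be collected into a few shell functions. This is a sketch, not part of Masayuki's instructions, and the helper names are hypothetical. It assumes ksh93 or bash, that the tarball was unpacked into a directory containing the `amd64/` and `i386/` subdirectories implied by the modload paths above, that `isainfo -b` is used to pick between them, and that rge0 is the only plumbed rge interface (the module cannot be unloaded while any rge instance is in use). Plumbing only restores the interface; the address configuration elided as `.......` in step (3) still has to be reapplied by hand.

```shell
# Hypothetical helpers automating the rge reload steps; source this file.
# Assumes ksh93 or bash; not part of the original instructions.

# Print the module ID of the loaded rge module, reading modinfo output on stdin.
# modinfo columns are Id, Loadaddr, Size, Info, Rev, Module Name, so $6 is the name.
# Matching $6 exactly avoids false hits that "grep rge" could produce.
rge_modid() {
    awk '$6 == "rge" { print $1; exit }'
}

# Map the kernel word size (output of `isainfo -b`) to the driver subdirectory.
rge_arch_dir() {
    case "$1" in
        64) echo amd64 ;;
        *)  echo i386 ;;
    esac
}

# Unplumb, unload, load the new driver from DRVDIR, verify it, and re-plumb.
# Usage: rge_reload [interface] [drvdir]   (defaults: rge0 and the current directory)
rge_reload() {
    ifc=${1:-rge0}
    drvdir=${2:-.}

    # (1) Unplumb the interface and unload the old module, if it is loaded.
    ifconfig "$ifc" unplumb || return 1
    id=$(modinfo | rge_modid)
    if [ -n "$id" ]; then
        # Skip modunload entirely when no ID was found rather than passing an empty one.
        modunload -i "$id" || return 1
    fi

    # (2) Load the build matching the running kernel and confirm it is the mcast.3 driver.
    arch=$(rge_arch_dir "$(isainfo -b)")
    modload "$drvdir/$arch/rge" || return 1
    if ! modinfo | awk '$6 == "rge"' | grep 'mcast\.3' >/dev/null; then
        echo "rge_reload: new rge (mcast.3) does not appear in modinfo" >&2
        return 1
    fi

    # (3) Plumb again; address, netmask and "up" still have to be reapplied as usual.
    ifconfig "$ifc" plumb
}
```

Run as root, for example `. ./rge_helpers.sh; rge_reload rge0 /path/to/unpacked/tarball` (both paths hypothetical), then reconfigure the interface and continue with step (4).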
Anders added the following comment to the OpenSolaris Bugzilla bug:
"Please update 6807184 in bugster with the following information:
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6807184
> > The latest rge will be below, which I made to test the fix for 6892693.
> >
> >Would you try it first? Please ensure if the performance is not degraded too.
> >http://homepage2.nifty.com/mrym3/taiyodo/rge.mcast.3.tar.gz
I've verified that the bug occurs with the rge from OpenSolaris 2009.06 and no
longer occurs with your rge, at least on my OpenSolaris 2009.06 box.
Verification that the bug exists: I've created a zvol with shareiscsi=on,
used an Apple iMac (running Mac OS X 10.6.2) as the iSCSI initiator, and
used Helios LanTest to run some small disk benchmarks against the share,
including writing and reading a 3 GB file. During the third loop of this
benchmark, the transfers stalled.
Verification using the new rge driver: same as above, but the benchmark loop
has been running for a couple of hours without any problems. It has completed
40 runs so far, so I assume that the issue has been fixed in the new rge
driver.
My testing equipment is limited; I'm seeing a constant transfer rate of 47 MB/s
for read and 26 MB/s for write benchmarks on that specific link between my two
boxes, so I don't see many issues yet, and I can't complain about roughly
400 Mbit/s from a 7-euro NIC :-)
Regards,
Anders"