OpenSolaris

Bug ID 6317553
Synopsis Wrong fix implemented in 4877168 for dmfe Rx buffer unavailable messages.
State 10-Fix Delivered (Fix available in build)
Category:Subcategory kernel:driver-dmfe
Keywords 128 | RX_UNAVAIL_INT | Rx | buffers | chip_error_interrupt | dmfe | dmfe:dmfe_rx_desc | factotum_recover | onnv_triage | unavailable
Responsible Engineer Garrett Damore
Reported Against
Duplicate Of
Introduced In solaris_9
Commit to Fix snv_76
Fixed In snv_76
Release Fixed solaris_nevada(snv_76)
Related Bugs 4877168
Submit Date 30-August-2005
Last Update Date 29-October-2007
Description
Ericsson uses massive numbers of CP2300s in GPRS infrastructure systems.

They have reported unusually high loss of network packets during load tests.
We have investigated this and concluded that the fix implemented in bug
4877168 does not address the source of the problem. It resets the chip and filters out
the messages, which just adds to the problem.

The core problem is that the chip runs out of ring buffers and raises the RX_UNAVAIL_INT
interrupt. The number of ring buffers available by default is 32, and it is tunable
(dmfe:dmfe_rx_desc) up to 256 buffers.

We believe that the cause of the problem is that the default number of ring buffers is too low;
it should be set to 64 or 128, not 32. We have performed tests in which 128 buffers were sufficient for ~9000 incoming UDP packets per second, each 1000 bytes in size. We did not lose a single packet during 12 hours, compared to 2300 lost packets and 64 chip resets in 20 minutes when using the default 32 ring buffers.
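
As a rough illustration of the figures above, the following sketch works out how long the ring can absorb traffic before it is full, and what fraction of packets the reported loss represents. It assumes a perfectly steady arrival rate and that no buffers are returned to the ring in the meantime, which is a simplification; the function names are made up for this example and are not part of the driver.

```python
# Figures quoted in the report.
PACKETS_PER_SEC = 9000      # approximate incoming packet rate
PAYLOAD_BYTES = 1000        # size of each UDP packet
TEST_SECONDS = 20 * 60      # duration of the run with 32 ring buffers
LOST_PACKETS = 2300         # packets lost during that run
CHIP_RESETS = 64            # chip resets during that run


def ring_headroom_ms(descriptors, pps=PACKETS_PER_SEC):
    """Milliseconds until an undrained ring is full (assumes a steady rate)."""
    return descriptors / pps * 1000.0


def loss_fraction(lost, seconds, pps=PACKETS_PER_SEC):
    """Lost packets as a fraction of all packets offered during the run."""
    return lost / (seconds * pps)


if __name__ == "__main__":
    for n in (32, 64, 128, 256):
        print(f"{n:3d} ring buffers: {ring_headroom_ms(n):5.2f} ms of headroom")
    mbit = PACKETS_PER_SEC * PAYLOAD_BYTES * 8 / 1e6
    print(f"offered payload rate: {mbit:.0f} Mbit/s")
    print(f"loss fraction with 32 buffers: "
          f"{loss_fraction(LOST_PACKETS, TEST_SECONDS):.6%}")
    print(f"lost packets per chip reset: {LOST_PACKETS / CHIP_RESETS:.1f}")
```

Under these assumptions, going from 32 to 128 ring buffers quadruples the time the receive path may be delayed before the chip runs out of buffers.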

The warning message "Rx buffer unavailable" should NOT be suppressed. The only
way now to get a clue that something is unusual is to notice that the kstat variables
"chip_error_interrupt" and "factotum_recover" are incrementing while no messages are logged in /var/adm/messages, and to know that RX_UNAVAIL_INT is the only type of abnormal interrupt whose messages are omitted.

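With the message suppressed, the two counters have to be watched by hand. A minimal sketch of that check follows, assuming two snapshots saved in the colon-separated `kstat -p` format (module:instance:name:statistic, then the value). The statistic names come from this report; the "dmfe:0:example" prefix in the sample data is hypothetical, since the real kstat name is not given here.

```python
WATCHED = ("chip_error_interrupt", "factotum_recover")


def parse_kstat(text):
    """Collect watched counters from `kstat -p` style text into a dict."""
    stats = {}
    for line in text.splitlines():
        parts = line.split(None, 1)      # key, then value, split on whitespace
        if len(parts) != 2:
            continue                     # skip blank or malformed lines
        key, value = parts
        statistic = key.rsplit(":", 1)[-1]
        if statistic not in WATCHED:
            continue
        try:
            stats[key] = int(value)
        except ValueError:
            continue                     # ignore non-numeric values
    return stats


def increments(before, after):
    """Return {key: delta} for watched counters that grew between snapshots."""
    return {
        key: after[key] - before[key]
        for key in after
        if key in before and after[key] > before[key]
    }


if __name__ == "__main__":
    # Hypothetical sample snapshots; real data would come from saved kstat output.
    first = "dmfe:0:example:chip_error_interrupt\t3\ndmfe:0:example:factotum_recover\t3\n"
    second = "dmfe:0:example:chip_error_interrupt\t5\ndmfe:0:example:factotum_recover\t4\n"
    for key, delta in increments(parse_kstat(first), parse_kstat(second)).items():
        print(f"{key} increased by {delta}")
```

A growing delta with nothing new in /var/adm/messages would match the symptom described above.
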
I see no evidence in 4877168 that the number of ring buffers has been considered as a factor in the problem at hand.

Our customer relies primarily on Solaris 8, but is moving to Solaris 9 in the near future.

See Comments field as well ....
Work Around
Reset the driver (unplumb/plumb)
Comments
N/A