|
Description
|
Ericsson uses massive numbers of CP2300s in GPRS infrastructure systems.
They have reported unusually high loss of network packets during load tests.
We have investigated this and concluded that the fix implemented in bug
4877168 does not address the source of the problem. It resets the chip and filters out
the warning messages, which only adds to the problem.
The core problem is that the chip runs out of ring buffers and raises the RX_UNAVAIL_INT
interrupt. The number of ring buffers available by default is 32, and it is tunable
(dmfe:dmfe_rx_desc) up to 256 buffers.
We believe that the cause of the problem is that the default number of ring buffers is too low;
it should be set to 64 or 128, not 32. We have performed tests in which 128 buffers were sufficient at ~9000 incoming UDP packets per second, each 1000 bytes in size. We did not lose a single packet in 12 hours, compared to 2300 lost packets and 64 chip resets in 20 minutes when using the default 32 ring buffers.
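To put the figures above in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes (this is not stated in the report) that each incoming packet occupies exactly one ring buffer and that the driver drains nothing while it is stalled, so the result is only the time the ring can absorb traffic before RX_UNAVAIL_INT would be raised. The function names are illustrative only.

```python
def ring_headroom_ms(ring_buffers, packets_per_sec):
    """Milliseconds until the ring fills if no buffer is drained.

    Assumption: one packet per ring buffer (not confirmed by the report).
    """
    return ring_buffers / packets_per_sec * 1000.0


def loss_fraction(lost_packets, packets_per_sec, seconds):
    """Fraction of offered packets that were lost during the test run."""
    return lost_packets / (packets_per_sec * seconds)


RATE = 9000.0  # approximate rate from the load test (~9000 packets/sec)

for buffers in (32, 64, 128, 256):
    print("%3d buffers: %.2f ms of headroom"
          % (buffers, ring_headroom_ms(buffers, RATE)))

# 2300 packets lost in 20 minutes with the default 32 buffers
print("loss with 32 buffers: %.4f%%"
      % (100.0 * loss_fraction(2300, RATE, 20 * 60)))
```

Under these assumptions the default of 32 buffers tolerates a stall of only about 3.6 ms at this rate, while 128 buffers tolerate about 14.2 ms.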
The warning message "Rx buffer unavailable" should NOT be suppressed. As it stands, the only
way to get a clue that something is unusual is to notice that the kstat variables
"chip_error_interrupt" and "factotum_recover" are incrementing while no messages are logged in /var/adm/messages, combined with the knowledge that RX_UNAVAIL_INT is the only type of abnormal interrupt for which messages are omitted.
I see no evidence in 4877168 that the number of ring buffers has been considered as a factor in the problem at hand.
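Until the warning is logged again, the two kstat counters named above can be watched by comparing snapshots. The following is a minimal Python sketch, not a tested tool: it assumes the snapshots are text in the parseable `module:instance:name:statistic<TAB>value` form that `kstat -p` prints, and the kstat name `example` in the usage note is a placeholder, since the report does not say which kstat holds these counters.

```python
WATCHED = ("chip_error_interrupt", "factotum_recover")


def parse_kstat(text, watched=WATCHED):
    """Extract the watched counters from `kstat -p` style text.

    Returns a dict mapping the full 'module:instance:name:statistic'
    key to its integer value. Lines that do not match are skipped.
    """
    counters = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        key, value = parts
        fields = key.split(":")
        if len(fields) != 4 or fields[3] not in watched:
            continue
        try:
            counters[key] = int(value)
        except ValueError:
            continue  # non-numeric statistic, ignore
    return counters


def increments(before, after):
    """Return the counters that grew between two snapshots, with the delta."""
    return {
        key: after[key] - before.get(key, 0)
        for key in after
        if after[key] > before.get(key, 0)
    }
```

For example, with two hypothetical snapshots in which `dmfe:0:example:chip_error_interrupt` goes from 3 to 5, `increments(parse_kstat(first), parse_kstat(second))` reports a delta of 2 for that key. A non-empty result with nothing in /var/adm/messages points to RX_UNAVAIL_INT recoveries.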
Our customer relies primarily on Solaris 8, but is moving to Solaris 9 in the near future.
See Comments field as well ....
|