OpenSolaris

Printable Version Enter a New Search
Bug ID 6840063
Synopsis usbsacm stops sending data out when pushed hard
State 10-Fix Delivered (Fix available in build)
Category:Subcategory driver:usbsacm
Keywords opensolaris
Responsible Engineer Raymond Chen
Reported Against snv_90 , snv_111
Duplicate Of
Introduced In solaris_10
Commit to Fix snv_120
Fixed In snv_120
Release Fixed solaris_nevada(snv_120)
Related Bugs 6588968 , 6719062
Submit Date 12-May-2009
Last Update Date 31-July-2009
Description
Category
   solaris/ppp (Solaris Networking)
Sub-Category
   ppp_kernel
Description
   After connecting using ppp, sppp0 is established, and works to a point. I can ping, and do some generally low bandwidth things, like ssh or telnet. I can even load some web pages.
But - as soon as I attempt to load a number of pages at the one time, the ppp interface stops working.
In addition to this, once it's stopped working, an ifconfig -a will hang part way through displaying the sppp0 interface entry.
I have verified using an external host that when the issue occurs, we are indeed no longer sending packets out. 
For what it's worth, I'm more than happy to collect whatever data you might like me to collect.
Frequency
   Always
Regression
   No
Steps to Reproduce
   Start ppp interface, connect to theregister.co.uk and 'open in new tab' a bunch of the stories. by the time you have started opening the 6th tab, it'll be stopped.
 If there is an already running snoop, it can been seen that there are still requests going out from snoops point if view, but no return packets.
Expected Result
   One might expect that it would actually keep working and display the requested pages.
Actual Result
   sppp0 interface stops working, thus, webpages etc all stop responding. 
To restart the interface, one needs to pkill pppd (at times with a -9), possibly clean up interfaces with ifconfig unplumb then re-connect. 
Error Message(s)
   No error messages as such - Things just stop progressing. 
Test Case
   N/A
Workaround
   
Additional configuration information
   Opensolaris 2008.11, updated to kernel 111a using pkg image-update from pkg.opensolaris.org/dev
Nokia 6310 USB connected phone
Phone, cable and service provider demonstrated to work fine on a different laptop running Nevada 110.
I thought it might have been a hardware issue, however, I first encountered this issue on an HP tx2-1015au, which is ATI chipset, AMD cpu, and have since encountered it on an HP (compaq) 2510p, which is an all intel affair.
More oddly, it seems to work just fine in real nevada (sxce) build 110 on an MSI Megabook S270, but not in OpenSolaris on either of my HP units.
PPP is stuck because USB is stuck.  Looking at the stream in the
attached system dump, I see this:

> 0xffffff014f8ef140::queue
            ADDR MODULE         FLAGS NBLK
ffffff014f8ef140 usbsacm       244020  449 ffffff015dec6700

There are hundreds of messages on the queue, and nothing is moving.
Looking at the flags and transmit state on the driver, I see this:

> 0xffffff0154266900::print usbser_port_t port_state port_flags port_wq_data_cnt port_wq_thread port_flowc
port_state = 0x4
port_flags = 0x44
port_wq_data_cnt = 0xfc78
{
    port_wq_thread.thr_cv = {
        _opaque = 0x1
    }
    port_wq_thread.thr_flags = 0x1
    port_wq_thread.thr_port = 0xffffff0154266900
    port_wq_thread.thr_func = usbser_wq_thread
    port_wq_thread.thr_arg = 0xffffff0154266958
}
port_flowc = 0

In other words, the port is open, no flow control has been asserted,
and the write queue thread is just sleeping away.  I have to guess
that either the hardware itself is broken and has simply stopped
transmitting, or that the complex thread wakeup logic in usbser.c
("GSD") is broken.

I suspect the latter, as there've been *many* reports now of people
with stuck USB serial ports, and it seems unlikely that lots of people
all have defective gear.

Plus, the same hardware is known to work on other systems.

Thus, transferring over to USB team to investigate.
Work Around
There's report that if mulit-core is disabled(see psradm(1M)), there will be no hang. I don't have chance to verify it. But this could be a choice before the root cause is found.
Comments
Transferring to ppp/ppp_kernel initially for investigation.  I
suspect, though, that this is another USB-related problem, as
we've seen several reports of USB trouble.

I sent a request to the submitter for more information on the
system (kernel) state at the time of the hangs.
From the coredump, we can see that the usbsacm w_queue is pending on a condition variable, which is the write queue thread's cv, thr->thr_cv:
ffffff0004abfc60 fffffffffbc2c030                0   0  60 ffffff0154266958
  PC: _resume_from_idle+0xf1    TASKQ: usbsacm_usbser_taskq
  stack pointer for thread ffffff0004abfc60: ffffff0004abfb30
  [ ffffff0004abfb30 _resume_from_idle+0xf1() ]
    swtch+0x147()
    cv_wait+0x61()
    usbser_wq_thread+0x64()
    taskq_thread+0x193()
    thread_start+8()
The thr_flags is 0x01, THR_RUNNING and no USBSER_THR_WAKE flag set, which causes the usbser_wq_thread() to step into the cv_wait branch. The usbser_thr_wake() is called in various places to set USBSER_THR_WAKE flag. usbser_rsrv, usbser_wsrv, usbser_tx_cb, usbser_status_cb, usbser_restart or usbser_restore_ports_state will call this function to trigger sending signal to the pending cv.

Both usbser_wq_thread and usbser_rq_thread could be waken to do further process and hence clear USBSER_THR_WAKE flag. There seems to be some competition.
Submitter confirms that disabling a core like this:

  psradm -f 1

... will work around the problem.  Reenabling the core causes
the link to choke.
usbser_wq_thread() is used to serve write queue messages. This thread is blocked for new messages, data transmit completion, status callback or break/delay expiration. That is, these funcations, usbser_wsrv, usbser_tx_cb, usbser_status_cb and usbser_restart, can wake up usbser_wq_thread.
1).usbsacm doesn't use status callback. Hence, it has nothing to do with usbser_status_cb here.
2).usbser_wsrv can always send a wake up signal as long as the port is still open.
3).TCSBRK ioctl or M_BREAK msg can cause the usbsacm send break request to device, if that device support break. If successful, usbser_restart is called after a while to wake up usbser_wq_thread.
4). usbser_tx_cb is called by DSD after completion of a message. As long as device is still online and port is open and not busy(doing tx,break,delay or ctrl), it will wake up the usbser_wq_thread.
It can be seen from the coredump that the device is online, port is open(port_state = 0x04) and not busy(port_act = 0). There seems no reason the usbser_wq_thread is blocked on a cv.
Every DSD has its own queue to hold data messages from GSD.
> 0xffffff014f872c80::print usbsacm_port_t acm_rx_mp acm_tx_mp acm_bulkin_state acm_bulkout_stat
e acm_mctlout acm_mctlin
acm_rx_mp = 0 <-- No data pending on rx queue
acm_tx_mp = 0 <-- No data pending on tx queue
acm_bulkin_state = 0x2  <-- busy
acm_bulkout_state = 0x1 <-- idle
acm_mctlout = 0x3 <-- USB_CDC_ACM_CONTROL_DTR & USB_CDC_ACM_CONTROL_RTS
acm_mctlin = 0 

The usbsacm still functions normally. No data messages is pending on it. All the messages are pending at the GSD's queue.
Copied from my eamil to code reviewers:

-------- ¿¿¿¿ --------
¿¿: 	code review for CR6839931,6840438 and 6840063
¿¿: 	Wed, 27 May 2009 14:49:53 +0800
¿¿¿: 	Lei Chen
¿¿¿: 	usb-team


Hi team,
       Please help me review the fix for CR6839931 and 6840438.
       In the webrev, I also made a preliminary change to usbser.c to fix CR6840063. I'm still not sure of the root cause. But I think usbser should wake up the wq_thread whenever the GSD is not doing any TX or control operation. The GSD should not check its RX status because this could block further TX. Besides, the current implementation wakes up tx_thread at an unknown time by usbser_wsrv when a mblk is sent from upper module. I changed it to wake up the thread immediately after a mblk is put on write queue. I have sent the fix to submitter to verify.  Please correct me if you think the above logic is not suitable.

http://greatwall.prc/~lc152243/CR6839931_6840438/
http://greatwall.prc/~lc152243/CR6840438_closed/


Thanks,
Lei Chen
From the attached log, we can see it proves my guess that the usbser_wq_thread is not waked up because of the USBSER_ACT_RX is set in pp->port_act, i.e.:

line 5067-- usbs[0].0:      usbser_tx_cb: act=3 curthread=ffffff000fd03c60

The set of USBSER_ACT_RX bit prevents usbser_wq_thread from waking up. After this line, no usbser_tx_cb is called anymore. This means no data get sent out after this line though the messages continue to be pushed from upper modules. Only when the usbser_wsrv is called, the usbser_wq_thread wakes up again.