OpenSolaris

Bug ID 6696163
Synopsis clnt_max_conns can't be set on amd64
State 10-Fix Delivered (Fix available in build)
Category:Subcategory kernel:rpc
Keywords oss-request
Sponsor
Submitter slink
Responsible Engineer Dai Ngo
Reported Against s10
Duplicate Of
Introduced In solaris_2.4
Commit to Fix snv_117
Fixed In snv_117
Release Fixed solaris_nevada(snv_117) , solaris_10u8(s10u8_07) (Bug ID:2180293)
Related Bugs 6739879 , 6817942 , 6836457
Submit Date 30-April-2008
Last Update Date 17-June-2009
Description
Two issues to report:

1-
We used to be able to get more connections between an NFS client and a server
by setting clnt_max_conns (default 1). This no longer works because the compiler,
seeing clnt_max_conns declared static, inserts the value directly into the assembly
of the connmgr_get() routine. Removing the static from the declaration would suffice.
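
To illustrate the point above, here is a minimal sketch with hypothetical names (max_conns_static, max_conns_global, at_limit_static, at_limit_global); it is not the rpcmod source. A file-local variable that is never written in its translation unit, and whose address never escapes, may legally be folded into the generated code as a constant, in which case patching the variable in memory has no effect. Whether a given compiler actually does this depends on the compiler and the optimization level.

```c
/*
 * Hypothetical illustration only, not the rpcmod code.
 *
 * max_conns_static is file-local and never written here, so an
 * optimizing compiler is allowed to replace every read of it with
 * the literal 1.
 */
static int max_conns_static = 1;

/*
 * max_conns_global has external linkage. Other code may change it,
 * so the compiler must load it from memory each time it is read.
 */
int max_conns_global = 1;

/* May be compiled as "return (i == 1)". */
int
at_limit_static(int i)
{
	return (i == max_conns_static);
}

/* Always compares against the current value in memory. */
int
at_limit_global(int i)
{
	return (i == max_conns_global);
}
```

Both functions behave identically until the variable is modified from outside the file; only the global version is guaranteed to notice such a change.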

2-
The motivation for more connections was also to get the server to send data back
spread across the N connections, so that multiple TX rings are involved
in the data transfer. NXGE Atlas cards have 12 such rings, and using them leads to higher throughput and less TX ring lock contention. Unfortunately, even after patching the kernel to avoid the above problem, data did not spread well across the rings. I believe the reason is that the
requests themselves go out on the different transports in a very unbalanced way.

root@ar02(18): dtrace -n 'connmgr_get:return{@a[((struct cm_xprt *)arg1)]=count()}'
dtrace: description 'connmgr_get:return' matched 1 probe


   -1070625480512              165
   -1070614271104              165
   -1051728159744              165
   -1077868296704              166
   -1070634549888              166
   -1070614271872              166
   -1070614270208              166
   -1051725462912              166
   -1051725373312              166
   -1077787468480             3765

Even though the server is free to return data on different connections, that does not seem to be the case, and the request imbalance above leads to a ring imbalance on the server:

ar01# dtrace -n 'svc_getreq:entry{@a[args[0]->xp_xpc.xpc_wq]=count()}'
dtrace: description 'svc_getreq:entry' matched 1 probe


    -953024756048                1
    -953893826976              214
    -950056602664              214
    -935127940120              214
    -935123168376              214
    -935127937496              215
    -935125997312              215
    -935124818984              215
    -935124816360              215
    -935124072216              215
    -951385786136             4832
ar01# 

And the ring distribution for responses (this is an NFS read test):

ar01# dtrace -n 'nxge_start:entry{@a[arg1]=count()}'
dtrace: description 'nxge_start:entry' matched 1 probe


    -954080784384             4008
    -953994195968             4008
    -953546306432             4008
    -953747842048             4009
    -953502126912             4010
    -953747843584             4011
    -953546301376             8022
    -953546300608            88074


Looking at connmgr_get(), I think the reason is this bit of code:

		while ((cm_entry = *cmp) != NULL) {
	...
				if (cm_entry->x_time - prev_time <= 0 ||
				    lru_entry == NULL) {
					prev_time = cm_entry->x_time;
					lru_entry = cm_entry;
				}
		}

Here we walk all connections looking for the LRU one. The x_time is set to lbolt
when a connection is used, and the connection is put at the head of the list. When lbolt advances,
each cm_entry is used in round-robin fashion, because its x_time is < lbolt and the loop selects the last such entry.
But after that, every connection has cm_entry->x_time == prev_time == lbolt.
For the rest of the tick, the first entry is systematically returned.
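
The effect of tick-granularity timestamps can be reproduced outside the kernel. The sketch below is a simplified model, not the clnt_cots.c code: struct conn, pick_lru(), simulate() and NCONN are hypothetical names, the connections sit in a fixed array, and list reordering is not modelled, so which entry wins the tie is an artifact of that simplification. Only the comparison is taken from the excerpt above.

```c
#include <assert.h>
#include <stddef.h>

#define	NCONN	10	/* hypothetical number of connections */

/* Hypothetical, stripped-down stand-in for struct cm_xprt. */
struct conn {
	long x_time;	/* tick at which the connection was last handed out */
	long uses;	/* number of requests sent on this connection */
};

/*
 * Same comparison as the excerpt: among the entries with the oldest
 * x_time, the last one visited wins. The winner is stamped with the
 * current tick, as the real code does with lbolt.
 */
struct conn *
pick_lru(struct conn *c, size_t n, long lbolt)
{
	struct conn *lru_entry = NULL;
	long prev_time = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (c[i].x_time - prev_time <= 0 || lru_entry == NULL) {
			prev_time = c[i].x_time;
			lru_entry = &c[i];
		}
	}
	assert(lru_entry != NULL);	/* requires n > 0 */
	lru_entry->x_time = lbolt;
	lru_entry->uses++;
	return (lru_entry);
}

/* Issue reqs_per_tick requests during each of nticks clock ticks. */
void
simulate(struct conn *c, size_t n, long nticks, long reqs_per_tick)
{
	long lbolt, r;
	size_t i;

	for (i = 0; i < n; i++) {
		c[i].x_time = 0;
		c[i].uses = 0;
	}
	for (lbolt = 1; lbolt <= nticks; lbolt++)
		for (r = 0; r < reqs_per_tick; r++)
			(void) pick_lru(c, n, lbolt);
}
```

Calling simulate(c, NCONN, 100, 30) on an array of NCONN entries leaves nine connections with 100 uses each (one per tick) and a single connection with 2100, which is the same shape as the connmgr_get distribution shown earlier: once every x_time equals the current tick, the tie always resolves to the same entry until the tick changes.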


Spreading data destined for a single client across multiple server TX rings is thus not possible because of this client-side code.
Work Around
N/A
Comments
N/A