Two issues to report:
1-
We used to be able to get more connections between an NFS client and a server
by setting clnt_max_conns (default 1). This no longer works: because clnt_max_conns is
declared static, the compiler inserts its value directly into the assembly generated for the
connmgr_get() routine, so changing the variable has no effect. Removing the static from the declaration would suffice.
2-
The motivation for more connections was also to get the server to send data back
spread across the N connections, in order to get multiple TX rings involved
in the data transfer. NXGE Atlas cards have 12 such rings, and using them leads to higher throughput and less TX ring lock contention. Unfortunately, even after patching the kernel to avoid the above problem, data did not spread well across the rings. I believe the reason is that the
requests themselves go out on the different transports in a very unbalanced way.
root@ar02(18): dtrace -n 'connmgr_get:return{@a[((struct cm_xprt *)arg1)]=count()}'
dtrace: description 'connmgr_get:return' matched 1 probe
-1070625480512 165
-1070614271104 165
-1051728159744 165
-1077868296704 166
-1070634549888 166
-1070614271872 166
-1070614270208 166
-1051725462912 166
-1051725373312 166
-1077787468480 3765
Even though the server is free to return data on different connections, that does not seem to be the case, and the request imbalance above leads to a ring imbalance on the server:
ar01# dtrace -n 'svc_getreq:entry{@a[args[0]->xp_xpc.xpc_wq]=count()}'
dtrace: description 'svc_getreq:entry' matched 1 probe
-953024756048 1
-953893826976 214
-950056602664 214
-935127940120 214
-935123168376 214
-935127937496 215
-935125997312 215
-935124818984 215
-935124816360 215
-935124072216 215
-951385786136 4832
ar01#
And the ring distribution for responses (this is an NFS read test):
ar01# dtrace -n 'nxge_start:entry{@a[arg1]=count()}'
dtrace: description 'nxge_start:entry' matched 1 probe
-954080784384 4008
-953994195968 4008
-953546306432 4008
-953747842048 4009
-953502126912 4010
-953747843584 4011
-953546301376 8022
-953546300608 88074
Looking at connmgr_get(), I think the reason is this bit of code:
    while ((cm_entry = *cmp) != NULL) {
        ...
        if (cm_entry->x_time - prev_time <= 0 ||
            lru_entry == NULL) {
            prev_time = cm_entry->x_time;
            lru_entry = cm_entry;
        }
    }
Here we walk all connections looking for the LRU one. The x_time is set to lbolt
when a connection is used, and the connection is put at the head of the list. When lbolt advances,
each cm_entry is used once in round-robin fashion, because its x_time is < lbolt and the loop selects the last such entry.
But after that, every connection has cm_entry->x_time == prev_time == lbolt.
For the rest of the tick, the first entry will be systematically returned.
Spreading data destined for a single client over multiple server TX rings is thus not possible because of this client-side code.