OpenSolaris

Printable Version Enter a New Search
Bug ID 6549950
Synopsis in.mpathd's run_timeouts() logic introduces perceived probe jitter
State 10-Fix Delivered (Fix available in build)
Category:Subcategory network:ipmp
Keywords clearview
Responsible Engineer Peter Memishian
Reported Against
Duplicate Of
Introduced In solaris_9
Commit to Fix snv_71
Fixed In snv_71
Release Fixed solaris_nevada(snv_71)
Related Bugs
Submit Date 25-April-2007
Last Update Date 15-August-2007
Description
While working on the IPMP Rearchitecture, I noticed something odd during
ipmpstat bringup: the roughly every 20 seconds, a probe's RTT would be
delayed by about 5ms:

     # ipmpstat -o time,probe,target,rtt -p
     TIME        PROBE     TARGET              RTT    
     1.21s       125       10.8.57.248         0.79ms 
     2.94s       126       10.8.57.248         0.81ms 
     4.16s       127       10.8.57.248         1.19ms 
     5.89s       128       10.8.57.248         0.79ms 
     7.34s       129       10.8.57.248         0.82ms 
     8.44s       130       10.8.57.248         0.80ms 
     9.74s       131       10.8.57.248         0.75ms 
     11.19s      132       10.8.57.248         0.84ms 
     12.69s      133       10.8.57.248         0.78ms 
     14.50s      134       10.8.57.248         0.78ms 
     15.56s      135       10.8.57.248         0.75ms 
     17.33s      136       10.8.57.248         0.75ms 
     18.95s      137       10.8.57.248         0.77ms 
-->  20.08s      138       10.8.57.248         6.43ms 
     21.44s      139       10.8.57.248         0.79ms 
     23.04s      140       10.8.57.248         0.79ms 
     24.48s      141       10.8.57.248         0.77ms 
     25.72s      142       10.8.57.248         0.78ms 
     26.99s      143       10.8.57.248         0.77ms 
     28.76s      144       10.8.57.248         0.80ms 
     29.75s      145       10.8.57.248         0.77ms 
     31.45s      146       10.8.57.248         0.79ms 
     33.24s      147       10.8.57.248         0.80ms 
     34.33s      148       10.8.57.248         0.80ms 
     35.88s      149       10.8.57.248         0.81ms 
     37.26s      150       10.8.57.248         0.83ms 
     38.39s      151       10.8.57.248         0.76ms 
-->  40.31s      152       10.8.57.248         6.45ms 
     41.25s      153       10.8.57.248         0.79ms

Some rummaging around revealed that it's in.mpathd *itself* that's
responsible for this.  Specifically, here in run_timeouts():


        for (pii = phyint_instances; pii != NULL; pii = next_pii) {
                next_pii = pii->pii_next;
(1) -->         next_event_time = phyint_inst_timer(pii);
                if (next_event_time != TIMER_INFINITY && next_event_time < next)
                        next = next_event_time;

                if (debug & D_TIMER) {
                        logdebug("run_timeouts(%s %s): next scheduled for"
                            " this phyint inst %u, next scheduled global"
                            " %u ms\n",
                            AF_STR(pii->pii_af), pii->pii_phyint->pi_name,
                            next_event_time, next);
                }
        }

        /*
         * Make sure initifs() is called at least once every
         * IF_SCAN_INTERVAL, to make sure that we are in sync
         * with the kernel, in case we have missed any routing
         * socket messages.
         */
        if (next > IF_SCAN_INTERVAL)
                next = IF_SCAN_INTERVAL;

(2) --> if ((getcurrenttime() - last_initifs_time) > IF_SCAN_INTERVAL) {
                initifs();
                check_config();
        }

The call to phyint_inst_timer() at (1) is what sends out a probe; along
the way the time of the probe is recorded.  Before we've had a chance
to see the reply to that probe, we execute (2), which is true every
20 (IF_SCAN_INTERVAL) seconds.  In that case, we call initifs() and
check_config() both of which (apparently) combine to take 5 milliseconds.

While we could certainly look into speeding up initifs() and
check_config(), the jitter can be eliminated simply by hoisting the block
of code at (2) to before the for() loop.

Indeed, with that change, the problem goes away entirely.
Work Around
N/A
Comments
N/A