Solaris performs some accounting and bookkeeping activities every
clock tick. To do this, a cyclic timer is created to go off every
clock tick and call a clock handler (clock()). This handler performs,
among other things, tick accounting for active threads.
Every tick, the tick accounting code in clock() walks all the
active CPUs in the system, determines whether a user thread is running
on each CPU and, if so, charges that thread with one tick. This measures
the CPU time a user thread consumes, in ticks. The charge also counts
against the thread's time quantum, which dispatching decisions are
based on. Finally, the LWP interval timers (virtual and profiling
timers) are processed every tick, if they have been set.
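The per-tick walk described above can be sketched roughly as follows. This is a simplified user-space illustration, not the kernel code: the types sim_thread_t and sim_cpu_t and the function sim_tick_accounting() are hypothetical stand-ins invented for this example, and locking and interval timer processing are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the Solaris kernel structures. */
typedef struct sim_thread {
	int is_user;		/* nonzero for a user thread */
	unsigned long ticks;	/* CPU ticks charged so far */
	int quantum_left;	/* ticks left in the current time quantum */
} sim_thread_t;

typedef struct sim_cpu {
	sim_thread_t *running;	/* NULL when the CPU is idle */
} sim_cpu_t;

/*
 * One pass of tick accounting: visit every CPU and charge one tick to
 * each running user thread. The charge also counts against the thread's
 * time quantum. Returns the number of threads whose quantum ran out on
 * this pass. The whole walk runs on a single CPU, so its cost grows
 * linearly with ncpus.
 */
static int
sim_tick_accounting(sim_cpu_t *cpus, size_t ncpus)
{
	int expired = 0;

	assert(cpus != NULL || ncpus == 0);
	for (size_t i = 0; i < ncpus; i++) {
		sim_thread_t *t = cpus[i].running;

		if (t == NULL || !t->is_user)
			continue;	/* idle CPU or non-user thread */
		t->ticks++;
		if (t->quantum_left > 0 && --t->quantum_left == 0)
			expired++;
	}
	return (expired);
}
```

In this sketch a thread that starts with two ticks of quantum is charged on each of two passes and is reported as expired on the second one.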
As the number of CPUs increases, the tick accounting loop gets longer.
Since only one CPU runs the loop, the work is also single-threaded, so
tick accounting does not scale. On a busy system with many CPUs,
the tick accounting loop alone can often take more than a tick to complete
if the locks it needs to acquire are busy. This causes the invocations of
the clock() handler to drift in time. Consequently, the lbolt drifts, so
any timing based on the lbolt becomes inaccurate. Any computations based
on the lbolt (such as load averages) also get skewed.
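To make the drift concrete, here is a toy model rather than a description of the actual cyclic or clock() implementation. It assumes that each handler invocation is due one tick after the previous one, that an invocation cannot start before the previous one has finished, and that late invocations are not compensated for. The function name sim_clock_drift() and its parameters are hypothetical.

```c
#include <stddef.h>

/*
 * Toy model of clock() drift. Invocation i is due at i * tick_ns but
 * cannot start until the previous invocation has finished. cost_ns[i]
 * is how long invocation i takes. Returns how far, in nanoseconds,
 * the start of the last invocation has slipped past its due time.
 */
static long
sim_clock_drift(const long *cost_ns, size_t n, long tick_ns)
{
	long finished = 0;	/* when the previous invocation ended */
	long drift = 0;

	for (size_t i = 0; i < n; i++) {
		long due = (long)i * tick_ns;
		long start = (finished > due) ? finished : due;

		drift = start - due;
		finished = start + cost_ns[i];
	}
	return (drift);
}
```

With a 10 ns tick in this model, invocations that each take 1 ns never slip, while four invocations that each take 15 ns leave the last one starting 15 ns late. The slip grows with every overrun, which is why anything derived from the tick count falls behind real time.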
We need to make tick accounting more scalable and reduce its impact on
the clock() handler.
There is work in progress to address 6582502. The idea is to eliminate
ts_tick() altogether and drive all scheduling from timeout(). The
scalability of the timeout() implementation is still an issue, so this
fix may still be applicable in some form.