OpenSolaris

Bug ID 6464127
Synopsis Time obtained from microstate accounting may go backwards
State 1-Dispatched (Default State)
Category:Subcategory kernel:sched
Keywords dispatcher | microstate | scheduler | time
Reported Against
Duplicate Of
Introduced In
Commit to Fix
Fixed In
Release Fixed
Related Bugs 6327235 , 6466380
Submit Date 25-August-2006
Last Update Date 12-January-2007
Description
While working on the CPU caps project we observed that, on the x86 platform, a
microstate's ms_state_start time can be later than a value of gethrtime_unscaled()
that was taken after we read ms_state_start. In other words, the start of the
microstate may appear to be after the current time. This happens because we are
comparing unscaled times taken on two different CPUs, and the current implementation
does not guarantee that these times are synchronized closely enough to always move
forward.
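
The effect can be reproduced with a small user-space model. This is only a sketch
under stated assumptions, not kernel code: sim_cpu_t, sim_gethrtime_unscaled() and
sim_elapsed() are hypothetical names, and each CPU's unscaled counter is assumed to
be the true time plus a fixed per-CPU skew.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t hrtime_t;

/* Hypothetical model of one CPU: its counter is true time plus a fixed skew. */
typedef struct sim_cpu {
        hrtime_t skew;
} sim_cpu_t;

/* Unscaled time as seen by the given CPU at the given true time. */
static hrtime_t
sim_gethrtime_unscaled(const sim_cpu_t *cpu, hrtime_t true_time)
{
        return (true_time + cpu->skew);
}

/*
 * Elapsed time as accounting code would compute it: the start stamp was
 * written on start_cpu, and the current time is read later on now_cpu.
 */
static hrtime_t
sim_elapsed(const sim_cpu_t *start_cpu, hrtime_t true_start,
    const sim_cpu_t *now_cpu, hrtime_t true_now)
{
        assert(true_now >= true_start); /* real time only moves forward */
        return (sim_gethrtime_unscaled(now_cpu, true_now) -
            sim_gethrtime_unscaled(start_cpu, true_start));
}
```

In this model, a start stamp written on a CPU whose counter runs ahead and a later
read on a CPU whose counter does not yields a negative elapsed time, even though
true time moved forward between the two reads.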

Looking at the x86 implementation we see:

hrtime_t
gethrtime_unscaled(void)
{
        return (gethrtimeunscaledf());
}

gethrtimeunscaledf is a function pointer which by default is set to either
tsc_gethrtimeunscaled or tsc_gethrtimeunscaled_delta by tsc_digest().
tsc_gethrtimeunscaled_delta keeps per-CPU drift values and is supposed to provide
synchronized time that moves forward.

tsc_gethrtimeunscaled uses the global tsc_last_jumped variable to adjust its values.

tsc_digest() chooses between the two based on the amount of drift between CPUs.
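
The dispatch described above can be modelled roughly as follows. This is a sketch,
not the code in the kernel: every sim_* name is hypothetical, the sign convention
of the per-CPU delta is an assumption, and the tolerance-based rule in sim_digest()
is an assumed policy, since the report only says the choice depends on the amount
of drift.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t hrtime_t;

#define SIM_NCPU        2

/* Hypothetical stand-ins for the raw per-CPU counters and the drift table. */
static hrtime_t sim_raw_tsc[SIM_NCPU];
static hrtime_t sim_tsc_delta[SIM_NCPU];
static int sim_curcpu;          /* CPU the caller is running on */

/* Plain variant: return the current CPU's raw counter unchanged. */
static hrtime_t
sim_gethrtimeunscaled(void)
{
        return (sim_raw_tsc[sim_curcpu]);
}

/* Delta variant: correct the raw counter by this CPU's recorded drift. */
static hrtime_t
sim_gethrtimeunscaled_delta(void)
{
        return (sim_raw_tsc[sim_curcpu] + sim_tsc_delta[sim_curcpu]);
}

/* Function pointer through which callers read the unscaled time. */
static hrtime_t (*sim_gethrtimeunscaledf)(void) = sim_gethrtimeunscaled;

/*
 * Assumed policy: switch to the delta variant only when some CPU's
 * recorded drift exceeds the tolerance in either direction.
 */
static void
sim_digest(hrtime_t tolerance)
{
        int i;

        sim_gethrtimeunscaledf = sim_gethrtimeunscaled;
        for (i = 0; i < SIM_NCPU; i++) {
                hrtime_t d = sim_tsc_delta[i];

                if (d > tolerance || d < -tolerance)
                        sim_gethrtimeunscaledf = sim_gethrtimeunscaled_delta;
        }
}
```

With a drift that stays under the tolerance, the plain variant remains selected and
two CPUs can still report different values for the same instant. That residual
difference is what lets a later read on another CPU appear to be earlier.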

I would argue that gethrtime_unscaled() should always use tsc_gethrtimeunscaled_delta
and guarantee that time moves forward across all CPUs.

The CPU caps project implementation uses the following piece of code:

void
cap_thread_charge(kthread_t *t, hrtime_t *total_cpu, short *new_ticks)
{
        uint64_t new_usage = mstate_thread_onproc_time(t);

        ASSERT3U(new_usage, >=, old_usage);
        ...

The ASSERT above triggers once in a while. 

Extra debugging info was added (such as saving previous results, including all of the
per-state microstate buckets along with the scaled and unscaled aggregations), but it
was not enough to figure out what is going on. There are two possibilities:
(a) gethrtime_unscaled() can return values going backwards (quick code
inspection shows that this is possible, on x86 at least); or (b) there is
a problem with how microstate accounting is done. That is, we might be
doing a syscall_mstate() transition from LMS_USER to LMS_SYSTEM when,
according to t->t_mstate, we are currently in LMS_SLEEP (or any other
state, other than LMS_USER/SYSTEM/TRAP). See mstate_thread_onproc_time()
in msacct.c for details.
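
To illustrate possibility (a), here is a simplified model of an on-CPU usage
calculation. It is a sketch written from the description in this report, not the
code of mstate_thread_onproc_time() in msacct.c: the sim_* names are hypothetical,
and the model assumes usage is the sum of the USER/SYSTEM/TRAP buckets plus the
in-progress interval when the thread is in one of those states.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t hrtime_t;

/* Hypothetical subset of microstates; the names follow the report. */
enum {
        SIM_LMS_USER,
        SIM_LMS_SYSTEM,
        SIM_LMS_TRAP,
        SIM_LMS_SLEEP,
        SIM_NMSTATES
};

typedef struct sim_mstate {
        int             ms_state;               /* current microstate */
        hrtime_t        ms_state_start;         /* unscaled entry time */
        hrtime_t        ms_acct[SIM_NMSTATES];  /* completed time per state */
} sim_mstate_t;

/* Only these states count as time spent on a CPU in this model. */
static int
sim_on_cpu(int state)
{
        return (state == SIM_LMS_USER || state == SIM_LMS_SYSTEM ||
            state == SIM_LMS_TRAP);
}

/* Completed on-CPU buckets plus the in-progress interval, if any. */
static hrtime_t
sim_onproc_time(const sim_mstate_t *ms, hrtime_t now)
{
        hrtime_t usage = ms->ms_acct[SIM_LMS_USER] +
            ms->ms_acct[SIM_LMS_SYSTEM] + ms->ms_acct[SIM_LMS_TRAP];

        /* If now < ms_state_start, this term is negative. */
        if (sim_on_cpu(ms->ms_state))
                usage += now - ms->ms_state_start;
        return (usage);
}
```

In this model, if ms_state_start was stamped by a CPU whose counter runs ahead, a
later sample taken with another CPU's value of "now" can be smaller than an earlier
sample, which is the condition the ASSERT3U in cap_thread_charge() checks for.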
Work Around
N/A
Comments
N/A