OpenSolaris

Bug ID 6507659
Synopsis tsc differences between CPU's give dtrace_gethrtime() serious problems
State 10-Fix Delivered (Fix available in build)
Category:Subcategory kernel:amd64
Keywords delta | dtrace_gethrtime | rtiq_reviewed | tsc
Responsible Engineer Jonathan Haslam
Reported Against s10 , s10u3_fcs
Duplicate Of
Introduced In solaris_10
Commit to Fix snv_58
Fixed In snv_58
Release Fixed solaris_nevada(snv_58) , solaris_10u5(s10u5_02) (Bug ID:2152456)
Related Bugs
Submit Date 22-December-2006
Last Update Date 18-October-2007
Description
The function dtrace_gethrtime() is consumed by several kernel subsystems, including DTrace and FMA, and provides a lock-free mechanism for obtaining a high-precision timer. On an AMD platform the function uses the TSC counters as the source of the high-precision timer. However, it suffers from a nasty problem. The code looks like this:


	if ((tsc = tsc_read()) >= tsc_last)
		tsc -= tsc_last;
	else if (tsc >= tsc_last - 2*tsc_max_delta)
		tsc = 0;
 
	hrt = tsc_hrtime_base;

	TSC_CONVERT_AND_ADD(tsc, hrt, nsec_scale);

Looking at the above we see that if tsc is ever less than tsc_last by more than
2*tsc_max_delta (tsc_max_delta is very small, at around 1000), neither branch is taken
and the whole value just read from the TSC is scaled and added to tsc_hrtime_base. This
results in a very large time value being generated and returned, so time appears to
have jumped forwards. Unfortunately it is reasonably common for multiprocessor/multicore
AMD systems to have TSC values which are very different across CPUs. In the above code,
if the tsc_read() occurred on a different CPU to that which was used to calculate
tsc_last and these processors have very different TSC values, we are in for a rough ride.
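
The failure mode described above can be reproduced outside the kernel with a small user-space model. This is only a sketch: the sim_* names and all constants are hypothetical stand-ins for the kernel globals, and the scaling step is simplified to one tick per nanosecond instead of using TSC_CONVERT_AND_ADD.

```c
#include <stdint.h>

/*
 * Hypothetical stand-ins for the kernel globals. The values are
 * illustrative only and do not come from a real system.
 */
static uint64_t sim_tsc_last = 5000000000ULL;	/* TSC at the last tick */
static uint64_t sim_tsc_max_delta = 1000;	/* small tolerance */
static uint64_t sim_hrtime_base = 2000000000ULL; /* ns at the last tick */

/*
 * Model of the problematic logic. Assumption: 1 TSC tick == 1 ns, so
 * the scaling step is reduced to a plain addition.
 */
static uint64_t
sim_buggy_gethrtime(uint64_t tsc)
{
	if (tsc >= sim_tsc_last)
		tsc -= sim_tsc_last;	/* normal case: ticks since last tick */
	else if (tsc >= sim_tsc_last - 2 * sim_tsc_max_delta)
		tsc = 0;		/* slightly behind: clamp to zero */
	/* Otherwise tsc is left untouched and the raw value is added. */

	return (sim_hrtime_base + tsc);
}
```

In this model a read of 5000000500 yields 2000000500 and a read of 4999999000 is clamped to 2000000000, but a read of 4000000000 (as might come from a CPU whose TSC is far behind) yields 6000000000, an apparent forward jump of four seconds.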

The above causes real problems. For example, it causes DTrace to think that its deadman
timers haven't been firing and that the system has gone unresponsive, and therefore it bails.
Very bad news.

The fix for this is to change dtrace_gethrtime() to use the per-processor TSC deltas that
we already have stored in the tsc_sync_tick_delta[] array, in the same way that
tsc_gethrtime_delta() does.
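
The idea behind the fix can be sketched in the same kind of user-space model. This is not the delivered change: the sim_* names, the two-CPU table, the constants and the sign convention of the delta (added to the raw read) are all assumptions made for illustration, and scaling is again simplified to one tick per nanosecond.

```c
#include <stdint.h>

#define	SIM_NCPU	2

/*
 * Hypothetical per-CPU correction table, standing in for
 * tsc_sync_tick_delta[]. Assumption: CPU 1's TSC runs 1000000000
 * ticks behind CPU 0, so its delta brings it onto CPU 0's timebase.
 */
static int64_t sim_tick_delta[SIM_NCPU] = { 0, 1000000000LL };

static uint64_t sim_fix_tsc_last = 5000000000ULL;	/* CPU 0 timebase */
static uint64_t sim_fix_max_delta = 1000;
static uint64_t sim_fix_hrtime_base = 2000000000ULL;	/* ns */

/*
 * Apply the reading CPU's delta before comparing against the last
 * tick value. Assumption: 1 TSC tick == 1 ns.
 */
static uint64_t
sim_fixed_gethrtime(int cpu, uint64_t raw_tsc)
{
	uint64_t tsc = raw_tsc + (uint64_t)sim_tick_delta[cpu];

	if (tsc >= sim_fix_tsc_last)
		tsc -= sim_fix_tsc_last;
	else if (tsc >= sim_fix_tsc_last - 2 * sim_fix_max_delta)
		tsc = 0;

	return (sim_fix_hrtime_base + tsc);
}
```

With the correction applied, a raw read of 4000000500 on the model's CPU 1 and a raw read of 5000000500 on CPU 0 both yield 2000000500, so the reading CPU no longer matters. In the kernel, the read and the table lookup would also have to happen on the same CPU.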
Work Around
To turn off watchdog checking, run DTrace in destructive mode, either
by using the -w flag or by including "#pragma D option destructive" in a script. Use caution: in destructive mode DTrace can affect the running system, up to and including forcing a panic.
Comments
N/A