OpenSolaris

Bug ID 6540867
Synopsis Solaris 10 U2 and U3 occasionally steps time backwards one tick
State 11-Closed:Duplicate (Closed)
Category:Subcategory kernel:time
Keywords gettimeofday | monotonically
Responsible Engineer Sudheer Abdul-salam
Reported Against
Duplicate Of 6539802
Introduced In
Commit to Fix
Fixed In
Release Fixed
Related Bugs 6463815, 6539802, 6600939
Submit Date 30-March-2007
Last Update Date 27-April-2007
Description
After upgrading from Solaris 10 U1 (or GA) to U2 or U3, gettimeofday() occasionally steps the time backwards one tick.

This appears to happen only on x86, and possibly only on AMD systems. I have been unable to reproduce the problem on an AMD system running Solaris 10 GA, a T1000 running s10 u3, a Via C7 based x86 system running snv_60, or a v20z (AMD) system running snv_51.

Here is a simple test program that shows the behavior in just a few seconds:
---
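#include <sys/time.h>
#include <sys/types.h>
#include <stdio.h>

int main(void)
{
    struct timeval t1;  /* previous reading */
    struct timeval t2;  /* current reading */

    gettimeofday(&t1, NULL);

    while (1)
    {
        gettimeofday(&t2, NULL);

        /* Report any reading that is earlier than the previous one. */
        if ((t2.tv_sec < t1.tv_sec) ||
            ((t2.tv_sec == t1.tv_sec) && (t2.tv_usec < t1.tv_usec)))
        {
            /* Cast to long so the arguments match the %ld conversions. */
            printf("Time went back - %ld / %ld - %ld / %ld\n",
                   (long)t2.tv_sec,
                   (long)t2.tv_usec,
                   (long)t1.tv_sec,
                   (long)t1.tv_usec);
        }

        /* The current reading becomes the previous one. */
        t1.tv_sec = t2.tv_sec;
        t1.tv_usec = t2.tv_usec;
    }
}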
---

Running this program should never produce any output.

Here's a sample from one customer system:
---
Time went back - 1175086206 / 652959 - 1175086206 / 662959
Time went back - 1175086206 / 692960 - 1175086206 / 702973
Time went back - 1175086206 / 942954 - 1175086206 / 952955
Time went back - 1175086208 / 712925 - 1175086208 / 722926
Time went back - 1175086209 / 352914 - 1175086209 / 362915
---

What stands out is that the time steps backwards by approximately 10 ms, which happens to be the normal tick interval.

Adding the following line to /etc/system and rebooting the system does indeed show that the step drops to 1 ms, which is the normal tick interval with "hires_tick" set:
---
set hires_tick = 1
---

---
Time went back - 1175246726 / 934 - 1175246726 / 1937
...
Time went back - 1175253573 / 469678 - 1175253573 / 470677
Time went back - 1175253579 / 677 - 1175253579 / 1678
Time went back - 1175253597 / 676 - 1175253597 / 1678
---
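
To make the size of each backward step explicit, the differences can be computed directly from the sample lines above. This is only a small sketch: the struct and function names (sample, step_usec) are made up for illustration, and the values are copied by hand from the two output samples in this report.
---
#include <stdio.h>

/* One "Time went back" line: the new reading (t2) followed by the
 * previous reading (t1). Names here are illustrative only. */
struct sample {
    long t2_sec, t2_usec;
    long t1_sec, t1_usec;
};

/* Size of the backward step in microseconds (previous minus new). */
static long step_usec(const struct sample *s)
{
    return (s->t1_sec - s->t2_sec) * 1000000L + (s->t1_usec - s->t2_usec);
}

int main(void)
{
    /* Values copied from the two output samples in this report. */
    static const struct sample samples[] = {
        /* default tick */
        {1175086206, 652959, 1175086206, 662959},
        {1175086206, 692960, 1175086206, 702973},
        {1175086206, 942954, 1175086206, 952955},
        {1175086208, 712925, 1175086208, 722926},
        {1175086209, 352914, 1175086209, 362915},
        /* hires_tick = 1 */
        {1175246726, 934, 1175246726, 1937},
        {1175253573, 469678, 1175253573, 470677},
        {1175253579, 677, 1175253579, 1678},
        {1175253597, 676, 1175253597, 1678},
    };
    size_t i;

    for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
        printf("step back: %ld us\n", step_usec(&samples[i]));

    return 0;
}
---

The first five steps come out between 10000 and 10013 us, and the four hires_tick steps between 999 and 1003 us, which matches one tick in each configuration.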

The x4500 system I have reproduced this on has 118855-19 installed, and the customer has up to -36 installed. The systems I have tested that do not show this have no version of 118855 installed.

I assume that something that went into 118855 is causing this. I have read through the patch description, but have not been able to pinpoint any particular putback that would cause it.

I have now noticed that this appears to be a multiprocessor issue.

I ran the test program for about an hour without a problem on a 4-processor x4500 with 3 processors disabled using psradm. Only about a minute after I enabled one additional processor, the problem showed up, and it now occurs frequently.
Work Around
N/A
Comments
N/A