OpenSolaris

Bug ID 6655821
Synopsis deadman is not able to detect 1s hang
State 10-Fix Delivered (Fix available in build)
Category:Subcategory kernel:other
Keywords rtiq_reviewed
Responsible Engineer Vitezslav Batrla
Reported Against nhas30u2_solaris_b6
Duplicate Of
Introduced In solaris_8
Commit to Fix snv_85
Fixed In snv_85
Release Fixed solaris_nevada(snv_85)
Related Bugs
Submit Date 28-January-2008
Last Update Date 12-March-2008
Description
I found two nits in the deadman() code:

1) deadman is not able to detect a 1s hang
2) in the panic message, deadman claims that it timed out after 1 seconds of clock inactivity,
   while the clock was actually stopped for 2 seconds.

Take a look at this snippet of deadman():
	if (lbolt != CPU->cpu_deadman_lbolt) {
		CPU->cpu_deadman_lbolt = lbolt;
		CPU->cpu_deadman_countdown = deadman_seconds;
		return;
	}

                ^^^ This block tests whether lbolt is moving from the current CPU's
                    point of view. If lbolt has moved, it resets the timer, stores
                    the current lbolt and returns from the function.

	if (CPU->cpu_deadman_countdown-- > 0)
		return;

                ^^^ When lbolt is stale, decrement the timer and return if it was
                    positive before the decrement. So with deadman_seconds=1, we
                    need to get to this place twice before we can pass it and
                    trigger the panic.
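
The off-by-one can be reproduced outside the kernel with a small user-space sketch. It copies only the two tests quoted above; `struct sim_cpu`, `sim_deadman()` and `stale_calls_until_panic()` are hypothetical names invented for this illustration, and it assumes deadman() is invoked about once per second, as the report implies.

```c
#include <assert.h>

/* Hypothetical stand-in for the two per-CPU fields that deadman() uses. */
struct sim_cpu {
	long deadman_lbolt;
	int deadman_countdown;
};

/*
 * One simulated deadman() invocation.  Mirrors the two tests quoted in
 * this report; returns 1 where the real code would go on to panic.
 */
static int
sim_deadman(struct sim_cpu *cpu, long lbolt, int deadman_seconds)
{
	if (lbolt != cpu->deadman_lbolt) {
		/* lbolt moved: remember it and re-arm the countdown. */
		cpu->deadman_lbolt = lbolt;
		cpu->deadman_countdown = deadman_seconds;
		return (0);
	}

	/* lbolt is stale: post-decrement, so the old value is tested. */
	if (cpu->deadman_countdown-- > 0)
		return (0);

	return (1);
}

/*
 * Number of consecutive invocations with a stale lbolt that are needed
 * before sim_deadman() reports a panic.
 */
int
stale_calls_until_panic(int deadman_seconds)
{
	struct sim_cpu cpu = { 0, 0 };
	long lbolt = 100;	/* arbitrary non-zero tick value */
	int calls = 0;

	assert(deadman_seconds >= 0);

	/* The clock is still running: this call arms the countdown. */
	(void) sim_deadman(&cpu, lbolt, deadman_seconds);

	/* The clock stops here: lbolt no longer changes. */
	do {
		calls++;
	} while (!sim_deadman(&cpu, lbolt, deadman_seconds));

	return (calls);
}
```

In this sketch `stale_calls_until_panic(1)` returns 2, which matches the two seconds of inactivity described above, and `stale_calls_until_panic(0)` returns 1, which is the behaviour the workaround below relies on.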
Work Around
# echo "deadman_seconds/W 0" | adb -kw
Comments
N/A