Description
On a Nehalem system, I was able to reproduce this panic while running togproc:
WARNING: _CST: cs_type c66d44f8 bad asid type bb
WARNING: _CST: cs_type c66d44f8 bad asid type bb
WARNING: _CST: cs_type c66d44f8 bad asid type bb
WARNING: _CST: cs_type c66d44f8 bad asid type bb
WARNING: _CST: cs_type c66d44f8 bad asid type bb
WARNING: _CST: cs_type c66d44f8 bad asid type bb
WARNING: _CST: cs_type c66d44f8 bad asid type bb
panic[cpu13]/thread=c68a6dc0:
assertion failed: (cp->cpu_flags & CPU_QUIESCED) == 0, file: ../../common/disp/disp.c, line: 1259
c68a6968 genunix:assfail+5a (fe8f7134, fe8f72f0,)
c68a69b8 unix:setbackdq+4b4 (c765fdc0)
c68a69d8 genunix:sleepq_wakeone_chan+67 (fec971f0, c5116a98,)
c68a6a08 genunix:cv_signal+95 (c5116a98, f6a8c04b)
c68a6a48 genunix:taskq_bucket_dispatch+c4 (c39d5f0c, fea40d6c,)
c68a6a98 genunix:taskq_dispatch+f2 (c4f19848, fea40d6c,)
c68a6ac8 genunix:qenable_locked+148 (cac95930, c68a6aec,)
c68a6b08 genunix:putq+3aa (cac95930, c8efc280,)
c68a6b68 genunix:log_sendmsg+2d8 (c5107bc0, 0, 5b, 0)
c68a6ca8 genunix:cprintf+3c9 (fe8ec55c, c68a6cf8,)
c68a6ce8 genunix:cmn_err+4b (2, fe8ec55c, c66d44)
c68a6d48 unix:acpi_cpu_cstate+324 (c64db840, 20018962,)
c68a6d88 unix:cpu_acpi_idle+123 (c7810980, c63fb240,)
c68a6d98 unix:cpu_idle_adaptive+12 (0, 0, c68a6db8, fe8)
c68a6da8 unix:idle+56 (0, 0)
c68a6db8 unix:thread_start+8 ()
[13]>
The dump file shows acpi_cpu_cstate() was called with a
cpu_acpi_cstate_t structure one past the end of the cstate array.
The array element before it is the C3 element.
acpi_cpu_cstate+0x324(c64db840, 20018962, c68a6d88, fe80674e)
cpu_acpi_idle+0x123()
cpu_idle_adaptive+0x12()
idle+0x56(0, 0)
thread_start+8()
> c64db840 ::print -a cpu_acpi_cstate_t
{
c64db840 cs_addrspace_id = 0xfeedfabb
c64db844 cs_address = 0x3ec1
c64db848 cs_type = 0xc66d44f8
c64db84c cs_latency = 0x677d8c15
c64db850 cs_power = 0xbaddcafe
c64db854 promotion = 0xbaddcafe
c64db858 demotion = 0xbaddcafe
c64db85c cs_ksp = 0xbaddcafe
}
> c64db820 ::print cpu_acpi_cstate_t
{
cs_addrspace_id = 0x1
cs_address = 0x415
cs_type = 0x3 <---- ACPI C3 state
cs_latency = 0xf5
cs_power = 0x15e
promotion = 0
demotion = 0
cs_ksp = 0xc62bb000
}
What is going on is:
cpu_acpi_idle() determined the CPU should enter the C3 idle state.
cpu_acpi_idle() assumes the structure for C3 is in
cstate[CPU_ACPI_C3 - 1].
This system, nehalem2, does not have an ACPI C2 state. The cstate array
was initialized as:
cstate[0] = ACPI C1 info // correct
cstate[1] = ACPI C3 info // incorrect {should be C2 info}
cstate[2] = Uninitialized // incorrect {should be C3 info}
This panic and bugid 6807891 have the same cause: Solaris C-state support
expects C2 and C3 to exist.
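The indexing problem above can be illustrated with a minimal sketch that
looks an entry up by its cs_type instead of assuming it sits at
index (type - 1). This is not the change in the webrev. sketch_cstate_t and
sketch_find_cstate() are hypothetical names, and the structure is a
trimmed-down stand-in for cpu_acpi_cstate_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for cpu_acpi_cstate_t. */
typedef struct sketch_cstate {
	uint32_t cs_type;	/* ACPI C-state type: 1 = C1, 2 = C2, 3 = C3 */
	uint32_t cs_latency;	/* latency reported for this state */
} sketch_cstate_t;

/*
 * Return the entry whose cs_type matches the requested type, or NULL if
 * the platform did not report that C-state. The array is searched, never
 * indexed by type, so a missing C2 cannot shift C3 out of reach.
 */
static const sketch_cstate_t *
sketch_find_cstate(const sketch_cstate_t *cstates, size_t count,
    uint32_t type)
{
	assert(cstates != NULL || count == 0);

	for (size_t i = 0; i < count; i++) {
		if (cstates[i].cs_type == type)
			return (&cstates[i]);
	}
	return (NULL);
}
```

With a two-entry table holding only C1 and C3, as on nehalem2, a lookup
for type 3 returns the second entry and a lookup for type 2 returns NULL,
rather than reading one past the end of the array.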
------------------------------------------------------------------------
Idle threads cannot call cmn_err() because they cannot block.
The cmn_err() that caused this panic was removed during development
because it no longer served its purpose due to code re-arrangement
and because idle threads cannot block. I am not sure how this
cmn_err() got back in here. :-(
We did not think acpi_cpu_cstate() could ever be called with this bogus
cstate[2] entry because the latency for the ACPI C2 state, "cs_C2_latency",
was initialized to a very large value, CPU_CSTATE_LATENCY_UNDEF, that
would cause the c-state selection algorithm to never get past it.
The c-state selection algorithm keeps track of data about the CPU's
idle duration etc. This data is not cleared when the CPU goes offline
and then online, as was the case on this system, which was running a
CPU online/offline test. (This is normally not a problem.) The c-state
selection code counted the offline time as idle time in its
statistics calculations. That is why the CPU was thought to be able
to go idle long enough to select C2 with latency CPU_CSTATE_LATENCY_UNDEF.
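The failure mode above can be sketched as follows: a very large latency
stops protecting a missing state once the idle statistics themselves
become very large. This is a simplified model, not the actual selection
algorithm or the fix in the webrev. All sketch_* names are hypothetical,
and the value of SKETCH_LATENCY_UNDEF is an assumption standing in for
CPU_CSTATE_LATENCY_UNDEF. The sketch skips undefined states explicitly
and offers a reset that could be called when a CPU comes back online.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	SKETCH_LATENCY_UNDEF	UINT32_MAX	/* assumed sentinel value */

/* Hypothetical idle statistics kept per CPU. */
typedef struct sketch_idle_stats {
	uint64_t total_idle;	/* accumulated idle time */
	uint64_t samples;	/* number of idle periods recorded */
} sketch_idle_stats_t;

/* Clear the statistics, e.g. when the CPU is brought back online. */
static void
sketch_stats_reset(sketch_idle_stats_t *st)
{
	assert(st != NULL);
	st->total_idle = 0;
	st->samples = 0;
}

/* Record one idle period. */
static void
sketch_stats_record(sketch_idle_stats_t *st, uint64_t idle)
{
	assert(st != NULL);
	st->total_idle += idle;
	st->samples++;
}

/*
 * Return the index of the deepest state whose latency is below the
 * average idle time. Index 0 is always eligible. States marked
 * SKETCH_LATENCY_UNDEF are skipped outright instead of relying on the
 * sentinel being larger than any possible average.
 */
static size_t
sketch_select(const sketch_idle_stats_t *st, const uint32_t *latencies,
    size_t count)
{
	assert(st != NULL && latencies != NULL && count > 0);

	uint64_t avg = st->samples ? st->total_idle / st->samples : 0;
	size_t pick = 0;

	for (size_t i = 1; i < count; i++) {
		if (latencies[i] == SKETCH_LATENCY_UNDEF)
			continue;	/* state does not exist */
		if (latencies[i] < avg)
			pick = i;
	}
	return (pick);
}
```

If a long offline period is recorded as idle time, the average can exceed
the sentinel, so a comparison against latency alone would select the
undefined state. Here the explicit check keeps it out of reach, and
calling sketch_stats_reset() at online time keeps the offline period out
of the average in the first place.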
Webrev is here:
http://cr.opensolaris.org/~bholler/6807891wr/index.html
hg pdiffs are attached in file "my_pdiffs".