OpenSolaris

Bug ID 6577473
Synopsis Nocona box panic when booting snv_68: Can't handle mwait size 0
State 10-Fix Delivered (Fix available in build)
Category:Subcategory kernel:amd64
Keywords
Responsible Engineer Bill Holler
Reported Against
Duplicate Of
Introduced In solaris_nevada
Commit to Fix snv_70
Fixed In snv_70
Release Fixed solaris_nevada(snv_70) , solaris_10u5(s10u5_03) (Bug ID:2153589)
Related Bugs 6579025 , 6588054
Submit Date 5-July-2007
Last Update Date 8-August-2007
Description
After a live upgrade to snv_68 (from snv_67), our Intel Nocona MP
lab machine panicked at boot time thusly:

    SunOS Release 5.11 Version snv_68 64-bit
    Copyright 1983-2007  xxxxx , Inc.  All rights reserved.
    Use is subject to license terms.

    panic[cpu0]/thread=fffffffffbc25a60: Can't handle mwait size 0

    fffffffffbc46a40 unix:mach_alloc_mwait+95 ()
    fffffffffbc46a60 unix:mach_init+101 ()
    fffffffffbc46aa0 unix:psm_install+ad ()
    fffffffffbc46ab0 unix:startup_end+97 ()
    fffffffffbc46ac0 unix:startup+45 ()
    fffffffffbc46af0 genunix:main+27 ()
    fffffffffbc46b00 unix:_locore_start+92 ()

    skipping system dump - no dump device configured
    rebooting...

snv_67 worked fine on the box. 

Booting directly from snv_68 DVD yielded the exact same panic.
Can I have access to the lab machine?

What is the "psrinfo -pv" output?
cpuid_pass2() gets the MONITOR/MWAIT size from cpuid function 5.  The size
is usually 64 bytes.  Apparently cpuid returned 0 for monitor/mwait size on
this machine.  This is the first machine this has happened on.
Panicking Nocona system hagen.sfbay:

# psrinfo -pv
The physical processor has 2 virtual processors (0 2)
  x86 (GenuineIntel F41 family 15 model 4 step 1 clock 3392 MHz)
        Intel(r) Xeon(tm) CPU 3.40GHz
The physical processor has 2 virtual processors (1 3)
  x86 (GenuineIntel F41 family 15 model 4 step 1 clock 3392 MHz)
        Intel(r) Xeon(tm) CPU 3.40GHz


Here is the cpuid info stored in the cpuid_info structure hanging off each
cpu's struct machcpu.

The struct cpuid_info contents show that cpuid function 1 returned the
MONITOR/MWAIT support bit (bit 3, 0x8) set in ecx, but the monitor linesizes
that cpuid function 5 returns in eax and ebx are 0.

    cpi_std = [
        {
            cp_eax = 0x3
            cp_ebx = 0x756e6547
            cp_ecx = 0x6c65746e
            cp_edx = 0x49656e69
        }
        {
            cp_eax = 0xf41            
            cp_ebx = 0x20800          
            cp_ecx = 0x649d       <---  bit 0x8 MONITOR/MWAIT is supported  
            cp_edx = 0xbfebfbff       
        }                             
        {                             
            cp_eax = 0x605b5001       
            cp_ebx = 0                
            cp_ecx = 0                
            cp_edx = 0x7c7040         
        }                             
        {                             
            cp_eax = 0                
            cp_ebx = 0                
            cp_ecx = 0                
            cp_edx = 0                
        }                             
        {                             
            cp_eax = 0                
            cp_ebx = 0                
            cp_ecx = 0                
            cp_edx = 0                
        }                             
        {                             
            cp_eax = 0        <--- Smallest monitor-linesize is 0       
            cp_ebx = 0        <--- Largest monitor-linesize is 0 
            cp_ecx = 0                
            cp_edx = 0                
        }                    
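
The leaf 1 values in the dump above can be cross-checked by hand. The
following is a minimal sketch, not code from cpuid.c: the names
cpu_signature, decode_signature and has_monitor_mwait are hypothetical, and
the bit layout is the standard one for cpuid function 1 (eax signature,
ecx bit 3 for MONITOR/MWAIT).

```c
#include <stdint.h>

/* Hypothetical holder for the decoded leaf 1 eax signature. */
struct cpu_signature {
    unsigned family;
    unsigned model;
    unsigned stepping;
};

/* Decode family/model/stepping from cpuid function 1 eax. */
static struct cpu_signature decode_signature(uint32_t eax)
{
    struct cpu_signature s;
    unsigned base_family = (eax >> 8) & 0xf;
    unsigned base_model = (eax >> 4) & 0xf;

    s.stepping = eax & 0xf;

    /* The extended family field is added only when the base family is 0xf. */
    s.family = base_family;
    if (base_family == 0xf)
        s.family += (eax >> 20) & 0xff;

    /* The extended model field applies to base families 0x6 and 0xf. */
    s.model = base_model;
    if (base_family == 0x6 || base_family == 0xf)
        s.model |= ((eax >> 16) & 0xf) << 4;

    return s;
}

/* Test cpuid function 1 ecx bit 3 (0x8), the MONITOR/MWAIT feature flag. */
static int has_monitor_mwait(uint32_t leaf1_ecx)
{
    return (leaf1_ecx & (1u << 3)) != 0;
}
```

With cp_eax = 0xf41 this gives family 15, model 4, step 1, which matches the
psrinfo output, and cp_ecx = 0x649d has the MONITOR/MWAIT bit set.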


cpuid.c does not call the higher numbered cpuid functions if they are not
supported.  The above information looks like cpuid functions 3 and above 
were not called.


This processor does not support cpuid functions above 3.
> cpuid_info0::print
{
    cpi_pass = 0x4
    cpi_maxeax = 0x3

A userland program that calls cpuid reports 3 as the highest supported
cpuid function, which confirms that "cpi_maxeax = 3" above is what the
cpu returned.
cpuid function 1 sets MONITOR/MWAIT support bit 3 (0x8) in ecx.
The "Intel 64 and IA-32 Architectures Software Developer's Manual" states
cpuid function 5 returns the monitor linesize.

cpuid function 5 was never called because cpuid function 0 returned 3 as
the highest supported cpuid leaf.  The monitor linesize is uninitialized 0.
The code cannot assume on this cpu that function 5 is available even
though function 1 ecx:0x08 indicates monitor/mwait is available.
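
The condition described above could be expressed roughly as follows. This is
only a sketch of the check, not the delivered fix: struct cpuid_summary and
mwait_line_size are hypothetical names, and using the largest monitor
linesize (leaf 5 ebx bits 15:0) as the size is an assumption.

```c
#include <stdint.h>

/* Hypothetical summary of the cpuid results the check needs. */
struct cpuid_summary {
    uint32_t max_std_leaf; /* eax from cpuid function 0 */
    uint32_t leaf1_ecx;    /* ecx from cpuid function 1 */
    uint32_t leaf5_eax;    /* eax from function 5; meaningful only if max_std_leaf >= 5 */
    uint32_t leaf5_ebx;    /* ebx from function 5; meaningful only if max_std_leaf >= 5 */
};

#define LEAF1_ECX_MONITOR (1u << 3) /* MONITOR/MWAIT feature flag */

/*
 * Return the monitor linesize in bytes, or 0 if mwait should not be used.
 * A return of 0 means the caller falls back to the non-mwait idle loop
 * instead of panicking.
 */
static uint32_t mwait_line_size(const struct cpuid_summary *c)
{
    /* The feature flag alone is not sufficient on this cpu. */
    if ((c->leaf1_ecx & LEAF1_ECX_MONITOR) == 0)
        return 0;

    /* Function 5 was never called if function 0 reported a lower maximum. */
    if (c->max_std_leaf < 5)
        return 0;

    /* Largest monitor linesize; a value of 0 also disables mwait. */
    return c->leaf5_ebx & 0xffff;
}
```

For the values seen on this machine (maximum leaf 3, leaf 1 ecx 0x649d) the
sketch returns 0, so mwait would simply not be used.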

More investigation is needed to see if some other mechanism exists to get
the monitor/mwait linesize on this processor, or if it is not supported.
Here is how various versions of the AP-485 "Intel Processor Identification
and the CPUID Instruction" define leaf 1 ecx support/values.
* Note processors which support leaf 1 ecx may support a subset of ecx bits:

# 241618-021  May 2002:
      ECX "reserved for future feature flags"
      CPUID leaf 5 is not supported.

# 241618-023  March 2003:
      Documents through family 0xf model 0x2.
      ECX has feature flags.  Bits 6:0, and 31:15 are reserved.
              ECX Bit 3 is not defined.  ***
      CPUID leaf 5 is not supported.

# 241618-?  ?:
      Documents through family 0xf model 0x2.
      ECX has feature flags.  Bits 2:1, 6:5, 13:11, and 31:15 are reserved.
              ECX Bit 3 indicates MONITOR/MWAIT support.
      CPUID leaf 5 is not supported.

# 241618-026  June 2004:
      Documents through family 0xf model 0x3.
      ECX has feature flags.  Bits 2:1, 6:5, 13:11, and 31:15 are reserved.
              ECX Bit 3 indicates MONITOR/MWAIT support.
      CPUID leaf 5 is supported "MONITOR/MWAIT Function".

Contacted Intel to determine when leaf 1 ecx bits are supported and which
subset of bits.
The suggested fix boots without issues on the Nocona machine
(family 15 model 4 step 1) which panicked on snv_68 without the fix.

Note: not all Nocona machines hit this panic.
Nocona cpus which support cpuid leaf 5 do not panic.
This has been seen on a Dell 470 and a Dell 670.  Both systems limited
maxeax for cpuid to 3.
Work Around
mwait idle loop can be disabled by setting idle_cpu_prefer_mwait = 0 with
kmdb.
Comments
N/A