OpenSolaris

Bug ID 6696145
Synopsis [OpenSolaris bug #1069] Panics due to memory corruption on Intel GM965 systems with heavy zfs i/o
State 10-Fix Delivered (Fix available in build)
Category:Subcategory driver:agpgart
Keywords rtiq_regression
Responsible Engineer Edward Shu
Reported Against snv_86 , osol_2008.05
Duplicate Of
Introduced In solaris_nevada
Commit to Fix snv_92
Fixed In snv_92
Release Fixed solaris_nevada(snv_92) , solaris_10u6(s10u6_03) (Bug ID:2162862)
Related Bugs
Submit Date 30-April-2008
Last Update Date 13-July-2009
Description
Some (but not all) laptops with Intel GM965 graphics are panicking due to
memory corruption in the kernel when doing ZFS I/O. This was first seen
in the OpenSolaris 2008.05 installer, but has also been seen on plain nv_86
when using cpio to copy files to a ZFS file system.

Further details are recorded in:
  http://defect.opensolaris.org/bz/show_bug.cgi?id=1069

Please continue to update that bug with further comments so that the engineers
at both Sun & Intel working on this issue can see them - this bug is just to
get into the Nevada bug tracking lists as a reference to the OpenSolaris bug
report.
From Niveditha Rau:
I installed Developer Preview 2 (which is based on nv_79a) on the Dell XPS M1330 and the install went through successfully without any panics. I have attached the Xorg log file from the 79a install (Xorg.79a) and the Xorg log file from the RC3 install (Xorg.rc3).

A couple of things that looked curious:

 - in RC3, the drm interface version is reported as 1.2, whereas in 79a it was 1.4. Did we roll back?
 - in RC3, we have:
     (II) Bus 11 non-prefetchable memory range:
           [0] -1    0    0xf0000000 - 0xf00fffff (0x100000) MX[B]
 - and a whole bunch of EDID stuff happens in RC3

Niveditha
Our China teams failed to reproduce this bug on the following laptops:
	-Toshiba M9
	-SONY VGN-SZ77N
	-Lenovo ThinkPad T61
	-Lenovo ThinkPad X61
	-Dell Vostro 1400

I suspect the bug is quite hardware dependent.
The size of system memory, BIOS configuration, etc. may affect its behavior.
After further investigation by the teams, the following was established:
1) The total physical memory of the laptop affects whether this bug occurs.
  + Laptops with less than 4G of system memory are not hit by this panic.
  + Some, but not all, laptops with 4G of system memory are hit by this bug.
    The Dell XPS M1330 with 4G is known to be one of these.
2) Setting physmem randomly affects it.
The root cause is that the agpgart driver doesn't support physical pages above 4G. It should
panic in agp_check_pfns. However, a bug in pfn2gartentry prevents this from happening.
Thanks to David.Marx for the reminder.

case ARC_IGD830:
        if ((paddr & ~GTT_POINTER_MASK) != 0) {
                /* <----- always false and fall through */
                AGPDB_PRINT2((CE_WARN,
                    "Intel IGD only support 32 bits"));
                return (-1);
        }
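
A minimal sketch of why a check of this shape cannot fire, assuming paddr is
declared as a 32-bit value and GTT_POINTER_MASK is 0xffffffff. Both are
assumptions, since the excerpt does not show the declarations. The EX_ macros
and function names are illustrative only and are not part of agpgart.

```c
#include <stdint.h>

#define EX_PAGE_SHIFT       12           /* 4 KB pages */
#define EX_GTT_POINTER_MASK 0xffffffffU  /* assumed value of GTT_POINTER_MASK */

/*
 * Mimics the quoted test, ASSUMING paddr is 32 bits wide.
 * The bits above bit 31 are lost in the conversion, and
 * ~EX_GTT_POINTER_MASK is 0 in 32-bit arithmetic, so the
 * result is 0 for every pfn.
 */
int check_fires_32(uint64_t pfn)
{
    uint32_t paddr = (uint32_t)(pfn << EX_PAGE_SHIFT);
    return (paddr & ~EX_GTT_POINTER_MASK) != 0;
}

/* The same test with the address and the mask both kept in 64 bits. */
int check_fires_64(uint64_t pfn)
{
    uint64_t paddr = pfn << EX_PAGE_SHIFT;
    return (paddr & ~(uint64_t)EX_GTT_POINTER_MASK) != 0;
}
```

With a pfn such as 0x11e400 (a page above 4G), check_fires_32 returns 0 while
check_fires_64 returns 1, which matches the "always false" note in the excerpt.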
The solutions are:
1) Add a parameter, like a DDI DMA attribute, to devmap_pmem_alloc.
   The parameter will limit the physical pages returned by devmap_pmem_alloc to a certain
   physical address range. For example, all pages are below 4G.

2) Extend the GTT table to support physical pages above 4G. The latest Intel hardware may
   support 64G. However, we also need step 1 to keep the physical pages below 64G.

So we must implement step 1; step 2 may be implemented later.
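
The kind of range limit described in step 1 could be sketched as below. This
is not the devmap_pmem_alloc interface or any existing DDI structure: the
struct, the function and the two constants are hypothetical names used only to
show the address arithmetic for a 4G and a 64G limit.

```c
#include <stdint.h>

#define EX_PAGE_SHIFT 12  /* 4 KB pages */

/* Hypothetical address-range attribute (not a real DDI type). */
struct ex_pmem_range {
    uint64_t addr_lo;  /* lowest acceptable physical address */
    uint64_t addr_hi;  /* highest acceptable physical address, inclusive */
};

/* Hypothetical limits: everything below 4G, everything below 64G. */
const struct ex_pmem_range ex_below_4g  = { 0, 0xffffffffULL };
const struct ex_pmem_range ex_below_64g = { 0, 0xfffffffffULL };

/* Returns 1 if the whole page identified by pfn lies inside the range. */
int ex_page_in_range(uint64_t pfn, const struct ex_pmem_range *r)
{
    uint64_t first = pfn << EX_PAGE_SHIFT;
    uint64_t last = first + ((1ULL << EX_PAGE_SHIFT) - 1);
    return first >= r->addr_lo && last <= r->addr_hi;
}
```

Under these assumptions a page such as pfn 0x11e400 is rejected by the 4G
limit but accepted by the 64G limit, so step 2 would still need a limit of
the step 1 kind, just with a larger upper bound.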
I suspect that devmap_pmem_alloc is returning memory that is
above 4GB. This causes problems since the agpgart GTT needs memory that
is below 4GB. I noticed that on the Toshiba laptop with the 965GM and 4GB RAM,
the physical memory was between 0x0-0xbfffffff and 0x100000000-0x13fffffff.
I am guessing that physical memory that would have been at 0xc0000000-0xffffffff
was mapped to 0x100000000-0x13fffffff.
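
The sizes of these ranges are consistent with that guess. A small sketch of
the arithmetic follows; the function names are illustrative and the addresses
are the ones quoted above.

```c
#include <stdint.h>

/* Size in bytes of an inclusive physical address range [start, end]. */
uint64_t ex_range_bytes(uint64_t start, uint64_t end)
{
    return end - start + 1;
}

/* Total RAM described by the two ranges seen on the Toshiba laptop. */
uint64_t ex_toshiba_total_bytes(void)
{
    return ex_range_bytes(0x0ULL, 0xbfffffffULL)            /* 3 GB */
         + ex_range_bytes(0x100000000ULL, 0x13fffffffULL);  /* 1 GB */
}
```

The range 0xc0000000-0xffffffff is 0x40000000 bytes (1 GB), the same size as
0x100000000-0x13fffffff, and the two populated ranges add up to 0x100000000
bytes (4 GB).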

I ran the following mdb script on the four crash dumps that I have seen.
All crash dumps appeared to have pages above 4GB, based on kte_pfnarray
having many entries that are above 0x100000 (which, when shifted left 12
to be put into the agpgart GTT tables, will overflow a 32-bit value).
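
As a worked example of that overflow, the sketch below shifts a PFN left by 12
and shows what survives in a 32-bit field. The sample values are the first
entry of each keytable_ent array in the mdb output from this dump; the helper
names are illustrative and not taken from agpgart.

```c
#include <stddef.h>
#include <stdint.h>

#define EX_PAGE_SHIFT 12  /* 4 KB pages */

/* First PFN of keytable_ent[0..3] as printed in the mdb output. */
const uint64_t ex_sample_pfns[4] = { 0x11e400, 0x11d281, 0x11ccc1, 0x11c701 };

/* Physical address of a page, kept in 64 bits. */
uint64_t ex_pfn_to_paddr(uint64_t pfn)
{
    return pfn << EX_PAGE_SHIFT;
}

/* What is left of that address once it is stored in 32 bits. */
uint32_t ex_pfn_to_paddr32(uint64_t pfn)
{
    return (uint32_t)(pfn << EX_PAGE_SHIFT);
}

/* Count PFNs at or above 0x100000, i.e. pages at or above 4 GB. */
size_t ex_count_high_pfns(const uint64_t *pfns, size_t n)
{
    size_t i, count = 0;

    for (i = 0; i < n; i++) {
        if (pfns[i] >= 0x100000)
            count++;
    }
    return count;
}
```

For pfn 0x11e400 the full address is 0x11e400000, but the 32-bit result is
0x1e400000, which is the address of a different page below 4 GB.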

The mdb session is from /net/boora.central/brmnas/vw130254/indiana_info/vmcore.1
Similar results will be seen from vmcore.0 and /home/nivedita/intel-1069/vmcore.1
and vmcore.7. I was also able to peek at an installation and see similar
results, and put code in agpgart.c to show this situation.
Also, I put 4GB in an Intel 965 system (not 965GM), and saw
similar entries.

*agpgart_glob_soft_handle/"*agpgart_glob_soft_handle"
*agpgart_glob_soft_handle::print struct i_ddi_soft_state
*agpgart_glob_soft_handle::print struct i_ddi_soft_state array[0] | >a
<a/"agpgart_softstate[0]"
<a::print agpgart_softstate_t
<a::print agpgart_softstate_t asoft_table | >t
<t/"asoft_table[0]"
<t::print keytable_ent_t kte_pfnarray | >p
<t::print keytable_ent_t
.>t
<p/"keytable_ent[0]"
<p,0x20/J
<t/"asoft_table[1]"
<t::print keytable_ent_t kte_pfnarray | >p
<t::print keytable_ent_t
.>t
<p/"keytable_ent[1]"
<p,0x20/J
<t/"asoft_table[2]"
<t::print keytable_ent_t kte_pfnarray | >p
<t::print keytable_ent_t
.>t
<p/"keytable_ent[2]"
<p,0x20/J
<t/"asoft_table[3]"
<t::print keytable_ent_t kte_pfnarray | >p
<t::print keytable_ent_t
.>t
<p/"keytable_ent[3]"
<p,0x20/J
<t/"asoft_table[4]"
<t::print keytable_ent_t kte_pfnarray | >p
<t::print keytable_ent_t
.>t
<p/"keytable_ent[4]"
<t/"asoft_table[5]"
<t::print keytable_ent_t kte_pfnarray | >p
<t::print keytable_ent_t
.>t
<p/"keytable_ent[5]"
<t/"asoft_table[6]"
<t::print keytable_ent_t kte_pfnarray | >p
<t::print keytable_ent_t
.>t
<p/"keytable_ent[6]"
<t/"asoft_table[7]"
<t::print keytable_ent_t kte_pfnarray | >p
<t::print keytable_ent_t
.>t
<p/"keytable_ent[7]"


mdb: logging to "mdboutput"
> $<./mdbinput
0xffffff01d61ca3c0:             *agpgart_glob_soft_handle
{
    array = 0xffffff01e0cee880
    lock = {
        _opaque = [ 0 ]
    }
    size = 0xd0
    n_items = 0x8
    next = 0
}
0xffffff01e77fd510:             agpgart_softstate[0]
{
    asoft_dip = 0xffffff01d6b836f0
    asoft_instmutex = {
        _opaque = [ 0 ]
    }
    asoft_info = {
        agpki_mdevid = 0x2a028086
        agpki_mver = {
            agpv_major = 0
            agpv_minor = 0
        }
        agpki_mstatus = 0
        agpki_presize = 0x1dfc
        agpki_tdevid = 0
        agpki_tver = {
            agpv_major = 0
            agpv_minor = 0
        }
        agpki_tstatus = 0
        agpki_aperbase = 0xd0000000
        agpki_apersize = 0x200
    }
    asoft_opened = 0x4
    asoft_acquired = 0x1
    asoft_agpen = 0
    asoft_curpid = 0x288
    asoft_mode = 0
    asoft_pgtotal = 0x20000
    asoft_pgused = 0x3f01
    asoft_li = 0xffffff01dd8e4500
    asoft_table = 0xffffff01f3686000
    gart_dma_handle = 0
    gart_dma_acc_handle = 0
    gart_pbase = 0
    gart_vbase = 0
    gart_size = 0
    asoft_devreg = {
        agprd_cpugarts = {
            gart_device_num = 0
            gart_dev_list_head = 0
        }
        agprd_targethdl = 0xffffff01f3bd3c90
        agprd_masterhdl = 0xffffff01f3bd3d80
        agprd_arctype = 1 (ARC_IGD830)
    }
    asoft_ksp = 0xffffff01d8a4b000
}
0xffffff01f3686000:             asoft_table[0]
{
    kte_type = 0
    kte_key = 0
    kte_pgoff = 0x77f
    kte_pages = 0x1281
    kte_bound = 0x1
    kte_memhdl = 0xffffff01dd8dcf40
    kte_pfnarray = 0xffffff01f6297000
    kte_refcnt = 0
}
0xffffff01f6297000:             keytable_ent[0]
0xffffff01f6297000:             11e400          11e401          11e402
                11e403          11e404          11e405          11e406
                11e407          11e408          11e409          11e40a
                11e40b          11e40c          11e40d          11e40e
                11e40f          11e410          11e411          11e412
                11e413          11e414          11e415          11e416
                11e417          11e418          11e419          11e41a
                11e41b          11e41c          11e41d          11e41e
                11e41f          
0xffffff01f3686038:             asoft_table[1]
{
    kte_type = 0
    kte_key = 0x1
    kte_pgoff = 0x1a00
    kte_pages = 0x640
    kte_bound = 0x1
    kte_memhdl = 0xffffff01dd8dcc20
    kte_pfnarray = 0xffffff01f62a7000
    kte_refcnt = 0
}
0xffffff01f62a7000:             keytable_ent[1]
0xffffff01f62a7000:             11d281          11d282          11d283
                11d284          11d285          11d286          11d287
                11d288          11d289          11d28a          11d28b
                11d28c          11d28d          11d28e          11d28f
                11d290          11d291          11d292          11d293
                11d294          11d295          11d296          11d297
                11d298          11d299          11d29a          11d29b
                11d29c          11d29d          11d29e          11d29f
                11d2a0          
0xffffff01f3686070:             asoft_table[2]
{
    kte_type = 0
    kte_key = 0x2
    kte_pgoff = 0x2040
    kte_pages = 0x640
    kte_bound = 0x1
    kte_memhdl = 0xffffff01dd8dcc00
    kte_pfnarray = 0xffffff01f62b1000
    kte_refcnt = 0
}
0xffffff01f62b1000:             keytable_ent[2]
0xffffff01f62b1000:             11ccc1          11ccc2          11ccc3
                11ccc4          11ccc5          11ccc6          11ccc7
                11ccc8          11ccc9          11ccca          11cccb
                11cccc          11cccd          11ccce          11cccf
                11ccd0          11ccd1          11ccd2          11ccd3
                11ccd4          11ccd5          11ccd6          11ccd7
                11ccd8          11ccd9          11ccda          11ccdb
                11ccdc          11ccdd          11ccde          11ccdf
                11cce0          
0xffffff01f36860a8:             asoft_table[3]
{
    kte_type = 0
    kte_key = 0x3
    kte_pgoff = 0x2680
    kte_pages = 0x2000
    kte_bound = 0x1
    kte_memhdl = 0xffffff01dd8dcfe0
    kte_pfnarray = 0xffffff01f62c7000
    kte_refcnt = 0
}
0xffffff01f62c7000:             keytable_ent[3]
0xffffff01f62c7000:             11c701          11c702          11c703
                11c704          11c705          11c706          11c707
                11c708          11c709          11c70a          11c70b
                11c70c          11c70d          11c70e          11c70f
                11c710          11c711          11c712          11c713
                11c714          11c715          11c716          11c717
                11c718          11c719          11c71a          11c71b
                11c71c          11c71d          11c71e          11c71f
                11c720          
0xffffff01f36860e0:             asoft_table[4]
{
    kte_type = 0
    kte_key = 0
    kte_pgoff = 0
    kte_pages = 0
    kte_bound = 0
    kte_memhdl = 0
    kte_pfnarray = 0
    kte_refcnt = 0
}
0:              keytable_ent[4]
0xffffff01f3686118:             asoft_table[5]
{
    kte_type = 0
    kte_key = 0
    kte_pgoff = 0
    kte_pages = 0
    kte_bound = 0
    kte_memhdl = 0
    kte_pfnarray = 0
    kte_refcnt = 0
}
0:              keytable_ent[5]
0xffffff01f3686150:             asoft_table[6]
{
    kte_type = 0
    kte_key = 0
    kte_pgoff = 0
    kte_pages = 0
    kte_bound = 0
    kte_memhdl = 0
    kte_pfnarray = 0
    kte_refcnt = 0
}
0:              keytable_ent[6]
0xffffff01f3686188:             asoft_table[7]
{
    kte_type = 0
    kte_key = 0
    kte_pgoff = 0
    kte_pages = 0
    kte_bound = 0
    kte_memhdl = 0
    kte_pfnarray = 0
    kte_refcnt = 0
}
0:              keytable_ent[7]
Work Around
N/A
Comments
N/A