Description
Some (but not all) laptops with Intel GM965 graphics are panicking due to
memory corruption in the kernel when doing ZFS I/O. This was first seen
in the OpenSolaris 2008.05 installer, but has also been seen on plain nv_86
when using cpio to copy files to a ZFS file system.
Further details are recorded in:
http://defect.opensolaris.org/bz/show_bug.cgi?id=1069
Please continue to update that bug with further comments so that the engineers
at both Sun and Intel working on this issue can see them. This bug exists only to
get the issue onto the Nevada bug tracking lists as a reference to the OpenSolaris
bug report.
From Niveditha Rau:
I installed Developer Preview 2 (which is based on nv_79a) on the Dell XPS M1330 and the install went through successfully without any panics. I have attached the Xorg log file from the 79a install (Xorg.79a) and the Xorg log file from the RC3 install (Xorg.rc3).
A couple of things that looked curious:
- in RC3, the drm interface version is reported as 1.2, whereas in 79a it was 1.4. Did we roll back?
- in RC3, we have:
(II) Bus 11 non-prefetchable memory range:
[0] -1 0 0xf0000000 - 0xf00fffff (0x100000) MX[B]
- and a whole bunch of EDID stuff happens in RC3
Niveditha
Our China teams failed to reproduce this bug on the following laptops:
-Toshiba M9
-SONY VGN-SZ77N
-Lenovo ThinkPad T61
-Lenovo ThinkPad X61
-Dell Vostro 1400
I suspect the bug is quite hardware dependent.
The size of system memory, BIOS configuration, etc. may affect whether it occurs.
After further investigation by the teams, the following was established.
1) The total physical memory of the laptop affects whether this bug occurs.
+ Laptops with less than 4G of system memory are not hit by this panic.
+ Some laptops with 4G of system memory are hit by this bug, but not all.
Dell XPS M1330 with 4G is known to be one of these.
2) Setting physmem changes the behavior, but not in a predictable way.
The root cause is that the agpgart driver doesn't support physical pages above 4G. It should
panic in agp_check_pfns. However, a bug in pfn2gartentry prevents this from happening.
Thanks to David.Marx for pointing this out.
    case ARC_IGD830:
        if ((paddr & ~GTT_POINTER_MASK) != 0) {
            /* <----- always false, so this falls through */
            AGPDB_PRINT2((CE_WARN,
                "Intel IGD only support 32 bits"));
            return (-1);
        }
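One way a check of this shape can end up always false in C is sketched below. This is only an illustration under an assumption: the report does not show how GTT_POINTER_MASK is defined, so MASK_NARROW and MASK_WIDE are hypothetical stand-ins. If the mask is a 32-bit unsigned constant, its complement is computed in 32 bits and is 0 before it is widened to match a 64-bit paddr, so the test can never fire.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical masks; the real GTT_POINTER_MASK is not shown in this report. */
#define MASK_NARROW 0xffffffffU   /* unsigned int: ~MASK_NARROW is 0 */
#define MASK_WIDE   0xffffffffULL /* 64-bit: ~MASK_WIDE is 0xffffffff00000000 */

/* Complement taken in 32 bits, then widened: result is always 0. */
int above_4g_narrow(uint64_t paddr)
{
	return ((paddr & ~MASK_NARROW) != 0);
}

/* Complement taken in 64 bits: nonzero when paddr has bits above bit 31. */
int above_4g_wide(uint64_t paddr)
{
	return ((paddr & ~MASK_WIDE) != 0);
}

/* Small self-check using a pfn that appears in the crash dump output. */
void mask_self_check(void)
{
	uint64_t paddr = (uint64_t)0x11e400 << 12; /* 0x11e400000, above 4G */

	assert(above_4g_narrow(paddr) == 0); /* misses the problem */
	assert(above_4g_wide(paddr) == 1);   /* catches it */
}
```

Whether this is the actual defect in pfn2gartentry is not confirmed here; the sketch only shows that the symptom described ("always false") is consistent with a mask that is narrower than paddr.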
The solutions are:
1) Add a parameter, like a DDI DMA attribute, to devmap_pmem_alloc.
The parameter will limit the physical pages returned by devmap_pmem_alloc to a certain
physical address range. For example, all pages below 4G.
2) Extend the GTT table to support physical pages above 4G. The latest Intel hardware may
support 64G. However, we would still need step 1 to keep the physical pages below 64G.
So we must implement step 1; step 2 may be implemented later.
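The address-range limit described in step 1 can be illustrated with a small check over a pfn array. This is a minimal sketch, not the devmap_pmem_alloc or agp_check_pfns implementation: the function name first_pfn_at_or_above, its signature, and the 4 KB page size are assumptions made for illustration, and the limit is assumed to be page-aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* assumes 4 KB pages */

/*
 * Hypothetical helper: returns the index of the first pfn whose page
 * starts at or above `limit` bytes, or -1 if every page is below it.
 * Comparing pfns (rather than shifted addresses) avoids overflow.
 */
long first_pfn_at_or_above(const uint64_t *pfns, size_t n, uint64_t limit)
{
	uint64_t limit_pfn = limit >> SKETCH_PAGE_SHIFT;

	for (size_t i = 0; i < n; i++) {
		if (pfns[i] >= limit_pfn)
			return ((long)i);
	}
	return (-1);
}

/* Self-check with a 4G limit (step 1) and a 64G limit (step 2). */
void limit_self_check(void)
{
	uint64_t pfns[] = { 0xbffff, 0x11e400 };

	assert(first_pfn_at_or_above(pfns, 2, 1ULL << 32) == 1);
	assert(first_pfn_at_or_above(pfns, 2, 1ULL << 36) == -1);
}
```

With a 4G limit the second pfn is rejected; with a 64G limit both pass, which matches the point that step 2 still needs a bound of the kind step 1 introduces.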
I suspect that devmap_pmem_alloc is returning memory that is
above 4 GB. This causes problems since the agpgart GTT needs memory that
is below 4 GB. I noticed that on the Toshiba laptop with the 965GM and 4 GB of RAM,
the physical memory was between 0x0-0xbfffffff and 0x100000000-0x13fffffff.
I am guessing that physical memory that would have been at 0xc0000000-0xffffffff
was mapped to 0x100000000-0x13fffffff.
I ran the following mdb script on the four crash dumps that I have seen.
All crash dumps appear to have pages above 4 GB, based on kte_pfnarray
having many entries that are above 0x100000 (which, when shifted left by 12
to be put into the agpgart GTT tables, will overflow a 32-bit value).
The mdb session is from /net/boora.central/brmnas/vw130254/indiana_info/vmcore.1
Similar results will be seen from vmcore.0 and /home/nivedita/intel-1069/vmcore.1
and vmcore.7. I was also able to peek at an installation in progress and see similar
results, and put code in agpgart.c to show this situation.
Also, I put 4 GB in an Intel 965 system (not 965GM), and see
similar entries.
*agpgart_glob_soft_handle/"*agpgart_glob_soft_handle"
*agpgart_glob_soft_handle::print struct i_ddi_soft_state
*agpgart_glob_soft_handle::print struct i_ddi_soft_state array[0] | >a
<a/"agpgart_softstate[0]"
<a::print agpgart_softstate_t
<a::print agpgart_softstate_t asoft_table | >t
<t/"asoft_table[0]"
<t::print keytable_ent_t kte_pfnarray | >p
<t::print keytable_ent_t
.>t
<p/"keytable_ent[0]"
<p,0x20/J
<t/"asoft_table[1]"
<t::print keytable_ent_t kte_pfnarray | >p
<t::print keytable_ent_t
.>t
<p/"keytable_ent[1]"
<p,0x20/J
<t/"asoft_table[2]"
<t::print keytable_ent_t kte_pfnarray | >p
<t::print keytable_ent_t
.>t
<p/"keytable_ent[2]"
<p,0x20/J
<t/"asoft_table[3]"
<t::print keytable_ent_t kte_pfnarray | >p
<t::print keytable_ent_t
.>t
<p/"keytable_ent[3]"
<p,0x20/J
<t/"asoft_table[4]"
<t::print keytable_ent_t kte_pfnarray | >p
<t::print keytable_ent_t
.>t
<p/"keytable_ent[4]"
<t/"asoft_table[5]"
<t::print keytable_ent_t kte_pfnarray | >p
<t::print keytable_ent_t
.>t
<p/"keytable_ent[5]"
<t/"asoft_table[6]"
<t::print keytable_ent_t kte_pfnarray | >p
<t::print keytable_ent_t
.>t
<p/"keytable_ent[6]"
<t/"asoft_table[7]"
<t::print keytable_ent_t kte_pfnarray | >p
<t::print keytable_ent_t
.>t
<p/"keytable_ent[7]"
mdb: logging to "mdboutput"
> $<./mdbinput
0xffffff01d61ca3c0: *agpgart_glob_soft_handle
{
array = 0xffffff01e0cee880
lock = {
_opaque = [ 0 ]
}
size = 0xd0
n_items = 0x8
next = 0
}
0xffffff01e77fd510: agpgart_softstate[0]
{
asoft_dip = 0xffffff01d6b836f0
asoft_instmutex = {
_opaque = [ 0 ]
}
asoft_info = {
agpki_mdevid = 0x2a028086
agpki_mver = {
agpv_major = 0
agpv_minor = 0
}
agpki_mstatus = 0
agpki_presize = 0x1dfc
agpki_tdevid = 0
agpki_tver = {
agpv_major = 0
agpv_minor = 0
}
agpki_tstatus = 0
agpki_aperbase = 0xd0000000
agpki_apersize = 0x200
}
asoft_opened = 0x4
asoft_acquired = 0x1
asoft_agpen = 0
asoft_curpid = 0x288
asoft_mode = 0
asoft_pgtotal = 0x20000
asoft_pgused = 0x3f01
asoft_li = 0xffffff01dd8e4500
asoft_table = 0xffffff01f3686000
gart_dma_handle = 0
gart_dma_acc_handle = 0
gart_pbase = 0
gart_vbase = 0
gart_size = 0
asoft_devreg = {
agprd_cpugarts = {
gart_device_num = 0
gart_dev_list_head = 0
}
agprd_targethdl = 0xffffff01f3bd3c90
agprd_masterhdl = 0xffffff01f3bd3d80
agprd_arctype = 1 (ARC_IGD830)
}
asoft_ksp = 0xffffff01d8a4b000
}
0xffffff01f3686000: asoft_table[0]
{
kte_type = 0
kte_key = 0
kte_pgoff = 0x77f
kte_pages = 0x1281
kte_bound = 0x1
kte_memhdl = 0xffffff01dd8dcf40
kte_pfnarray = 0xffffff01f6297000
kte_refcnt = 0
}
0xffffff01f6297000: keytable_ent[0]
0xffffff01f6297000: 11e400 11e401 11e402
11e403 11e404 11e405 11e406
11e407 11e408 11e409 11e40a
11e40b 11e40c 11e40d 11e40e
11e40f 11e410 11e411 11e412
11e413 11e414 11e415 11e416
11e417 11e418 11e419 11e41a
11e41b 11e41c 11e41d 11e41e
11e41f
0xffffff01f3686038: asoft_table[1]
{
kte_type = 0
kte_key = 0x1
kte_pgoff = 0x1a00
kte_pages = 0x640
kte_bound = 0x1
kte_memhdl = 0xffffff01dd8dcc20
kte_pfnarray = 0xffffff01f62a7000
kte_refcnt = 0
}
0xffffff01f62a7000: keytable_ent[1]
0xffffff01f62a7000: 11d281 11d282 11d283
11d284 11d285 11d286 11d287
11d288 11d289 11d28a 11d28b
11d28c 11d28d 11d28e 11d28f
11d290 11d291 11d292 11d293
11d294 11d295 11d296 11d297
11d298 11d299 11d29a 11d29b
11d29c 11d29d 11d29e 11d29f
11d2a0
0xffffff01f3686070: asoft_table[2]
{
kte_type = 0
kte_key = 0x2
kte_pgoff = 0x2040
kte_pages = 0x640
kte_bound = 0x1
kte_memhdl = 0xffffff01dd8dcc00
kte_pfnarray = 0xffffff01f62b1000
kte_refcnt = 0
}
0xffffff01f62b1000: keytable_ent[2]
0xffffff01f62b1000: 11ccc1 11ccc2 11ccc3
11ccc4 11ccc5 11ccc6 11ccc7
11ccc8 11ccc9 11ccca 11cccb
11cccc 11cccd 11ccce 11cccf
11ccd0 11ccd1 11ccd2 11ccd3
11ccd4 11ccd5 11ccd6 11ccd7
11ccd8 11ccd9 11ccda 11ccdb
11ccdc 11ccdd 11ccde 11ccdf
11cce0
0xffffff01f36860a8: asoft_table[3]
{
kte_type = 0
kte_key = 0x3
kte_pgoff = 0x2680
kte_pages = 0x2000
kte_bound = 0x1
kte_memhdl = 0xffffff01dd8dcfe0
kte_pfnarray = 0xffffff01f62c7000
kte_refcnt = 0
}
0xffffff01f62c7000: keytable_ent[3]
0xffffff01f62c7000: 11c701 11c702 11c703
11c704 11c705 11c706 11c707
11c708 11c709 11c70a 11c70b
11c70c 11c70d 11c70e 11c70f
11c710 11c711 11c712 11c713
11c714 11c715 11c716 11c717
11c718 11c719 11c71a 11c71b
11c71c 11c71d 11c71e 11c71f
11c720
0xffffff01f36860e0: asoft_table[4]
{
kte_type = 0
kte_key = 0
kte_pgoff = 0
kte_pages = 0
kte_bound = 0
kte_memhdl = 0
kte_pfnarray = 0
kte_refcnt = 0
}
0: keytable_ent[4]
0xffffff01f3686118: asoft_table[5]
{
kte_type = 0
kte_key = 0
kte_pgoff = 0
kte_pages = 0
kte_bound = 0
kte_memhdl = 0
kte_pfnarray = 0
kte_refcnt = 0
}
0: keytable_ent[5]
0xffffff01f3686150: asoft_table[6]
{
kte_type = 0
kte_key = 0
kte_pgoff = 0
kte_pages = 0
kte_bound = 0
kte_memhdl = 0
kte_pfnarray = 0
kte_refcnt = 0
}
0: keytable_ent[6]
0xffffff01f3686188: asoft_table[7]
{
kte_type = 0
kte_key = 0
kte_pgoff = 0
kte_pages = 0
kte_bound = 0
kte_memhdl = 0
kte_pfnarray = 0
kte_refcnt = 0
}
0: keytable_ent[7]
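The arithmetic behind the overflow can be checked against the pfn values in the output above. The following is a sketch only: it assumes 4 KB pages (the shift by 12 mentioned earlier) and models the GTT entry as a plain 32-bit value, which is a simplification; the helper names are hypothetical and not taken from agpgart.

```c
#include <assert.h>
#include <stdint.h>

/* Physical address of a page frame, assuming 4 KB pages. */
uint64_t pfn_to_paddr(uint64_t pfn)
{
	return (pfn << 12);
}

/* What survives if the address is stored in a 32-bit field. */
uint32_t paddr_low32(uint64_t paddr)
{
	return ((uint32_t)paddr);
}

/* Self-check using the first pfn of keytable_ent[0] and keytable_ent[3]. */
void overflow_self_check(void)
{
	assert(pfn_to_paddr(0x11e400) == 0x11e400000ULL);
	assert(paddr_low32(pfn_to_paddr(0x11e400)) == 0x1e400000U);
	assert(paddr_low32(pfn_to_paddr(0x11c701)) == 0x1c701000U);
}
```

Under these assumptions, pfn 0x11e400 corresponds to physical address 0x11e400000, and the low 32 bits, 0x1e400000, name a different page below 4 GB. That would be consistent with the memory corruption described, although the report itself only states that the value overflows.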