|
Description
|
The current diagnosis of FBDIMM channel errors on the T5440 does not include the
memory board as a suspect. This board is part of the channel pathway and should be
included.
The T5440 is running S10U6 with kernel patch 138888-08.
Here is more system information:
Sun System Firmware 7.1.6.g 2008/10/20 22:18
Host flash versions:
Hypervisor 1.6.7.a 2008/08/30 05:20
OBP 4.29.0.a 2008/09/15 12:02
POST 4.29.0.a 2008/09/15 12:35
The system had a memory failure, and the moment fmd sees the fault it core dumps repeatedly. Two core files were created. The pstack output and mdb showed the following:
*CORE FILES & PSTACK:*
core.fmd.4101
core.fmd.4101.pstack.txt
core.fmd.4113
core.fmd.4113.pstack.txt
** mdb showed this for both core files:
> $C
fde7b800 libc.so.1`strncpy+0x134(fde7b8a0, 42523000, 4, 1, 464677, 80808080)
fde7b840 cpumem-diagnosis.so`cmd_bank_fault+0x6c(17e180, 45a440, 0, 177a0, ffb9c0e9, 0)
fde7bca0 cpumem-diagnosis.so`cmd_ue_common+0x1f0(0, 0, 0, 0, 0, 0)
> ::stack
libc.so.1`strncpy+0x134(fde7b8a0, 42523000, 4, 1, 464677, 80808080)
cpumem-diagnosis.so`cmd_bank_fault+0x6c(17e180, 45a440, 0, 177a0, ffb9c0e9, 0)
cpumem-diagnosis.so`cmd_ue_common+0x1f0(0, 0, 0, 0, 0, 0)
** pstack showed this in both outputs:
# pstack core.fmd.4101.pstack.txt
----------------- lwp# 12 / thread# 12 --------------------
fefb3774 strncpy (fde7b8a0, 42523000, 4, 1, 464677, 80808080) + 134
fdd4e8b8 cmd_bank_fault (17e180, 45a440, 0, 177a0, ffb9c0e9, 0) + 6c
fdd4ebe4 cmd_ue_common (0, 0, 0, 0, 0, 0) + 1f0
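To confirm that several pstack outputs share the same crash signature without comparing them by eye, something like the following sketch could be used. It is only an illustration: the helper names (parse_frames, same_signature) are hypothetical, the frame format is assumed to match the lines quoted above, and the sample text is copied from this report with the emphasis markers removed.

```python
import re

# Assumed pstack frame format, as seen in this report:
#   fefb3774 strncpy (fde7b8a0, 42523000, 4, 1, 464677, 80808080) + 134
FRAME_RE = re.compile(
    r"^\s*(?P<pc>[0-9a-f]+)\s+(?P<func>\S+)\s*"
    r"\((?P<args>[^)]*)\)\s*\+\s*(?P<off>[0-9a-f]+)\s*$"
)


def parse_frames(text):
    """Return a list of (function, offset) tuples, one per frame line found."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.match(line)
        if m:  # lwp/thread header lines and blank lines do not match
            frames.append((m.group("func"), int(m.group("off"), 16)))
    return frames


def same_signature(a, b, depth=3):
    """True if the top `depth` frames of two pstack outputs are identical."""
    return parse_frames(a)[:depth] == parse_frames(b)[:depth]


# Frames quoted in this report (lwp# 12 of core.fmd.4101).
SAMPLE = """\
----------------- lwp# 12 / thread# 12 --------------------
 fefb3774 strncpy (fde7b8a0, 42523000, 4, 1, 464677, 80808080) + 134
 fdd4e8b8 cmd_bank_fault (17e180, 45a440, 0, 177a0, ffb9c0e9, 0) + 6c
 fdd4ebe4 cmd_ue_common (0, 0, 0, 0, 0, 0) + 1f0
"""

if __name__ == "__main__":
    for func, off in parse_frames(SAMPLE):
        print(f"{func}+0x{off:x}")
```

Reading two saved pstack text files and passing their contents to same_signature would show whether both cores died on the same strncpy / cmd_bank_fault / cmd_ue_common path.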
I have the core files and pstack output saved here:
/net/cores.central/cores/dir18/71162614/FMD_corefiles_prior_mem_replacement/oldfm
The customer replaced the two bad memory cards, but with an incorrect part number.
The moment the customer re-enabled fmd, it core dumped, creating 2 core files, and then stopped.
Core files:
-rwxrwxrwx 1 root root 19501989 Jun 9 20:36 core.fmd.1016*
-rwxrwxrwx 1 root root 19553545 Jun 9 20:45 core.fmd.1021*
Pstack:
-rwxrwxrwx 1 root other 13639 Jun 10 14:29 pstack.core1016.txt*
-rwxrwxrwx 1 root other 12603 Jun 10 14:29 pstack.core1021.txt*
Files are located here:
/net/cores.central/cores/dir18/71162614/fmd_corefiles_pstack
I had Steve Hanson analyze them, and he pointed out that the customer is hitting
CR 6716862, whose fix is included in the latest S10 kernel patch, 139555-08.
|