OpenSolaris

Bug ID 6484879
Synopsis fmadm faulty output requires improvement/extension
State 10-Fix Delivered (Fix available in build)
Category:Subcategory utility:fm
Keywords fma-b56 | fma_s10u5 | glendale_reviewed | glendale_track | no_huron
Responsible Engineer Adrian Frost
Reported Against s10 , solaris_10
Duplicate Of
Introduced In
Commit to Fix snv_76
Fixed In snv_76
Release Fixed solaris_nevada(snv_76) , solaris_10u5(s10u5_05) (Bug ID:2155716)
Related Bugs 6502811 , 6567489 , 6608158 , 6660642 , 6758559 , 6224676
Submit Date 23-October-2006
Last Update Date 13-November-2007
Description
'fmadm faulty' output currently requires a bit more knowledge of fma
detail than we should expect from every admin:

# fmadm faulty
   STATE RESOURCE / UUID
-------- ----------------------------------------------------------------------
degraded mem:///motherboard=0/chip=0/memory-controller=0/dimm=0/rank=0
         4521d642-c1f3-4ad7-8923-d317d478056a
-------- ----------------------------------------------------------------------

So when was this diagnosed?  What fault does it have?  How serious is that fault?
What is the FRU to be replaced?  What does "degraded" mean?  What does that
long UUID string "4521d642-c1f3-4ad7-8923-d317d478056a" mean?

In the corresponding console output at diagnosis time (or on restart with the
fault still present) we'd have something like:

Oct 23 12:44:10 va64-x2100g-gmp03 REC-ACTION: Schedule a repair procedure to replace the affected memory module.  Use fmdump -v -u <EVENT_ID> to identify the module.

It sucks that the admin has to cut and paste the event ID and follow this
message just to answer the above questions.
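
To make the manual step concrete, here is a minimal Python sketch that pulls
the UUID out of the current 'fmadm faulty' output and builds the fmdump
command line the console message asks for.  The helper names and the regular
expression are illustrative assumptions, not part of fmadm; only the
'fmdump -v -u' invocation comes from the console message above.

```python
import re

# Sample copied from the 'fmadm faulty' output quoted in this report.
SAMPLE = """\
   STATE RESOURCE / UUID
-------- ----------------------------------------------------------------------
degraded mem:///motherboard=0/chip=0/memory-controller=0/dimm=0/rank=0
         4521d642-c1f3-4ad7-8923-d317d478056a
-------- ----------------------------------------------------------------------
"""

# Assumption: the UUID sits alone on its own line, in 8-4-4-4-12 hex form.
UUID_RE = re.compile(
    r"^[ \t]*([0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})[ \t]*$",
    re.MULTILINE,
)


def extract_uuids(text):
    """Return every UUID found on a line of its own, in order of appearance."""
    return UUID_RE.findall(text)


def fmdump_commands(text):
    """Build one 'fmdump -v -u <uuid>' command string per UUID found."""
    return ["fmdump -v -u " + uuid for uuid in extract_uuids(text)]


if __name__ == "__main__":
    for command in fmdump_commands(SAMPLE):
        print(command)
```

This is exactly the lookup the proposed output would make unnecessary.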

While fmadm(1m) lists the command-line options as Evolving and human-readable
output as Unstable, we should probably leave 'fmadm faulty'
output unchanged and introduce a new command-line option.  Our suggestion
is 'fmadm status', and a mock-up of the output follows:

# fmadm status
------------------------------------- ------------ --------- -----------------
EVENT-ID                              MSG-ID       SEVERITY  TIME
------------------------------------- ------------ --------- -----------------
132b2dfa-903d-e81e-bcb4-f98177e762f3  AMD-8000-5M  Major     Jun 28 03:33:09

Fault     : fault.cpu.amd.l2cachedata
Certainty : 75%
Affects   : cpu:///cpuid=0
            Status: Removed from service
FRU       : hc:///motherboard=0/chip=0
            Label: "CPU0"

Fault     : fault.cpu.amd.l2cachetag
Certainty : 25%
Affects   : cpu:///cpuid=0
            Status: Removed from service
FRU       : hc:///motherboard=0/chip=0
            Label: "CPU0"
------------------------------------- ------------ --------- -----------------
EVENT-ID                              MSG-ID       SEVERITY  TIME
------------------------------------- ------------ --------- -----------------
27c3a201-f410-610e-c88e-ceac8195ee93  AMD-8000-3K  Major     Oct 13 2004

Fault     : fault.memory.dimm_ck
Certainty : 100%
Affects   : mem:///motherboard=0/chip=2/memory-controller=0/dimm=0
            Status: In service but degraded
FRU       : hc:///motherboard=0/chip=2/memory-controller=0/dimm=0
            Label: "CPU2 DIMM0"
------------------------------------- ------------ --------- -----------------
EVENT-ID                              MSG-ID       SEVERITY  TIME
------------------------------------- ------------ --------- -----------------
d4671fa8-2a01-68da-fe57-9c09f5a717a2  FMD-8000-2K  Minor     Jul 04 1776

Fault     : defect.fmd.module
Certainty : 100%
Affects   : fmd:///module/eft
            Status: Removed from service

Notes:

o Perhaps combine the two FRU lines depending on whether we know a label or
  not, i.e., this:

FRU       : hc:///motherboard=0/chip=2/memory-controller=0/dimm=0
            Label: "CPU2 DIMM0"
  becomes

FRU       : "CPU2 DIMM0"

  if we have a fru label available, otherwise it would be

FRU       : hc:///motherboard=0/chip=2/memory-controller=0/dimm=0

o Sort most severe cases to top

o Undecided whether long fmri strings should wrap to column 0 or
  wrap indented; the former may look ugly but facilitates
  copy-paste.
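
The first two notes can be sketched in Python as follows.  This is only an
illustration of the proposed behaviour, not the fmadm implementation: the
function names, the dictionary layout of a case, and the severity ranking
table are assumptions, and only the two severities that appear in the mock-up
are ranked.

```python
# Assumed ranking: lower number sorts first.  Only the severities shown in
# the mock-up are listed; anything else sorts after them.
SEVERITY_RANK = {"Major": 0, "Minor": 1}


def fru_line(fmri, label=None):
    """Show the quoted FRU label when one is known, else the raw FMRI."""
    shown = '"%s"' % label if label else fmri
    return "FRU       : " + shown


def sort_by_severity(cases):
    """Return cases with the most severe first; ties keep their input order."""
    unknown = len(SEVERITY_RANK)
    return sorted(cases, key=lambda c: SEVERITY_RANK.get(c["severity"], unknown))


if __name__ == "__main__":
    dimm = "hc:///motherboard=0/chip=2/memory-controller=0/dimm=0"
    print(fru_line(dimm, "CPU2 DIMM0"))
    print(fru_line(dimm))

    cases = [
        {"msg_id": "FMD-8000-2K", "severity": "Minor"},
        {"msg_id": "AMD-8000-5M", "severity": "Major"},
        {"msg_id": "AMD-8000-3K", "severity": "Major"},
    ]
    for case in sort_by_severity(cases):
        print(case["msg_id"], case["severity"])
```

Because the sort is stable, cases of equal severity stay in whatever order
they were supplied, so a secondary ordering such as time could be applied
beforehand.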

The above output content and format are not set in stone and we're open to
suggestions and discussion.  The main constraint is that it be achievable without
a significant overhaul of the fma infrastructure that would delay
implementation - we'd like to ship this soon.
Work Around
N/A
Comments
N/A