OpenSolaris

Bug ID 6670367
Synopsis occasional spurious panic when running scanpci with Intel quad ethernet card
State 10-Fix Delivered (Fix available in build)
Category:Subcategory fma:io
Keywords
Responsible Engineer Stephen Hanson
Reported Against hwp2 , s10u4_12b , netra_x4200
Duplicate Of
Introduced In solaris_10u4
Commit to Fix snv_86
Fixed In snv_86
Release Fixed solaris_nevada(snv_86) , solaris_10u6(s10u6_03) (Bug ID:2160064)
Related Bugs 6669530
Submit Date 3-March-2008
Last Update Date 6-April-2008
Description
Running scanpci -vO on an x86 pciex system with an Intel quad Ethernet card occasionally causes the panic "pcie_pci-1: PCI(-X) Express Fatal Error".

This happens because we see a UR (Unsupported Request) bit set in the AER Uncorrectable Error Status register, but no corresponding CE (Correctable Error) bit in the Device Status register.

This appears to be caused by a race condition: scanpci causes URs on the different Ethernet ports at the same time, while the kernel MSI interrupt handler is reading and clearing the error bits. The result is that the fma code partially clears the registers for the second device while handling the error on the first device, then gets an incomplete set of data when it subsequently handles the error on the second device.

We can't avoid the race, but it looks like the check for the CE bit being present is unnecessary anyway: we shouldn't panic just because CE is not present, only if NFE (Non-Fatal Error) *is* present.
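
The proposed change can be illustrated with a minimal sketch. This is not the actual fma code: the function names ur_is_fatal_old and ur_is_fatal_new are hypothetical, and the "old" logic is only a reconstruction inferred from the description above. The bit positions are those defined for the Device Status and AER Uncorrectable Error Status registers in the PCI Express Base Specification.

```c
#include <stdbool.h>
#include <stdint.h>

/* Device Status register bits (PCI Express Base Specification). */
#define DEVSTS_CE   (1u << 0)   /* Correctable Error Detected */
#define DEVSTS_NFE  (1u << 1)   /* Non-Fatal Error Detected */

/* AER Uncorrectable Error Status register bit. */
#define AER_UCE_UR  (1u << 20)  /* Unsupported Request Error Status */

/*
 * Hypothetical reconstruction of the old behaviour: a UR in the AER
 * register is treated as fatal if NFE is set, or if the expected CE
 * bit is missing from Device Status.
 */
static bool
ur_is_fatal_old(uint32_t aer_uce, uint16_t dev_sts)
{
	if (!(aer_uce & AER_UCE_UR))
		return (false);
	return ((dev_sts & DEVSTS_NFE) != 0 || (dev_sts & DEVSTS_CE) == 0);
}

/*
 * Hypothetical sketch of the proposed behaviour: a missing CE bit is
 * tolerated (it may have been cleared by the race); only NFE makes
 * the UR fatal.
 */
static bool
ur_is_fatal_new(uint32_t aer_uce, uint16_t dev_sts)
{
	if (!(aer_uce & AER_UCE_UR))
		return (false);
	return ((dev_sts & DEVSTS_NFE) != 0);
}
```

With UR set in the AER register and a Device Status of 0 (the incomplete data left behind by the race), the old check in this sketch reports a fatal error while the new one does not. Both still report fatal when NFE is set.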
Work Around
N/A
Comments
N/A