OpenSolaris

Bug ID 6684776
Synopsis scsi second disk no longer attaches after liveupgraded from s10u4 to s10u5_10/snv_81
State 10-Fix Delivered (Fix available in build)
Category:Subcategory kernel:pm
Keywords
Responsible Engineer Mark Haywood
Reported Against snv_81 , s10u5_10
Duplicate Of
Introduced In solaris_nevada
Commit to Fix snv_89
Fixed In snv_89
Release Fixed solaris_nevada(snv_89) , solaris_10u6(s10u6_02) (Bug ID:2161276)
Related Bugs 6512756 , 6667635
Submit Date 5-April-2008
Last Update Date 2-June-2008
Description
On a  xxxxx  Blade 2500 (non-silver), after a live upgrade from s10u4 to s10u5_10,
the second disk (sd0, aka c0t1d0) no longer attaches, effectively rendering
it invisible even to the format command.

The s10u5_10 kernel appears to see the second disk just fine:
% grep -w sd /etc/path_to_inst
"/pci@1d,700000/scsi@4/sd@0,0" 3 "sd"
"/pci@1d,700000/scsi@4/sd@1,0" 0 "sd"   <<< second disk
% grep 's[0-3]' /var/adm/messages
 ...
Apr  4 19:29:27 grimhilde unix: sd3 at glm0: target 0 lun 0
Apr  4 19:29:27 grimhilde unix: sd3 is /pci@1d,700000/scsi@4/sd@0,0
Apr  4 19:29:43 grimhilde unix: sd0 at glm0: target 1 lun 0
Apr  4 19:29:43 grimhilde unix: sd0 is /pci@1d,700000/scsi@4/sd@1,0
% iostat -En
sd0              Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 
Vendor: SEAGATE  Product: ST336607LSUN36G  Revision: 0707 Serial No: 0243A0BGBH 
Size: 36.42GB <36418595328 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 
Illegal Request: 0 Predictive Failure Analysis: 0 
c0t0d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 
Vendor: SEAGATE  Product: ST336704LSUN36G  Revision: 0326 Serial No: 0048D12ZRZ 
Size: 36.42GB <36418595328 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 
Illegal Request: 0 Predictive Failure Analysis: 0 
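
One possible reading of the iostat output above, offered as an assumption rather than something the report states: the second disk is listed under its driver instance name (sd0) instead of a cXtYdZ name, which is consistent with the disk being known to the kernel but not fully attached. The Python sketch below scans saved `iostat -En` output for such entries. The function name `unresolved_devices` and the regular expressions are illustrative, not part of any Solaris tool.

```python
import re

# Device header lines in `iostat -En` output start with the device name
# followed by "Soft Errors:".
HEADER = re.compile(r"^(\S+)\s+Soft Errors:")
# Logical disk names of the form c0t1d0 (assumed pattern for this sketch).
CTD_NAME = re.compile(r"^c\d+t\d+d\d+$")


def unresolved_devices(iostat_output):
    """Return device names that are not in cXtYdZ form (for example sd0)."""
    names = []
    for line in iostat_output.splitlines():
        match = HEADER.match(line)
        if match and not CTD_NAME.match(match.group(1)):
            names.append(match.group(1))
    return names
```

Fed the output shown above, this would flag sd0 and skip c0t0d0. It only points at where to look; it does not establish why the disk failed to attach.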

According to Vikram Hegde:

    The disk is not attaching. The disk is either broken (or at least 
    failing)  or there is a bug in the s10u5 "sd" driver. Please have this 
    bug investigated by an "sd" driver expert.

Given that the same disk is visible and works properly when the machine
is booted into s10u4, it's unlikely that the disk is failing.
I also reproduced this on linei-sb2500, which has two 146 GB disks. This is a serious regression and it must be fixed.

I will try going back to previous s10u5 builds to figure out which build introduced this regression.

We will update with further test results later. Results so far:

1.  Live upgrade from s10u4 FCS to s10u5_10 -- fails if there is a partition on the second SCSI disk
2.  Regular upgrade from s10u4 FCS to s10u5_10 -- works
3.  Live upgrade from s10u4 FCS to s10u5_06 -- fails
4.  Live upgrade from s10u4 FCS to s10u5_08 -- fails
Work Around
A regular upgrade works where live upgrade fails.
If this is pm-related and affects sparc platforms only, the following might
provide a workaround (needs to be verified).

from OBP:
{0} ok setenv energystar-enabled? false

from kernel:
# eeprom energystar-enabled?=false;reboot
From the comments:

I tried the following workaround and was able to get live upgrade to work on linei-sb2500:

Disable power management. (set autopm to "disable" in power.conf and pmconfig -r, reboot)

I retried the live upgrade and it worked without any problems.
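
The power.conf part of this workaround can be sketched in Python as follows. This is a minimal illustration that assumes the file uses the usual `autopm <behavior>` keyword line. The function name `disable_autopm` is hypothetical, and the sketch only rewrites text; it does not run pmconfig or reboot the machine.

```python
def disable_autopm(conf_text):
    """Return power.conf text with the autopm entry set to "disable".

    An existing autopm line is replaced; if none is present, one is
    appended. All other lines, including comments, are kept unchanged.
    """
    out = []
    seen = False
    for line in conf_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "autopm":
            out.append("autopm\t\tdisable")
            seen = True
        else:
            out.append(line)
    if not seen:
        out.append("autopm\t\tdisable")
    return "\n".join(out) + "\n"
```

A caller would read /etc/power.conf, pass its contents through this function, write the result back, and then carry out the pmconfig and reboot steps described in the comment.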
I got the following sparc binaries from Mark Haywood, and they fix the problem in Nevada and s10u5:

s10u5:
/net/tribble.east/export/build/mh27603/6684776/packages/sparc/nightly-nd/SUNWcakr.u/reloc/platform/sun4u/kernel/drv/sparcv9/ppm

Nevada:
/net/zhadum.east/export/ws/mh27603/6684776/packages/sparc/nightly-nd/SUNWcakr.u/reloc/platform/sun4u/kernel/drv/sparcv9/ppm 

The binary should be copied onto the target machine as:

/platform/sun4u/kernel/drv/sparcv9/ppm 
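
When trying a replacement driver like this, it can help to keep the original file so the change is easy to back out. The Python sketch below copies a new binary into place after saving the existing one. The function name `install_driver` and the `.orig` backup suffix are assumptions made for illustration, not part of the instructions in this report.

```python
import os
import shutil


def install_driver(new_binary, target):
    """Copy new_binary over target, keeping a one-time backup of target.

    The backup is written next to the target as "<target>.orig" and is
    never overwritten, so repeated installs keep the original driver.
    Returns the backup path.
    """
    backup = target + ".orig"
    if os.path.exists(target) and not os.path.exists(backup):
        shutil.copy2(target, backup)
    shutil.copy2(new_binary, target)
    return backup
```

Here `target` would be the ppm path given above, and `new_binary` the s10u5 or Nevada build that matches the release on the machine.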


I tried the s10u5 and Nevada sparc binaries, and they do fix the problem.
Comments
N/A