OpenSolaris

Bug ID 6761587
Synopsis Disabling active-active boot disks with MPxIO fails in special case
State 11-Closed:Verified (Closed)
Category:Subcategory kernel:io-multipath
Keywords 2008.11-reviewed | fstyp | nevada | snv | stmsboot
Responsible Engineer James Mcpherson
Reported Against snv_100
Duplicate Of
Introduced In solaris_nevada
Commit to Fix snv_104
Fixed In snv_104
Release Fixed solaris_nevada(snv_104), solaris_10u7(s10u7_02) (Bug ID: 2169684)
Related Bugs 6707555, 6781590, 6890498
Submit Date 20-October-2008
Last Update Date 9-June-2009
Description
The system drops to maintenance mode when 'stmsboot -d' is run to switch from MPxIO mode to non-MPxIO mode.

Steps to reproduce the failure:

1. JumpStart a sun4v system (a Glendale in this case) with snv_100a in MPxIO mode.
2. Run 'stmsboot -d'. The failure message is logged to the console.

The three notes below show the error message on the console, the contents of /etc/vfstab before and after the command, and the correct translation reported by 'stmsboot -L' before the command.

===

NOTE 1 - Before 'stmsboot -d' is run, /etc/vfstab uses the long (MPxIO) path. 'stmsboot -L' also shows the correct translation from the long path to the short one.

d36d-root@[/etc]># ready to boot from 'stmsboot -d' command
d36d-root@[/etc]>more vfstab
#device         device          mount           FS      fsck    mount   mount
#to mount       to fsck         point           type    pass    at boot options
#
fd      -       /dev/fd fd      -       no      -
/proc   -       /proc   proc    -       no      -
/dev/dsk/c5t60060E80042740000000274000000127d0s1        -       -       swap    -       no      -
/dev/dsk/c5t60060E80042740000000274000000127d0s0        /dev/rdsk/c5t60060E80042740000000274000000127d0s0       /       ufs     1       no      -
/devices        -       /devices        devfs   -       no      -
sharefs -       /etc/dfs/sharetab       sharefs -       no      -
ctfs    -       /system/contract        ctfs    -       no      -
objfs   -       /system/object  objfs   -       no      -
swap    -       /tmp    tmpfs   -       yes     -
# File modification by Jumpstart installation
### csstdist-240:/usr/dist      -       /usr/dist       nfs     -       yes     -
# End file modificaiton
d36d-root@[/etc]>stmsboot -L | grep c5t60060E80042740000000274000000127d0
/dev/rdsk/c1t50060E8004274064d0 /dev/rdsk/c5t60060E80042740000000274000000127d0
/dev/rdsk/c2t50060E8004274074d0 /dev/rdsk/c5t60060E80042740000000274000000127d0
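To make the expected translation concrete, here is a minimal Python sketch (not stmsboot's implementation) that parses the two-column 'stmsboot -L' output shown above and rewrites the device fields of a vfstab line. The function names and the pick parameter are hypothetical. Note that the single MPxIO name maps to two non-MPxIO names (c1t... and c2t...), so any translation has to choose one path; pick makes that choice explicit.

```python
import re

def parse_stmsboot_L(text):
    """Map each MPxIO (STMS) disk name to its non-MPxIO names.

    Assumes each data line is "<non-STMS name> <STMS name>", as in the
    grep output above; headers and blank lines are skipped.
    """
    mapping = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2 or not fields[0].startswith("/dev/"):
            continue
        # Keep only the cXtYdZ part, dropping /dev/dsk or /dev/rdsk.
        non_stms, stms = (f.rsplit("/", 1)[-1] for f in fields)
        mapping.setdefault(stms, []).append(non_stms)
    return mapping

# /dev/dsk or /dev/rdsk prefix, cXtYdZ disk name, sN slice suffix.
_DEV = re.compile(r"(/dev/r?dsk/)(c\d+t[0-9A-Fa-f]+d\d+)(s\d+)")

def translate_vfstab_line(line, mapping, pick=lambda names: names[0]):
    """Replace MPxIO disk names with a non-MPxIO name, keeping the slice.

    pick (hypothetical) chooses among the physical paths for one disk.
    """
    def repl(m):
        prefix, disk, slice_ = m.groups()
        if disk not in mapping:
            return m.group(0)  # not an MPxIO name we know about
        return prefix + pick(mapping[disk]) + slice_
    return _DEV.sub(repl, line)
```

Applied to the root entry above with the default pick, both the block and raw device fields become c1t50060E8004274064d0s0; passing pick=lambda n: n[1] selects the c2t50060E8004274074d0 path instead.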

===

NOTE 2 - The console messages after 'stmsboot -d' show that the system drops to maintenance mode.

d36d-root@[/etc]>Oct 20 14:01:14 d36d reboot: initiated by root on /dev/pts/1
Oct 20 14:01:21 d36d syslogd: going down on signal 15
Oct 20 14:01:21 /usr/lib/snmp/snmpdx: received signal 15
syncing file systems... done
rebooting...
Resetting...


Sun Blade T6320 Server Module, No Keyboard
Copyright 2008 Sun Microsystems, Inc.  All rights reserved.
OpenBoot 4.29.0.a, 16256 MB memory available, Serial #77097386.
Ethernet address 0:14:4f:98:69:aa, Host ID: 849869aa.



QLogic QEM2462 Host Adapter Driver(SPARC): 1.24  11/15/06
Firmware version 4.00.26
Boot device: /pci@0/pci@0/pci@8/SUNW,qlc@0/fp@0,0/disk@w50060e8004274064,0:a  File and args:
SunOS Release 5.11 Version snv_100 64-bit
Copyright 1983-2008 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
NOTICE: One or more I/O devices have been retired
Hostname: d36d
mount: /devices/pci@0/pci@0/pci@9/SUNW,emlxs@0/fp@0,0/ssd@w50060e8004274074,0:a is not this fstype
mount: /devices/pci@0/pci@0/pci@9/SUNW,emlxs@0/fp@0,0/ssd@w50060e8004274074,0:a is not this fstype

ERROR: stmsboot: failed to mount the root filesystem.
Instructions to recover your previous STMS configuration (if in case the system does not boot):

boot net (or from a cd/dvd/another disk)
fsck <your-root-device>
mount <your-root-device> /mnt
cp /mnt/etc/mpxio/mpt.conf.disable.20081020_1358 /mnt/kernel/drv/mpt.conf
cp /mnt/etc/mpxio/vfstab.disable.20081020_1358 /mnt/etc/vfstab
/usr/sbin/svccfg -f /mnt/etc/mpxio/svccfg_recover
umount /mnt
reboot

/dev/dsk/c5t60060E80042740000000274000000127d0s0 was your root device,
but it could be named differently after you boot net.
These instructions were also logged to the file /etc/mpxio/recover_instructions

The / file system (/dev/rdsk/c5t60060E80042740000000274000000127d0s0) is being checked.

WARNING - Unable to repair the / filesystem. Run fsck
manually (fsck -F ufs /dev/rdsk/c5t60060E80042740000000274000000127d0s0).

Oct 20 14:03:44 svc.startd[7]: svc:/system/filesystem/usr:default: Method "/lib/svc/method/fs-usr" failed with exit status 95.
Oct 20 14:03:44 svc.startd[7]: system/filesystem/usr:default failed fatally: transitioned to maintenance (see 'svcs -xv' for details)
Requesting System Maintenance Mode
(See /lib/svc/share/README for more information.)
Console login service(s) cannot run

Enter user name for system maintenance (control-d to bypass): Oct 20 14:03:46 svc.startd[7]: network/dns/multicast:default failed repeatedly: transitioned to maintenance (see 'svcs -xv' for details)
Oct 20 14:03:46 svc.startd[7]: failed to abandon contract 32: Permission denied


Enter  password for system maintenance (control-d to bypass):
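The boot device and the device the root mount failed on can be matched against the non-MPxIO names from 'stmsboot -L' (NOTE 1) by their port WWN. A minimal Python sketch, assuming that non-MPxIO FC names embed the 16-hex-digit port WWN as the target, as c1t50060E8004274064d0 does; the helper names are hypothetical. It compares only the WWN, which is enough here because both names end in d0.

```python
import re

# Port WWN in a /devices leaf such as "disk@w50060e8004274064,0:a".
_DEVICES_WWN = re.compile(r"@w([0-9a-fA-F]{16}),")
# Non-MPxIO FC name such as "c1t50060E8004274064d0" (assumed target = port WWN).
_CTD_WWN = re.compile(r"c\d+t([0-9A-Fa-f]{16})d\d+")

def port_wwn(devices_path):
    """Return the upper-case port WWN in a /devices path, or None."""
    m = _DEVICES_WWN.search(devices_path)
    return m.group(1).upper() if m else None

def ctd_names_for(devices_path, ctd_names):
    """Return the cXtYdZ names whose target WWN matches the /devices path."""
    wwn = port_wwn(devices_path)
    matches = []
    for name in ctd_names:
        m = _CTD_WWN.fullmatch(name)
        if m and m.group(1).upper() == wwn:
            matches.append(name)
    return matches

boot = "/pci@0/pci@0/pci@8/SUNW,qlc@0/fp@0,0/disk@w50060e8004274064,0:a"
failed = "/devices/pci@0/pci@0/pci@9/SUNW,emlxs@0/fp@0,0/ssd@w50060e8004274074,0:a"
paths = ["c1t50060E8004274064d0", "c2t50060E8004274074d0"]
print("booted via:", ctd_names_for(boot, paths))
print("mount failed via:", ctd_names_for(failed, paths))
```

This makes it easy to see that OpenBoot booted through the qlc port (c1t50060E8004274064d0), while the failing root mount was attempted through the emlxs port (c2t50060E8004274074d0).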

===

NOTE 3 - 'stmsboot -d' does not change /etc/vfstab to the short path; it still uses the long path.

Enter user name for system maintenance (control-d to bypass): root

Enter root password for system maintenance (control-d to bypass):
single-user privilege assigned to root on /dev/console.
Entering System Maintenance Mode

Oct 20 14:33:18 su: 'su root' succeeded for root on /dev/console
Sun Microsystems Inc.   SunOS 5.11      snv_100 November 2008

RUN    : On [d36d] since: Monday, October 20, 2008 12:21:16 PM PDT

SYSTEM : Sun Blade T6320 Server Module
SYSTEM : Number of Processors: 64 sparcv9 (64 Online)
SYSTEM :    Speed: 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165MhzSYSTEM : Physical Memory: 16256 Mb

.profile[223]: /env_list: cannot create
d36d-root@[/root]>
d36d-root@[/root]>cd /etc
d36d-root@[/etc]>more vfstab
#device         device          mount           FS      fsck    mount   mount
#to mount       to fsck         point           type    pass    at boot options
#
fd      -       /dev/fd fd      -       no      -
/proc   -       /proc   proc    -       no      -
/dev/dsk/c5t60060E80042740000000274000000127d0s1        -       -       swap    -       no      -
/dev/dsk/c5t60060E80042740000000274000000127d0s0        /dev/rdsk/c5t60060E80042740000000274000000127d0s0       /       ufs     1       no      -
/devices        -       /devices        devfs   -       no      -
sharefs -       /etc/dfs/sharetab       sharefs -       no      -
ctfs    -       /system/contract        ctfs    -       no      -
objfs   -       /system/object  objfs   -       no      -
swap    -       /tmp    tmpfs   -       yes     -
# File modification by Jumpstart installation
### csstdist-240:/usr/dist      -       /usr/dist       nfs     -       yes     -
# End file modificaiton
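A quick check for this symptom is to list vfstab entries whose device still looks like an MPxIO name. The Python sketch below uses a heuristic that is an assumption, not stmsboot logic: a cXtYdZ name whose target is longer than a 16-hex-digit port WWN (like the 32-digit c5t60060E80042740000000274000000127d0) is treated as an MPxIO GUID-based name. The function name is hypothetical.

```python
import re

# Block device path with a cXtYdZsN name; group 1 is the target part.
_DISK = re.compile(r"/dev/r?dsk/c\d+t([0-9A-Fa-f]+)d\d+s\d+")

def leftover_mpxio_entries(vfstab_text):
    """Return (mount point, device) pairs whose device-to-mount field
    still looks like an MPxIO name (heuristic: target longer than 16 hex digits)."""
    hits = []
    for line in vfstab_text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0].startswith("#"):
            continue  # skip comments and short lines
        m = _DISK.fullmatch(fields[0])
        if m and len(m.group(1)) > 16:
            hits.append((fields[2], fields[0]))
    return hits
```

Run against the vfstab above, it would flag both the swap entry (mount point "-") and the root entry, which is consistent with the vfstab not having been translated.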

At first glance, this appears to be an unexpected artefact of the recently integrated WWID
support.

Please attach a copy of all the files in /etc/mpxio to this CR.

It would also be very helpful to get the output of the following commands:

prtconf -v
prtpicl -v
/lib/mpxio/stmsboot_util -d -L

captured both before _and after_ running "stmsboot -d".
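Once the before and after captures exist, a unified diff narrows down what 'stmsboot -d' changed. A minimal Python sketch using the standard difflib module; the capture file names (for example prtconf.before and prtconf.after) are hypothetical.

```python
import difflib
from pathlib import Path

def diff_captures(before, after, name):
    """Unified diff of one command's output captured before and after."""
    return "".join(difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=f"{name} (before)",
        tofile=f"{name} (after)",
    ))

def diff_all(capture_dir="."):
    """Print diffs for hypothetical <name>.before / <name>.after files."""
    base = Path(capture_dir)
    for name in ("prtconf", "prtpicl", "stmsboot_util"):
        before, after = base / f"{name}.before", base / f"{name}.after"
        if before.exists() and after.exists():
            print(diff_captures(before.read_text(), after.read_text(), name))
```

An empty string from diff_captures means the output did not change.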

Also, please confirm that your JumpStart profile and server configuration were written so that
MPxIO is enabled at install time.

Discussion of this issue with colleagues revealed that the device providing the root filesystem for
this host is not SAS-attached but FC-attached:

/devices/pci@0/pci@0/pci@9/SUNW,emlxs@0/fp@0,0/ssd@w50060e8004274074,0:a

Therefore, the earlier comments about the WWID changes are irrelevant.

At this point, the most likely cause of the issue logged here is that the device is attached
via the non-active path from the array (an HDS array). If that is indeed the cause, then
I do not believe this is an error in stmsboot.
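The hypothesis can be illustrated with a toy model in Python. Everything here is an assumption for illustration: the DiskPath type, the active flags, and the rule of preferring the path the system booted from are hypothetical, not stmsboot behaviour. The model only encodes the claim above: if I/O through a non-active path fails, a vfstab that names that path cannot mount /.

```python
from dataclasses import dataclass

@dataclass
class DiskPath:
    name: str     # non-MPxIO cXtYdZ name
    active: bool  # hypothetical: whether the array serves I/O on this port

def choose_path(paths, boot_name=None):
    """Toy path choice: prefer the path booted from, else the first active one."""
    if boot_name is not None:
        for p in paths:
            if p.name == boot_name:
                return p  # firmware already read from this path at boot
    for p in paths:
        if p.active:
            return p
    raise LookupError("no active path available")

def can_mount(path):
    """In this toy model, mounting succeeds only through an active path."""
    return path.active
```

With the emlxs path marked non-active, choose_path avoids it, whereas a translation that blindly picked that path would fail to mount, as in NOTE 2.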

Submitter: please confirm the above hypothesis regarding active/passive paths.

James McPherson: please see the Comments section for my response to your question.
Work Around
N/A
Comments
N/A