Description
System drops to maintenance mode when 'stmsboot -d' is executed to switch from MPxIO mode to non-MPxIO mode.
The failure is reproduced as follows:
1. Jumpstart a 4v system (a Glendale in this case) with snv_100a in MPxIO mode.
2. Execute 'stmsboot -d'. The failure message is logged at the console.
The three (3) notes below show the error message at the console, the content of /etc/vfstab before and after the command, and the correct translation from 'stmsboot -L' before the command.
===
NOTE 1 - File /etc/vfstab indicates a long path before executing 'stmsboot -d' command. Command 'stmsboot -L' also shows correct translation from long path to short.
d36d-root@[/etc]># ready to boot from 'stmsboot -d' command
d36d-root@[/etc]>more vfstab
#device device mount FS fsck mount mount
#to mount to fsck point type pass at boot options
#
fd - /dev/fd fd - no -
/proc - /proc proc - no -
/dev/dsk/c5t60060E80042740000000274000000127d0s1 - - swap - no -
/dev/dsk/c5t60060E80042740000000274000000127d0s0 /dev/rdsk/c5t60060E80042740000000274000000127d0s0 / ufs 1 no -
/devices - /devices devfs - no -
sharefs - /etc/dfs/sharetab sharefs - no -
ctfs - /system/contract ctfs - no -
objfs - /system/object objfs - no -
swap - /tmp tmpfs - yes -
# File modification by Jumpstart installation
### csstdist-240:/usr/dist - /usr/dist nfs - yes -
# End file modificaiton
d36d-root@[/etc]>stmsboot -L | grep c5t60060E80042740000000274000000127d0
/dev/rdsk/c1t50060E8004274064d0 /dev/rdsk/c5t60060E80042740000000274000000127d0
/dev/rdsk/c2t50060E8004274074d0 /dev/rdsk/c5t60060E80042740000000274000000127d0
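The translation shown above can be looked up mechanically. The following is a minimal sketch, assuming the two-column layout seen in the 'stmsboot -L' output in this note (non-MPxIO name first, MPxIO name second). The function name short_names_for is hypothetical, and the sample lines are copied from this note rather than read from a live system.

```shell
# Sample 'stmsboot -L' lines copied from NOTE 1.
mapping='/dev/rdsk/c1t50060E8004274064d0 /dev/rdsk/c5t60060E80042740000000274000000127d0
/dev/rdsk/c2t50060E8004274074d0 /dev/rdsk/c5t60060E80042740000000274000000127d0'

# Hypothetical helper: read mapping lines on stdin and print every
# non-MPxIO name (column 1) whose MPxIO name (column 2) equals $1.
short_names_for() {
    awk -v target="$1" '$2 == target { print $1 }'
}

printf '%s\n' "$mapping" |
    short_names_for /dev/rdsk/c5t60060E80042740000000274000000127d0
```

On a live system the sample text would be replaced by the output of 'stmsboot -L' piped into the same function.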
===
NOTE 2 - Console output after 'stmsboot -d' shows the system dropping to maintenance mode
d36d-root@[/etc]>Oct 20 14:01:14 d36d reboot: initiated by root on /dev/pts/1
Oct 20 14:01:21 d36d syslogd: going down on signal 15
Oct 20 14:01:21 /usr/lib/snmp/snmpdx: received signal 15
syncing file systems... done
rebooting...
Resetting...
Sun Blade T6320 Server Module, No Keyboard
Copyright 2008 Sun Microsystems, Inc. All rights reserved.
OpenBoot 4.29.0.a, 16256 MB memory available, Serial #77097386.
Ethernet address 0:14:4f:98:69:aa, Host ID: 849869aa.
QLogic QEM2462 Host Adapter Driver(SPARC): 1.24 11/15/06
Firmware version 4.00.26
Boot device: /pci@0/pci@0/pci@8/SUNW,qlc@0/fp@0,0/disk@w50060e8004274064,0:a File and args:
SunOS Release 5.11 Version snv_100 64-bit
Copyright 1983-2008 Sun Microsystems, Inc. All rights reserved.
Use is subject to license terms.
NOTICE: One or more I/O devices have been retired
Hostname: d36d
mount: /devices/pci@0/pci@0/pci@9/SUNW,emlxs@0/fp@0,0/ssd@w50060e8004274074,0:a is not this fstype
mount: /devices/pci@0/pci@0/pci@9/SUNW,emlxs@0/fp@0,0/ssd@w50060e8004274074,0:a is not this fstype
ERROR: stmsboot: failed to mount the root filesystem.
Instructions to recover your previous STMS configuration (if in case the system does not boot):
boot net (or from a cd/dvd/another disk)
fsck <your-root-device>
mount <your-root-device> /mnt
cp /mnt/etc/mpxio/mpt.conf.disable.20081020_1358 /mnt/kernel/drv/mpt.conf
cp /mnt/etc/mpxio/vfstab.disable.20081020_1358 /mnt/etc/vfstab
/usr/sbin/svccfg -f /mnt/etc/mpxio/svccfg_recover
umount /mnt
reboot
/dev/dsk/c5t60060E80042740000000274000000127d0s0 was your root device,
but it could be named differently after you boot net.
These instructions were also logged to the file /etc/mpxio/recover_instructions
The / file system (/dev/rdsk/c5t60060E80042740000000274000000127d0s0) is being checked.
WARNING - Unable to repair the / filesystem. Run fsck
manually (fsck -F ufs /dev/rdsk/c5t60060E80042740000000274000000127d0s0).
Oct 20 14:03:44 svc.startd[7]: svc:/system/filesystem/usr:default: Method "/lib/svc/method/fs-usr" failed with exit status 95.
Oct 20 14:03:44 svc.startd[7]: system/filesystem/usr:default failed fatally: transitioned to maintenance (see 'svcs -xv' for details)
Requesting System Maintenance Mode
(See /lib/svc/share/README for more information.)
Console login service(s) cannot run
Enter user name for system maintenance (control-d to bypass): Oct 20 14:03:46 svc.startd[7]: network/dns/multicast:default failed repeatedly: transitioned to maintenance (see 'svcs -xv' for details)
Oct 20 14:03:46 svc.startd[7]: failed to abandon contract 32: Permission denied
Enter password for system maintenance (control-d to bypass):
===
NOTE 3 - /etc/vfstab is not changed to the short path by the 'stmsboot -d' command. It still uses the long path.
Enter user name for system maintenance (control-d to bypass): root
Enter root password for system maintenance (control-d to bypass):
single-user privilege assigned to root on /dev/console.
Entering System Maintenance Mode
Oct 20 14:33:18 su: 'su root' succeeded for root on /dev/console
Sun Microsystems Inc. SunOS 5.11 snv_100 November 2008
RUN : On [d36d] since: Monday, October 20, 2008 12:21:16 PM PDT
SYSTEM : Sun Blade T6320 Server Module
SYSTEM : Number of Processors: 64 sparcv9 (64 Online)
SYSTEM : Speed: 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165Mhz / 1165MhzSYSTEM : Physical Memory: 16256 Mb
.profile[223]: /env_list: cannot create
d36d-root@[/root]>
d36d-root@[/root]>
d36d-root@[/root]>
d36d-root@[/root]>
d36d-root@[/root]>
d36d-root@[/root]>
d36d-root@[/root]>cd /etc
d36d-root@[/etc]>more vfstab
#device device mount FS fsck mount mount
#to mount to fsck point type pass at boot options
#
fd - /dev/fd fd - no -
/proc - /proc proc - no -
/dev/dsk/c5t60060E80042740000000274000000127d0s1 - - swap - no -
/dev/dsk/c5t60060E80042740000000274000000127d0s0 /dev/rdsk/c5t60060E80042740000000274000000127d0s0 / ufs 1 no -
/devices - /devices devfs - no -
sharefs - /etc/dfs/sharetab sharefs - no -
ctfs - /system/contract ctfs - no -
objfs - /system/object objfs - no -
swap - /tmp tmpfs - yes -
# File modification by Jumpstart installation
### csstdist-240:/usr/dist - /usr/dist nfs - yes -
# End file modificaiton
At first glance, this appears to be an unexpected artefact of the WWID support which was
integrated recently.
Please attach a copy of all the files in /etc/mpxio to this CR.
It would also be very handy to get a copy of the output of the following:
prtconf -v
prtpicl -v
/lib/mpxio/stmsboot_util -d -L
run before _and after_ "stmsboot -d" is run.
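One possible way to capture the requested data consistently is sketched below. It assumes a POSIX-style shell such as ksh93 or bash (not the legacy Bourne /bin/sh). The function name collect_diag, the DIAG_DIR variable and the default directory /var/tmp/stmsboot-diag are hypothetical choices for illustration; only the three commands and /etc/mpxio come from the request above.

```shell
# Hypothetical helper: save the requested diagnostics under a labelled
# directory so the "before" and "after" sets can be attached separately.
collect_diag() {
    label=$1
    outdir=${DIAG_DIR:-/var/tmp/stmsboot-diag}/$label
    mkdir -p "$outdir" || return 1

    # The three commands requested in this CR.
    for cmd in "prtconf -v" "prtpicl -v" "/lib/mpxio/stmsboot_util -d -L"; do
        prog=${cmd%% *}              # program name without arguments
        name=$(basename "$prog")     # e.g. prtconf, stmsboot_util
        if command -v "$prog" >/dev/null 2>&1; then
            # $cmd is intentionally unquoted so the arguments are split.
            $cmd > "$outdir/$name.out" 2>&1
        else
            echo "command not found: $cmd" > "$outdir/$name.out"
        fi
    done

    # Also keep a copy of the files in /etc/mpxio, if that directory exists.
    if [ -d /etc/mpxio ]; then
        cp -r /etc/mpxio "$outdir/etc_mpxio"
    fi

    echo "$outdir"
}
```

Running `collect_diag before` ahead of "stmsboot -d" and `collect_diag after` once the system is back up (or from maintenance mode, if a writable directory is available) would leave two directories that can be compared and attached.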
Also, please confirm that your Jumpstart profile and server configuration have been written so that
MPxIO is enabled at install time.
Discussion about this issue with colleagues revealed that the device providing rootfs for
this host is not SAS-attached but FC-attached:
/devices/pci@0/pci@0/pci@9/SUNW,emlxs@0/fp@0,0/ssd@w50060e8004274074,0:a
Therefore the earlier comments about WWID changes are irrelevant.
At this point, the most likely cause for the issue logged here is that the device is attached
via the non-active path from the array (it's an HDS array). If that is indeed the cause, then
I do not believe that this is an error with stmsboot.
Submitter - please confirm the above hypothesis re active/passive paths.
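As a small aid for checking this, the sketch below compares the target port WWN in the OBP boot device path with the one in the path that failed to mount, both copied from the console log in NOTE 2. The helper name port_wwn is hypothetical. A difference between the two WWNs only shows that the root mount was attempted through a different array port than the one used to boot; it does not by itself establish which path is active or passive.

```shell
# Hypothetical helper: print the target port WWN, i.e. the hex string
# that follows "@w" in a device path.
port_wwn() {
    printf '%s\n' "$1" | sed -n 's/.*@w\([0-9a-fA-F]*\),.*/\1/p'
}

# Paths copied from the console log in NOTE 2.
boot_dev='/pci@0/pci@0/pci@8/SUNW,qlc@0/fp@0,0/disk@w50060e8004274064,0:a'
root_dev='/devices/pci@0/pci@0/pci@9/SUNW,emlxs@0/fp@0,0/ssd@w50060e8004274074,0:a'

boot_wwn=$(port_wwn "$boot_dev")
root_wwn=$(port_wwn "$root_dev")

if [ "$boot_wwn" = "$root_wwn" ]; then
    echo "same target port: $boot_wwn"
else
    echo "different target ports: boot=$boot_wwn root=$root_wwn"
fi
```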
To James MacPherson: please see the Comments section for my response to your question.