OpenSolaris

Bug ID 6584697
Synopsis Can't boot Xen / Solaris dom0 if root is using ZFS
State 10-Fix Delivered (Fix available in build)
Category:Subcategory kernel:boot-x86
Keywords zfs-boot
Responsible Engineer Joseph Bonasera
Reported Against f_001
Duplicate Of
Introduced In solaris_nevada
Commit to Fix snv_75
Fixed In snv_75
Release Fixed solaris_nevada(snv_75) , solaris_10u6(s10u6_03) (Bug ID:2161981)
Related Bugs 6541114 , 6584769
Submit Date 24-July-2007
Last Update Date 10-October-2007
Description
From Jurgen Keil....

... I found yet another way to get the "Error 16: Inconsistent filesystem
structure" from GRUB.  This time it happens when trying to boot a Xen dom0
from a zfs bootfs.


Synopsis: grub/zfs-root: cannot boot xen from a zfs root
========================================================================

I've tried to install snv_66 + xen into an lzjb compressed zfs
root filesystem.

menu.lst entry for xen is:

# ------------------------------------------------------------
title Solaris Nevada snv_66 X86 (xen dom0)
root (,0,g)
bootfs files/s11-root-xen
kernel$ /boot/$ISADIR/xen.gz
module$ /platform/i86xpv/kernel/$ISADIR/unix /platform/i86xpv/kernel/$ISADIR/unix -B $ZFS-BOOTFS -vk
module$ /platform/i86pc/$ISADIR/boot_archive
# ------------------------------------------------------------

grub boot for xen crashes with the error message:

    Error 16: Inconsistent filesystem structure



GRUB uses fixed memory locations for MOS, DNODE, ZFS_SCRATCH...

MOS is at memory location 0x100000.
DNODE is at memory location 0x140000.
ZFS_SCRATCH is at memory location 0x180000.

Standard Solaris kernel /platform/i86pc/kernel/amd64/unix loads at
0x400000, 0x800000 and 0xC00000, and /platform/i86pc/amd64/boot_archive
is loaded at 0xd5d000 - all after grub's MOS / DNODE / ZFS_SCRATCH location.



Xen hypervisor /boot/amd64/xen.gz is loaded at
<0x100000:0x9c878:0x58788>.

GRUB is able to read the first 128k of compressed data from the zfs
root and decompresses the data to address 0x100000.  The attempt to
read the next 128k block from xen.gz then fails because the DNODE data
has been overwritten.  Things start to fail when we find
"DNODE->dn_datablkszsec == 35656" (should be 256) in zfs_read(), that is,
a datablk size of ~18 Mbytes instead of the expected 128 kbytes.
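
The overlap and the bogus block size can be checked with a little
arithmetic.  The sketch below is only an illustration: it assumes the
GRUB triple means <load address : file size : bss size> and that
dn_datablkszsec counts 512-byte sectors.  The names range_t,
range_from_triple, ranges_overlap and datablk_bytes are made up for this
example and are not part of GRUB.

```c
#include <assert.h>
#include <stdint.h>

/* Half-open address range [start, end). Hypothetical helper type. */
typedef struct {
    uint32_t start;
    uint32_t end;
} range_t;

/* Build a range from a GRUB-style <addr:filesz:bsssz> triple
 * (the meaning of the three fields is an assumption). */
static range_t range_from_triple(uint32_t addr, uint32_t filesz, uint32_t bsssz)
{
    range_t r;

    /* Guard against 32-bit wrap-around before adding the sizes. */
    assert(filesz <= UINT32_MAX - addr);
    assert(bsssz <= UINT32_MAX - addr - filesz);
    r.start = addr;
    r.end = addr + filesz + bsssz;
    return r;
}

/* Two half-open ranges overlap when each starts before the other ends. */
static int ranges_overlap(range_t a, range_t b)
{
    return a.start < b.end && b.start < a.end;
}

/* Convert a block size in 512-byte sectors to bytes. */
static uint32_t datablk_bytes(uint16_t dn_datablkszsec)
{
    return (uint32_t)dn_datablkszsec << 9;
}
```

With the values from this report, the xen.gz image ends at 0x1f5000, so
it covers MOS (0x100000), DNODE (0x140000) and the start of ZFS_SCRATCH
(0x180000).  A dn_datablkszsec of 256 is 131072 bytes (128k), while 35656
is 18255872 bytes.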


Problem #1:
===========

fsys_zfs.c is supposed to use the following memory map:

 * (memory addr)   MOS      DNODE       ZFS_SCRATCH
 *                  |         |          |
 *          +-------V---------V----------V---------------+
 *   memory |       | dnode   | dnode    |  scratch      |
 *          |       | 512B    | 512B     |  area         |
 *          +--------------------------------------------+

Using these defines...

#define MOS                     ((dnode_phys_t *)(RAW_ADDR(0x100000)))
#define DNODE                   ((dnode_phys_t *)(MOS + DNODE_SIZE))
#define ZFS_SCRATCH             ((char *)(DNODE + DNODE_SIZE))

... the DNODE area is located ``512*sizeof(dnode_phys_t)'' bytes after
MOS, not 512 bytes, because arithmetic on a dnode_phys_t pointer is
scaled by sizeof(dnode_phys_t)!  Instead of 512 bytes for MOS, fsys_zfs
is using 256 kbytes.  Same problem with the size for the DNODE area.

Apparently we want:

#define MOS                     ((dnode_phys_t *)(RAW_ADDR(0x100000)))
#define DNODE                   ((dnode_phys_t *)((char*)MOS + DNODE_SIZE))
#define ZFS_SCRATCH             ((char *)DNODE + DNODE_SIZE)
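
The difference between the two sets of defines can be reproduced outside
of GRUB.  This is a minimal sketch, assuming sizeof(dnode_phys_t) is 512
bytes; dnode_sketch_t is a stand-in type and offset_scaled /
offset_bytes are hypothetical helpers, not code from fsys_zfs.c.

```c
#include <assert.h>
#include <stddef.h>

#define DNODE_SIZE 512

/* Stand-in for dnode_phys_t: only its 512-byte size matters here. */
typedef struct {
    unsigned char pad[DNODE_SIZE];
} dnode_sketch_t;

/* Backing storage large enough that base + DNODE_SIZE stays in bounds. */
static dnode_sketch_t arena[DNODE_SIZE + 1];

/* Original form: (MOS + DNODE_SIZE).  The addition is done on a
 * dnode pointer, so it advances DNODE_SIZE * sizeof(dnode) bytes. */
static size_t offset_scaled(void)
{
    dnode_sketch_t *base = arena;

    assert(sizeof(dnode_sketch_t) == DNODE_SIZE);
    return (size_t)((char *)(base + DNODE_SIZE) - (char *)base);
}

/* Proposed form: ((char *)MOS + DNODE_SIZE).  The addition is done
 * on a char pointer, so it advances exactly DNODE_SIZE bytes. */
static size_t offset_bytes(void)
{
    dnode_sketch_t *base = arena;

    return (size_t)(((char *)base + DNODE_SIZE) - (char *)base);
}
```

offset_scaled() comes out as 0x40000 (256 kbytes), which matches DNODE
showing up at 0x140000, and offset_bytes() comes out as 0x200 (512 bytes).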


Problem #2:
===========

We should find a better base address for MOS/DNODE/ZFS_SCRATCH

This seems to be the memory in use by GRUB:

 0x007be BOOT_PART_TABLE
 0x01000-0x01fff STAGE1_STACK / real mode stage2 STACKOFF (< 0x2000)
 0x02000 MB_CMDLINE_BUF
 0x07C00 BOOTSEC_LOCATION / MBR
 0x08000 stage1 / PBR (start.S)
 0x08200 stage2 (asm.S)
 0x10000 LINUX_ZIMAGE_ADDR
 0x60000-0x67fff protected mode stack
 0x68000 FSYS_BUF (filesystem (not raw device) buffer / 32k)
 0x70000 BUFFERADDR (raw device buffer / 31.5K)
 0x77e00 SCRATCHADDR (512-byte scratch area)
 0x78000 PASSWORD_BUF ... MENU_BUF
 0x80000 free?
 0x90000 LINUX_OLD_REAL_MODE_ADDR
 0xA0000 Video memory?
 0xB0000 HERCULES_VIDEO_ADDR
0x100000 LINUX_BZIMAGE_ADDR / XEN

Maybe reusing 0x90000 could work (because we don't want to boot old
linux stuff)?

Or the FSYS_BUF at 0x68000?  Other fsys_xxx modules use the 32k at
0x68000 FSYS_BUF.


Well, I experimented with these addresses, but the problem seems to be
that ZFS_SCRATCH needs *lots* of free space. All the areas below 0x100000
appear to be too small for fsys_zfs.c.


I'm currently using 0x4000000 as MOS base address, as an ugly workaround,
to boot both standard Solaris kernels and the xen hypervisor:


#define MOS                     ((dnode_phys_t *)(RAW_ADDR(0x4000000)))
#define DNODE                   ((dnode_phys_t *)((char *)MOS + DNODE_SIZE))
#define ZFS_SCRATCH             ((char *)DNODE + DNODE_SIZE)

I'm going to investigate fixing this by locating the
GRUB ZFS memory areas dynamically down from the top of
physical memory or the 4 Gig addressability limit,
whichever is lower.
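
A rough sketch of that idea, for illustration only.  This is not the fix
that was delivered; zfs_area_base and its parameters are hypothetical,
and how GRUB would learn the top of physical memory is left out.

```c
#include <assert.h>
#include <stdint.h>

#define FOUR_GIG 0x100000000ULL

/*
 * Pick a base address for an area of `size` bytes that ends at or below
 * min(top_of_mem, 4 Gig), rounded down to `align` (a power of two).
 * Returns 0 if the area does not fit below the limit.
 */
static uint64_t zfs_area_base(uint64_t top_of_mem, uint64_t size, uint64_t align)
{
    uint64_t limit = top_of_mem < FOUR_GIG ? top_of_mem : FOUR_GIG;

    /* The mask below only works for power-of-two alignments. */
    assert(align != 0 && (align & (align - 1)) == 0);

    if (size >= limit)
        return 0;
    return (limit - size) & ~(align - 1);
}
```

For example, with 512 Mbytes of memory, a 1 Mbyte area aligned to 4k
would start at 0x1ff00000; with 8 Gbytes of memory the 4 Gig limit
applies and the same area would start at 0xfff00000.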

So I managed to get a lab system into this setup.
My grub fix worked well enough to get Xen to boot and
dom0 to make it through startup.

However...

startup.c:1970: Enabling interrupts
startup.c:1981: startup_end() done
startup.c:2072: Unmapping lower boot pages
startup.c:2089: Releasing boot pages
startup.c:2103: Boot pages released
WARNING: init(1M) exited on fatal signal 9: restarting automatically
WARNING: init(1M) exited on fatal signal 9: restarting automatically
WARNING: init(1M) exited on fatal signal 9: restarting automatically
WARNING: init(1M) exited on fatal signal 9: restarting automatically
WARNING: init(1M) exited on fatal signal 9: restarting automatically
and so on forever.

There are obviously more issues with ZFS root and Xen to explore.
Work Around
N/A
Comments
N/A