OpenSolaris

Bug ID 6749498
Synopsis snapshot -r can fail on systems that have been image-updated with unplayed zils
State 11-Closed:Duplicate (Closed)
Category:Subcategory kernel:zfs
Keywords
Reported Against
Duplicate Of 6462803
Introduced In
Commit to Fix
Fixed In
Release Fixed
Related Bugs
Submit Date 17-September-2008
Last Update Date 6-October-2008
Description
We were unable to take recursive snapshots of an OpenSolaris 2008.11
(nv_97 based) system. For example, the following failed:

root@haiiro[96] zfs snapshot -r rpool@snap
cannot create snapshot 'rpool/ROOT/opensolaris-2@snap': dataset is busy

This system had been updated with pkg image-update a few times, and beadm
showed the following:

timf@haiiro[442] beadm list
BE            Active Mountpoint Space  Policy Created          
--            ------ ---------- -----  ------ -------          
opensolaris-1 -      -          83.02M static 2008-08-21 17:45 
opensolaris-2 -      -          50.34M static 2008-09-09 10:18 
opensolaris-3 NR     /          10.01G static 2008-09-11 15:43 

The bootable ZFS datasets were:

timf@haiiro[444] zfs list -t filesystem -r rpool/ROOT
NAME                           USED  AVAIL  REFER  MOUNTPOINT
rpool/ROOT                    8.39G  26.1G    18K  legacy
rpool/ROOT/opensolaris-1      68.3M  26.1G  3.58G  legacy
rpool/ROOT/opensolaris-1/opt   132K  26.1G   448M  /opt
rpool/ROOT/opensolaris-2      49.8M  26.1G  3.93G  legacy
rpool/ROOT/opensolaris-2/opt   536K  26.1G   448M  /opt
rpool/ROOT/opensolaris-3      8.27G  26.1G  4.45G  legacy
rpool/ROOT/opensolaris-3/opt  1.01G  26.1G  1.01G  /opt

In this case, rpool/ROOT/opensolaris-2 was an older bootable dataset, and wasn't
mounted when we tried to take the snapshot.

Here's my theory, open to debate:

I dug about a bit looking for EBUSY in the zfs_ioc_snapshot codepath,
and found that dmu_objset_snapshot_one() was calling zil_suspend(),
which was returning EBUSY.

The filesystems that were failing all had zdb output similar to:

root@haiiro[93] zdb -ivv rpool/ROOT/opensolaris-2
Dataset rpool/ROOT/opensolaris-2 [ZPL], ID 113, cr_txg 494704, 3.93G,
176332 objects

    ZIL header: claim_txg 501893, seq 0

Filesystems where we could take snapshots didn't show this ZIL header.
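
To check many datasets for this condition, the zdb output can be scanned for
the ZIL header line. The following is only a sketch: the function names are
made up for illustration, the pattern is based solely on the single output
line quoted above, and treating a nonzero claim_txg as "unplayed log" is an
assumption drawn from the theory in this report, not confirmed behaviour.

```python
import re

# Pattern modelled on the one "ZIL header" line quoted in this report;
# other zdb versions may format the line differently (assumption).
ZIL_HEADER_RE = re.compile(r"ZIL header:\s*claim_txg\s+(\d+),\s*seq\s+(\d+)")


def parse_zil_header(zdb_output):
    """Return (claim_txg, seq) if a ZIL header line is present, else None."""
    match = ZIL_HEADER_RE.search(zdb_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def has_unplayed_zil(zdb_output):
    """Hypothetical check: a header with a nonzero claim_txg is assumed
    to mean the log has been claimed but not yet replayed."""
    header = parse_zil_header(zdb_output)
    return header is not None and header[0] != 0


# Output of `zdb -ivv rpool/ROOT/opensolaris-2` as quoted in this report.
sample = """Dataset rpool/ROOT/opensolaris-2 [ZPL], ID 113, cr_txg 494704, 3.93G,
176332 objects

    ZIL header: claim_txg 501893, seq 0
"""

print(has_unplayed_zil(sample))
```

In practice the text would come from running zdb -ivv against each unmounted
dataset under rpool/ROOT and passing its standard output to the function.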

Mounting and unmounting the dataset made this problem go away: we
were able to snapshot these filesystems after that, and running the
zdb -ivv command on the dataset then showed no ZIL header.
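
The mount/unmount step could be scripted along these lines. This is a sketch
under assumptions: the report does not say which commands were used, so the
mount -F zfs form (the usual way to mount a dataset with a legacy mountpoint
on Solaris), the /mnt scratch directory, and the helper names are all
illustrative. It defaults to a dry run that only prints the commands.

```python
import subprocess


def remount_commands(dataset, mountpoint="/mnt"):
    """Build the mount/unmount pair for one legacy-mounted dataset.

    Assumes `mountpoint` is an empty scratch directory (hypothetical).
    """
    return [
        ["mount", "-F", "zfs", dataset, mountpoint],
        ["umount", mountpoint],
    ]


def replay_zil(dataset, mountpoint="/mnt", dry_run=True):
    """Mount and unmount `dataset` so that any pending ZIL gets replayed.

    With dry_run=True the commands are printed instead of executed.
    """
    commands = remount_commands(dataset, mountpoint)
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            # check=True stops at the first failing command.
            subprocess.run(argv, check=True)
    return commands


replay_zil("rpool/ROOT/opensolaris-2")
```

Running with dry_run=False requires root and should only be done on datasets
that are known to be unmounted.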

These all had legacy mountpoints and were unmounted at the time. Since
they were also boot environments, they were probably shut down or
rebooted after doing an upgrade. But since they're not likely to get
booted again, the ZIL will remain on disk, perpetually unreplayed, so
we'd never get snapshots of that dataset until the filesystem is mounted
and unmounted.

Reproducing this could be tricky. It could be that the machine didn't
shut down cleanly after the image-update (I believe I used reboot(1M)
to reboot the machine, but that should issue a sync(1M), which I'd thought
should flush the ZIL as well?).
Work Around
N/A
Comments
N/A