If you take a snapshot of a file system, then back it up using tar, cpio, pax or any other archiver so that all the files have been read, and then destroy that snapshot, the destroy operation takes an unreasonably long time to complete. During that time one CPU is pegged.
Here tank/test contains a root file system:
v4u-880k-gmp03 516 # zfs snapshot tank/test@6
v4u-880k-gmp03 517 # time tar cf /dev/null /tank/test/.zfs/snapshot/6
tar: /tank/test/.zfs/snapshot/6/usr/jdk/instances/jdk1.5.0/jre/lib/sparc/cpu/sparcv9+vis/sparcv9/libclib_jiio.so: symbolic link too long
tar: /tank/test/.zfs/snapshot/6/usr/jdk/instances/jdk1.5.0/jre/lib/sparc/cpu/sparcv9+vis2/sparcv9/libclib_jiio.so: symbolic link too long
real 3m43.81s
user 0m12.74s
sys 1m24.24s
v4u-880k-gmp03 518 # time zfs destroy tank/test@6
real 1h6m44.70s
user 0m0.02s
sys 1h6m44.10s
v4u-880k-gmp03 519 #
Tracing this with dtrace shows all the time being spent in this loop:
	for (zp = list_head(&zfsvfs->z_all_znodes); zp; zp = nextzp) {
		nextzp = list_next(&zfsvfs->z_all_znodes, zp);
		if (zp->z_dbuf_held) {
			/* dbufs should only be held when force unmounting */
			zp->z_dbuf_held = 0;
			mutex_exit(&zfsvfs->z_znodes_lock);
			dmu_buf_rele(zp->z_dbuf, NULL);
			/* Start again */
			mutex_enter(&zfsvfs->z_znodes_lock);
			nextzp = list_head(&zfsvfs->z_all_znodes);
		}
	}
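The list contains about 300,000 entries and each one has z_dbuf_held set. Every release restarts the scan from the head of the list, so the k-th held entry is only reached after walking past the k-1 entries already cleared. Hence this loop is iterated about 300,000*(300,000/2) times, roughly 4.5 * 10^10 iterations.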
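The iteration count can be checked with a small stand-alone model of the loop. This is only a sketch, not ZFS code: the array held[] stands in for the z_dbuf_held flags, the array index stands in for the list position, and the function names count_restart_iterations and closed_form_iterations are made up for illustration. It assumes, as the loop above suggests, that releasing a dbuf does not remove the znode from the list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Model of the restart-from-head loop. All n entries start out "held".
 * Returns the number of times the loop body runs. Hypothetical helper.
 */
uint64_t
count_restart_iterations(size_t n)
{
	uint64_t iterations = 0;
	bool *held;
	size_t i;

	if (n == 0)
		return (0);
	held = malloc(n * sizeof (bool));
	if (held == NULL)
		return (0);
	for (i = 0; i < n; i++)
		held[i] = true;

	i = 0;
	while (i < n) {
		size_t next = i + 1;	/* models list_next() */

		iterations++;
		if (held[i]) {
			held[i] = false;
			next = 0;	/* "Start again": models list_head() */
		}
		i = next;
	}

	/* Every flag must be clear once the loop has run to completion. */
	for (i = 0; i < n; i++)
		assert(!held[i]);
	free(held);
	return (iterations);
}

/*
 * Closed form for the same count: passes of 1, 2, ..., n visits that each
 * end by clearing one flag, plus one final clean pass over all n entries.
 */
uint64_t
closed_form_iterations(uint64_t n)
{
	return (n * (n + 1) / 2 + n);
}
```

For example, count_restart_iterations(4) returns 14 (passes of 1, 2, 3 and 4 visits, then a final pass of 4), and closed_form_iterations(300000) gives 45,000,450,000, in line with the 300,000*(300,000/2) estimate above.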
You don't actually have to destroy the snapshot to reproduce this. Doing
umount -f /tank/test/.zfs/snapshot/6
has the same issue.
You don't see the problem on snapshots that have not been accessed, or on file systems. Indeed, even if a file system has a mounted snapshot that has been accessed, and which would therefore be slow to unmount on its own, unmounting the file system (tank/test in this case), which implies an unmount of /tank/test/.zfs/snapshot/6, is fast:
v4u-880k-gmp03 524 # zfs snapshot tank/test@6
v4u-880k-gmp03 525 # time tar cf /dev/null /tank/test/.zfs/snapshot/6
tar: /tank/test/.zfs/snapshot/6/usr/jdk/instances/jdk1.5.0/jre/lib/sparc/cpu/sparcv9+vis/sparcv9/libclib_jiio.so: symbolic link too long
tar: /tank/test/.zfs/snapshot/6/usr/jdk/instances/jdk1.5.0/jre/lib/sparc/cpu/sparcv9+vis2/sparcv9/libclib_jiio.so: symbolic link too long
real 3m19.31s
user 0m12.59s
sys 0m56.69s
v4u-880k-gmp03 526 # pwd
/
v4u-880k-gmp03 527 # time umount /tank/test
real 0m2.90s
user 0m0.01s
sys 0m2.88s
v4u-880k-gmp03 528 #
Workaround
cd into the mountpoint of the file system whose snapshot is being deleted, then attempt to unmount the file system. The unmount will fail because the file system is busy, but the subsequent unmount or destroy of the snapshot is fast:
v4u-880k-gmp03 519 # zfs snapshot tank/test@6
v4u-880k-gmp03 520 # time tar cf /dev/null /tank/test/.zfs/snapshot/6
tar: /tank/test/.zfs/snapshot/6/usr/jdk/instances/jdk1.5.0/jre/lib/sparc/cpu/sparcv9+vis/sparcv9/libclib_jiio.so: symbolic link too long
tar: /tank/test/.zfs/snapshot/6/usr/jdk/instances/jdk1.5.0/jre/lib/sparc/cpu/sparcv9+vis2/sparcv9/libclib_jiio.so: symbolic link too long
real 3m18.98s
user 0m12.62s
sys 0m56.57s
v4u-880k-gmp03 521 # (cd $(zfs list -H -o mountpoint tank/test) && umount $(/bin/pwd) )
cannot unmount '/tank/test': Device busy
v4u-880k-gmp03 522 # time zfs destroy tank/test@6
real 0m0.18s
user 0m0.01s
sys 0m0.02s
v4u-880k-gmp03 523 #
From the customer using the work-around:
"Yes this works, but you might want to document the work-around to
indicate that if the filesystem is shared, the umount seems to
make it unshared and that a "zfs share filesystem" command need to
be executed. I found out the hard way."
From CR 6537472
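Use this script to unmount:
#!/bin/ksh -p
# If the unmount fails (e.g. "Device busy") and sharenfs is not off, re-share the file system.
zfs unmount "$1" || [[ $(zfs get -Ho value sharenfs "$1") == "off" ]] || zfs share "$1"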
Workaround (same as above, but slightly refined)
----------
Before you do the 'zfs destroy <snapshot>' operation, perform the
following steps:
1) 'cd' to the mountpoint of the filesystem.
2) 'umount' the filesystem. This will fail with "Device busy". Ignore
the error message.
For example, assume you have a zfs file system 'foo' in zpool 'tank'
and a snapshot 'weekly'.
# zfs list -t snapshot
NAME USED AVAIL REFER MOUNTPOINT
tank/foo@weekly 22.0M - 79.0M -
#
To destroy the above snapshot 'tank/foo@weekly', proceed as follows:
# pwd
/
# cd /tank/foo
# umount /tank/foo
cannot unmount '/tank/foo': Device busy
# cd -
# pwd
/
# zfs destroy tank/foo@weekly
#
Measure the time taken for 'zfs destroy' and compare it with the results
without this workaround. Please also let me know whether there is any
improvement.