If the write is larger than zfs_immediate_write_sz then,
later, *if* we need to log the write, dmu_sync() is used
to write the block immediately and its block pointer is put
in the log record. Currently zfs_immediate_write_sz is 32KB.
See zfs_log_write/itx_wr_state/WR_INDIRECT.
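The size check described above can be sketched roughly as follows. This is a simplified illustration, not the code in zfs_log_write: the enum has only the two cases discussed here, and every name beginning with sketch_ or SKETCH_ is hypothetical.

```c
#include <stddef.h>

/* Threshold quoted in the text: 32KB. */
static const size_t zfs_immediate_write_sz = 32 * 1024;

/* Simplified stand-in for itx_wr_state (hypothetical, two cases only). */
typedef enum {
	SKETCH_WR_IN_RECORD,	/* user data is carried in the log record */
	SKETCH_WR_INDIRECT	/* block written via dmu_sync(); record holds its block pointer */
} sketch_wr_state_t;

/* Pick how a write of write_len bytes would be logged. */
sketch_wr_state_t
sketch_choose_wr_state(size_t write_len)
{
	if (write_len > zfs_immediate_write_sz)
		return (SKETCH_WR_INDIRECT);
	return (SKETCH_WR_IN_RECORD);
}
```

Under this sketch a 32768-byte write stays in the log record, while a 65536-byte write takes the dmu_sync() path.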
Now for slogs we do not necessarily want to do this performance
optimisation. If the log device is much faster (e.g. nvram) then we
do not want to wait for a slow disk to complete the transaction.
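One possible shape for the change, as a minimal sketch only: skip the dmu_sync() path whenever the pool has a separate log device. The pool_has_slog flag is an assumption, since the ZIL code currently has no way to ask this, and the function name is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Threshold quoted in the text: 32KB (hypothetical macro name). */
#define	SKETCH_IMMEDIATE_WRITE_SZ	((size_t)32 * 1024)

/*
 * Return true when the write should go through dmu_sync() and be logged
 * as a block pointer. With a separate log device the data always goes
 * into the log itself, so the fast device absorbs the synchronous write.
 */
bool
sketch_use_indirect_write(size_t write_len, bool pool_has_slog)
{
	if (pool_has_slog)
		return (false);
	return (write_len > SKETCH_IMMEDIATE_WRITE_SZ);
}
```

This leaves the existing behaviour unchanged for pools without a slog.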
For example, ds7 is a simple program that writes buffers of a specified
size using a specified number of threads to fill up a file of a specified
size.
Using nvram as the log device:
: trasimene ; time ~/junk/ds7.`uname -p` -v -n /whirl/a -f200000000 -b32768 -t1
randomly writing 6103 x 32768 bytes (= 200000000 bytes) to /whirl/a
using 1 threads and O_DSYNC
MB/s: 158.76
real 0m1.22s
user 0m0.00s
sys 0m0.59s
: trasimene ; time ~/junk/ds7.`uname -p` -v -n /whirl/a -f200000000 -b65536 -t1
randomly writing 3051 x 65536 bytes (= 200000000 bytes) to /whirl/a
using 1 threads and O_DSYNC
MB/s: 5.42
real 0m35.18s
user 0m0.00s
sys 0m0.41s
: trasimene ;
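So the 32KB writes, which are small enough to go into the log record on
the nvram slog, complete about 28x faster (1.22s vs 35.18s) than the 64KB
writes, which wait for dmu_sync() to write to the main pool.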
The original prototype had a different set of vectors for chained logs (clogs)
and separate logs (slogs) and did not use dmu_sync() for slogs.
However, it did need to split the write because the largest log block
(128KB) can't contain both a log record and 128KB of user data. Something
similar will be needed. There will also need to be a method to determine
whether a slog exists because at present the ZIL code doesn't need to know.
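To illustrate the splitting requirement, here is a rough sketch of how many log records a write would need if each record had to fit inside one 128KB log block. The record_overhead parameter is an assumption that lumps together all per-record and per-block header space; the real sizes are not given here, and the function name is hypothetical.

```c
#include <stddef.h>

/* Largest log block quoted in the text: 128KB (hypothetical macro name). */
#define	SKETCH_MAX_LOG_BLOCK	((size_t)128 * 1024)

/*
 * Number of log records needed to carry write_len bytes of user data when
 * each record spends record_overhead bytes on headers. Returns 0 if the
 * overhead leaves no room for data (or if there is nothing to write).
 */
size_t
sketch_log_records_needed(size_t write_len, size_t record_overhead)
{
	size_t max_data;

	if (record_overhead >= SKETCH_MAX_LOG_BLOCK)
		return (0);
	max_data = SKETCH_MAX_LOG_BLOCK - record_overhead;
	/* Round up so a partial final chunk still gets its own record. */
	return ((write_len + max_data - 1) / max_data);
}
```

With any non-zero overhead, a 128KB write needs two records, which is why the write has to be split.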