OpenSolaris

Bug ID 6722540
Synopsis 50% slowdown on scrub/resilver with certain vdev configurations
State 10-Fix Delivered (Fix available in build)
Category:Subcategory kernel:zfs
Keywords zfs-perf
Responsible Engineer Jeff Bonwick
Reported Against
Duplicate Of
Introduced In solaris_nevada
Commit to Fix snv_105
Fixed In snv_105
Release Fixed solaris_nevada(snv_105), solaris_10u8(s10u8_01) (Bug ID:2176242)
Related Bugs 6759999
Submit Date 3-July-2008
Last Update Date 22-April-2009
Description
With the following vdev configuration:

mirror [DTL: 4-now]
  diskA  offline [DTL: offlinetime-now]
  replacing [DTL: 4-replacetime]
    diskB  old [DTL: none]
    diskC  new  [DTL: 4-replacetime]

it reads from B and writes to C, and everything seems OK.

But if you then online A, it tries to read from A, and write to B &
C.  Of course the data on diskA is not correct, so it gets checksum
errors and retries the reads from diskB.  The end result is that we read
everything from A and B, and write everything to A and C.  Since A is
being hit with 2x the I/O (everything is read and then written), this
causes a 50% slowdown.  But the behavior should still be correct.
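
The 50% figure can be checked with a toy I/O count.  This is only a
sketch, not ZFS code: it assumes one I/O per block per operation, that
all disks have equal throughput, and that the busiest disk limits the
resilver rate.  The function names are made up for illustration.

```python
from collections import Counter


def resilver_io(num_blocks, a_online):
    """Count per-disk I/Os for the configuration in this report (toy model)."""
    io = Counter()
    for _ in range(num_blocks):
        if a_online:
            io["A"] += 1  # read from A, which fails its checksum
        io["B"] += 1      # read of the good copy from B
        if a_online:
            io["A"] += 1  # write of the good data back to A
        io["C"] += 1      # resilver write to the new disk C
    return io


def relative_throughput(num_blocks):
    """Resilver rate with A online, relative to A offline (assumed model)."""
    baseline = max(resilver_io(num_blocks, a_online=False).values())
    degraded = max(resilver_io(num_blocks, a_online=True).values())
    return baseline / degraded
```

Under these assumptions A sees twice as many I/Os as any disk did
before it was onlined, so relative_throughput() comes out at 0.5.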

The root cause is likely that both mirror children (A and replacing)
have DTLs that cover a large range, so we consider them equal and just
"happen" to try A first.  However, we know that the replacing vdev has
the correct data (on B), so we should just read from there.
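
The suspected selection problem can be modelled roughly as follows.
This is a hypothetical sketch, not the actual mirror code or the
delivered fix: the Child class, the has_valid_copy flag, both pick_*
functions, and the txg numbers (offlinetime=50, replacetime=80,
now=100) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Child:
    name: str
    dtl: Optional[Tuple[int, int]]  # inclusive txg range that may be missing
    has_valid_copy: bool            # assumed flag: a leaf below holds good data


def dtl_covers(child: Child, txg: int) -> bool:
    return child.dtl is not None and child.dtl[0] <= txg <= child.dtl[1]


def pick_naive(children: List[Child], txg: int) -> Child:
    """Treat all children whose DTL covers txg as equal; take the first."""
    for c in children:
        if not dtl_covers(c, txg):
            return c
    return children[0]


def pick_preferring_valid(children: List[Child], txg: int) -> Child:
    """As above, but break ties in favour of a child known to hold good data."""
    for c in children:
        if not dtl_covers(c, txg):
            return c
    for c in children:
        if c.has_valid_copy:
            return c
    return children[0]


# Made-up txg values standing in for offlinetime, replacetime and now.
disk_a = Child("A", dtl=(50, 100), has_valid_copy=False)
replacing = Child("replacing", dtl=(4, 80), has_valid_copy=True)
```

For a block born in txg 60, both DTLs cover it, so pick_naive() returns
A while pick_preferring_valid() returns the replacing vdev.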
Work Around
N/A
Comments
N/A