|
Description
|
Currently, ZFS does not do any diagnosis of drives which are
pathologically broken (i.e. continually returning I/O or
checksum errors). This results in a particularly bad experience
when a device goes out to lunch, because ZFS will continue to
issue I/O to it even though that I/O never completes. This brings the
entire pool to a halt, and the user sometimes cannot even make
forward progress to determine what has gone wrong.
The ZFS diagnosis engine needs to listen to I/O and checksum
ereports and make an intelligent diagnosis. Note that this
will necessitate a new vdev state, VDEV_STATE_FAULTED, which
indicates an external request to fault the device. This state
will be persistent, and there will have to be some careful
negotiation between fmd's resource cache and ZFS over repair
actions.
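
One way such a diagnosis could work is sketched below. This is only an illustration, not the design this report commits to: it assumes a simple "N errors within T seconds" rule per error class, and every name in it (`diag_ereport`, `vdev_diag_t`, `WINDOW_N`, `WINDOW_T`, `DIAG_FAULTED`) is hypothetical rather than part of ZFS or fmd. The thresholds are arbitrary placeholders. Once the rule fires, the sketch keeps the device faulted, mirroring the persistent state described above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical thresholds: fault after WINDOW_N errors within WINDOW_T seconds. */
#define WINDOW_N 10
#define WINDOW_T 600

/* Hypothetical ereport classes and diagnosis result (not the real ZFS enums). */
typedef enum { EREPORT_IO = 0, EREPORT_CHECKSUM = 1, EREPORT_NCLASSES = 2 } ereport_class_t;
typedef enum { DIAG_HEALTHY = 0, DIAG_FAULTED = 1 } diag_state_t;

/* Ring buffer holding the timestamps of the most recent WINDOW_N errors. */
typedef struct {
    uint64_t stamps[WINDOW_N];
    size_t next;   /* slot that the next timestamp will overwrite */
    size_t count;  /* number of valid entries, capped at WINDOW_N */
} error_window_t;

/* Per-device diagnosis state: one window per ereport class. */
typedef struct {
    error_window_t windows[EREPORT_NCLASSES];
    diag_state_t state;
} vdev_diag_t;

/*
 * Record one error at time `now` (seconds, assumed non-decreasing).
 * Returns true when the last WINDOW_N errors all fall within WINDOW_T seconds.
 */
static bool
window_record(error_window_t *w, uint64_t now)
{
    w->stamps[w->next] = now;
    w->next = (w->next + 1) % WINDOW_N;
    if (w->count < WINDOW_N)
        w->count++;
    if (w->count < WINDOW_N)
        return false;
    /* With a full ring, `next` now indexes the oldest retained timestamp. */
    return (now - w->stamps[w->next]) <= WINDOW_T;
}

/*
 * Feed one ereport into the diagnosis for a device and return its state.
 * A faulted device stays faulted until some external repair action resets it.
 */
static diag_state_t
diag_ereport(vdev_diag_t *vd, ereport_class_t cls, uint64_t now)
{
    if (vd->state == DIAG_FAULTED)
        return DIAG_FAULTED;
    if (window_record(&vd->windows[cls], now))
        vd->state = DIAG_FAULTED;
    return vd->state;
}
```

With these placeholder values, ten I/O errors arriving a second apart would fault the device, while the same errors spread over hours would not. Keeping separate windows for I/O and checksum errors lets the two classes be tuned independently, which is a design choice of this sketch rather than a stated requirement.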
|