|
Description
|
While debugging an unrelated problem, I noticed that we were seeing
checksum errors, but 'fmdump -e' wasn't showing any related
ereports. After some dtracing, I found that zfs_ereport_post() is
correctly being called, but that we're erroneously ignoring the
errors. In particular, zfs_ereport_post() has the following logic:
    /*
     * Ignore any errors from I/Os that we are going to retry anyway - we
     * only generate errors from the final failure.
     */
    if (zio && zio_should_retry(zio))
            return;
For checksum errors, we generate the ereport in zio_checksum_verify(),
which occurs _after_ the zio_io_assess() stage that normally issues the
retry. Assuming that this is the intended behavior (to not retry
checksum errors), then zfs_ereport_post() is making an invalid assumption
that the given I/O will be retried later.
Note that this only affects unreplicated pools. In replicated pools, the
checksum errors appear at the leaf vdev, so the 'vd != vdev_top' check in
zio_should_retry() fails and zfs_ereport_post() lets the ereport through.
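
To make the interaction concrete, here is a minimal, self-contained C model of the two checks described above. It is only a sketch: the model_* types and functions are hypothetical stand-ins, not the real ZFS structures, and model_should_retry() assumes that the retry test reduces to "the I/O failed and targets a top-level vdev", which is all this report relies on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for vdev_t and zio_t. */
typedef struct model_vdev {
	struct model_vdev *vdev_top;	/* points to itself for a top-level vdev */
} model_vdev_t;

typedef struct model_zio {
	model_vdev_t *io_vd;		/* vdev the I/O was issued to */
	int io_error;			/* nonzero if the I/O failed */
} model_zio_t;

/*
 * Assumed shape of the retry test: only failed I/Os issued to a
 * top-level vdev are candidates for a retry.
 */
static bool
model_should_retry(const model_zio_t *zio)
{
	const model_vdev_t *vd = zio->io_vd;

	if (zio->io_error == 0)
		return (false);
	if (vd != NULL && vd != vd->vdev_top)
		return (false);
	return (true);
}

/*
 * Mirrors the check quoted from zfs_ereport_post(). Returns true if an
 * ereport would be posted, false if it would be silently dropped.
 */
static bool
model_ereport_post(const model_zio_t *zio)
{
	if (zio != NULL && model_should_retry(zio))
		return (false);
	return (true);
}

/*
 * Models a checksum failure detected after the retry stage has already
 * run: no retry will follow, yet the posting check above still asks
 * whether the I/O "should" be retried.
 */
static bool
model_checksum_ereport_posted(model_vdev_t *vd)
{
	assert(vd != NULL);

	model_zio_t zio = { .io_vd = vd, .io_error = 1 };

	return (model_ereport_post(&zio));
}
```

In this model, calling model_checksum_ereport_posted() on a vdev that is its own vdev_top (the unreplicated case) reports that the ereport is dropped, while calling it on a leaf whose vdev_top is a different vdev (the replicated case) reports that the ereport is posted.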
|