Description
This is not directly related to log devices, although they are the most
likely cause. It can also happen with l2cache devices, and can be
triggered by a variety of I/O errors during load.
In spa_load(), a series of failures can happen after
spa_load_spares():
        /*
         * Load any hot spares for this pool.
         */
        error = zap_lookup(spa->spa_meta_objset, DMU_POOL_DIRECTORY_OBJECT,
            DMU_POOL_SPARES, sizeof (uint64_t), 1, &spa->spa_spares.sav_object);
        if (error != 0 && error != ENOENT) {
                vdev_set_state(rvd, B_TRUE, VDEV_STATE_CANT_OPEN,
                    VDEV_AUX_CORRUPT_DATA);
                error = EIO;
                goto out;
        }
        ...
        /*
         * Load any level 2 ARC devices for this pool.
         */
        error = zap_lookup(spa->spa_meta_objset, DMU_POOL_DIRECTORY_OBJECT,
            DMU_POOL_L2CACHE, sizeof (uint64_t), 1,
            &spa->spa_l2cache.sav_object);
        if (error != 0 && error != ENOENT) {
                vdev_set_state(rvd, B_TRUE, VDEV_STATE_CANT_OPEN,
                    VDEV_AUX_CORRUPT_DATA);
                error = EIO;
--------------> goto out;
        }
        ...
        if (spa_check_logs(spa)) {
                vdev_set_state(rvd, B_TRUE, VDEV_STATE_CANT_OPEN,
                    VDEV_AUX_BAD_LOG);
                error = ENXIO;
                ereport = FM_EREPORT_ZFS_LOG_REPLAY;
--------------> goto out;
        }
        ...
When this function fails, we'll unload and deactivate the spa_t:
        if (error) {
                /*
                 * We can't open the pool, but we still have useful
                 * information: the state of each vdev after the
                 * attempted vdev_open(). Return this to the user.
                 */
                if (config != NULL && spa->spa_root_vdev != NULL) {
                        spa_config_enter(spa, RW_READER, FTAG);
                        *config = spa_config_generate(spa, NULL, -1ULL,
                            B_TRUE);
                        spa_config_exit(spa, FTAG);
                }
                spa_unload(spa);
                spa_deactivate(spa);
                spa->spa_last_open_failed = B_TRUE;
                ...
The problem is that spa_unload() frees the spares but doesn't reset
the number of spares to zero:
        for (i = 0; i < spa->spa_spares.sav_count; i++)
                vdev_free(spa->spa_spares.sav_vdevs[i]);
        if (spa->spa_spares.sav_vdevs) {
                kmem_free(spa->spa_spares.sav_vdevs,
                    spa->spa_spares.sav_count * sizeof (void *));
                spa->spa_spares.sav_vdevs = NULL;
        }
In this case, 'sav_count' will still be set to the number of loaded
spares, but 'sav_vdevs' will be NULL. Next time we come through
spa_load(), we'll go through the mosconfig path, before loading any
spares:
        if (!mosconfig) {
                ...
                spa_config_set(spa, newconfig);
--------------> spa_unload(spa);
                spa_deactivate(spa);
                spa_activate(spa);
                return (spa_load(spa, newconfig, state, B_TRUE));
        }
In our second trip through spa_unload(), we'll notice the non-zero
spare count and attempt to execute the same bit of code:
        for (i = 0; i < spa->spa_spares.sav_count; i++)
                vdev_free(spa->spa_spares.sav_vdevs[i]);
But at this point, 'sav_vdevs' is still NULL, and we'll panic
dereferencing a NULL pointer. The solution is to zero out
'sav_count' as part of spa_unload() to bring it back to a
pristine state.
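
As a rough illustration of the proposed fix, here is a minimal userland
sketch, not the actual ZFS code. mock_aux_t, mock_aux_load() and
mock_aux_unload() are hypothetical stand-ins for the
'sav_vdevs'/'sav_count' pair and the relevant part of spa_unload(), and
malloc()/free() take the place of the kernel allocators. The point is
only that resetting the count along with the pointer makes a second
unload a no-op:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the spares list (not the real ZFS type). */
typedef struct mock_aux {
        void    **sav_vdevs;    /* array of device handles, or NULL */
        int     sav_count;      /* number of entries in sav_vdevs */
} mock_aux_t;

/*
 * Populate the list with n dummy devices (n must be > 0).
 * Returns 0 on success, -1 if the array cannot be allocated.
 */
static int
mock_aux_load(mock_aux_t *sav, int n)
{
        int i;

        sav->sav_vdevs = calloc((size_t)n, sizeof (void *));
        if (sav->sav_vdevs == NULL)
                return (-1);
        for (i = 0; i < n; i++)
                sav->sav_vdevs[i] = malloc(1);
        sav->sav_count = n;
        return (0);
}

/*
 * Mirrors the loop quoted from spa_unload(), plus the proposed reset
 * of sav_count. Without the final assignment, a second call would
 * index through a NULL sav_vdevs.
 */
static void
mock_aux_unload(mock_aux_t *sav)
{
        int i;

        for (i = 0; i < sav->sav_count; i++)
                free(sav->sav_vdevs[i]);
        if (sav->sav_vdevs != NULL) {
                free(sav->sav_vdevs);
                sav->sav_vdevs = NULL;
        }
        sav->sav_count = 0;     /* the reset this report proposes */
}
```

With the reset in place, calling mock_aux_unload() twice in a row on the
same structure is safe, which corresponds to the
load-fail/unload/reload/unload sequence described above. The same
pattern would presumably apply to the l2cache list, though that is an
assumption; this report only spells out the spares case.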