When deleting a very large file, we won't be able to
keep all the level-1 indirect blocks cached from the time
we prefetch them in dmu_tx_count_free() until we process
the free in syncing context. Since they will have to be
read again, free_children() should kick off multiple I/Os
in parallel (say, all of the L1s under each L2).
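
A rough sketch of the idea above, as a toy model rather than actual ZFS code: before freeing the children of an L2 block, start a read for every L1 child, so that no read is left to be issued serially inside the free loop. All names here (l2_sketch_t, sketch_issue_l1_reads(), sketch_free_children(), SKETCH_MAX_CHILDREN) are hypothetical, and the "read" is only a state change, not real I/O.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cap on the number of L1 children modeled per L2 block. */
#define SKETCH_MAX_CHILDREN 128

/* Hypothetical stand-in for the state of one L1 indirect block. */
typedef enum { L1_NOT_CACHED, L1_READ_ISSUED, L1_FREED } l1_state_t;

typedef struct {
	l1_state_t child[SKETCH_MAX_CHILDREN];
	size_t nchildren;
	size_t reads_issued;	/* reads started up front, in one batch */
	size_t serial_reads;	/* reads that had to start inside the free loop */
} l2_sketch_t;

/* Start a (simulated) read for every L1 child that is not cached. */
static void
sketch_issue_l1_reads(l2_sketch_t *l2)
{
	for (size_t i = 0; i < l2->nchildren; i++) {
		if (l2->child[i] == L1_NOT_CACHED) {
			l2->child[i] = L1_READ_ISSUED;
			l2->reads_issued++;
		}
	}
}

/* Free all L1 children under one L2; returns how many were freed. */
static size_t
sketch_free_children(l2_sketch_t *l2)
{
	size_t freed = 0;

	assert(l2->nchildren <= SKETCH_MAX_CHILDREN);

	/* Issue all the reads first so they can proceed in parallel. */
	sketch_issue_l1_reads(l2);

	for (size_t i = 0; i < l2->nchildren; i++) {
		/* Anything still uncached here would need a one-at-a-time read. */
		if (l2->child[i] == L1_NOT_CACHED)
			l2->serial_reads++;
		l2->child[i] = L1_FREED;
		freed++;
	}
	return (freed);
}
```

With a zero-initialized l2_sketch_t whose nchildren is 8, sketch_free_children() issues all 8 reads in the first pass and leaves serial_reads at 0. How the real code would start and wait for those reads is not covered by this model.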