|
Description
|
While running forwarding tests for Clearview IPMP, I seem to have stumbled on
another Surya/IPMP bug that also exists in Nevada. This is a bit of a
corner case, but I suspect a customer will hit it. Here's the scenario:
1. Suppose we're configured to forward out an IPMP group with two IP
interfaces, and suppose both IP interfaces have failed.
2. Suppose we're now asked to forward a packet that would go out
through those interfaces. As part of building the ARP_REQ_MBLK that
asks ARP to resolve the IP address of the next hop,
ire_arpresolve() records the ire_stq_ifindex:
        *ire = ire_null;
        ire->ire_u = in_ire->ire_u;
        ire->ire_ipif_seqid = in_ire->ire_ipif_seqid;
        ire->ire_ipif_ifindex = in_ire->ire_ipif_ifindex;
        ire->ire_ipif = in_ire->ire_ipif;
        ire->ire_stq = in_ire->ire_stq;
    --> ill = ire_to_ill(ire);
    --> ire->ire_stq_ifindex = ill->ill_phyint->phyint_ifindex;
        ire->ire_zoneid = in_ire->ire_zoneid;
        ire->ire_stackid = ipst->ips_netstack->netstack_stackid;
        ire->ire_ipst = ipst;
However, the code above has a subtle bug: ire_type is still 0 when we
call ire_to_ill(), because ire was just reset to ire_null and ire_type
has not been set at this point. That means ire_to_ill() will return
ire_ipif->ipif_ill, which is not necessarily the destination ill (the
one that owns ire_stq). We then record this incorrect index in
ire_stq_ifindex.
3. The ARP resolution fails (because both IP interfaces are down).
So we end up in ire_freemblk() which does:
        ill = ill_lookup_on_ifindex(ire_mp->ire_stq_ifindex,
            B_FALSE, NULL, NULL, NULL, NULL, ipst);
        if (ill == NULL || (ire_mp->ire_stq != ill->ill_wq) ||
            (ill->ill_state_flags & ILL_CONDEMNED)) {
                /*
                 * ill went away. no nce to clean up.
                 * Note that the ill_state_flags could be set to
                 * ILL_CONDEMNED after this point, but if we know
                 * that it is CONDEMNED now, we just bail out quickly.
                 */
                if (ill != NULL)
                        ill_refrele(ill);
                goto cleanup;
        }
Since ire_stq_ifindex may name the other ill in the group,
ill_lookup_on_ifindex() may return that ill, and the
ire_mp->ire_stq != ill->ill_wq comparison is then true. Thus we wrongly
assume the ill has gone away, jump to cleanup without touching the nce,
and leave the incomplete IRE rotting away.
4. One of the IP interfaces comes back up. However, we're still unable
to forward packets to the destination(s) associated with the rotting
incomplete IREs because IP thinks ARP is working on resolving them.
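
To make the mismatch in steps 2 and 3 concrete, here is a small user-space
sketch. It is not the kernel code: every toy_* name is hypothetical, queues
are modeled as plain integer ids, and the IRE_CACHE branch of
toy_ire_to_ill() (returning the ill that owns ire_stq) is an assumption about
how ire_to_ill() behaves once ire_type is set. It only illustrates why an
index taken from ire_ipif->ipif_ill can fail the later ire_stq check.

```c
#include <stddef.h>

/* Toy stand-ins for the kernel structures; only the fields used here. */
typedef struct toy_ill {
    int ifindex;    /* plays the role of ill_phyint->phyint_ifindex */
    int wq;         /* plays the role of ill_wq (an id, not a queue_t *) */
} toy_ill_t;

typedef struct toy_ire {
    int type;                   /* 0 models the not-yet-set ire_type */
    const toy_ill_t *ipif_ill;  /* plays the role of ire_ipif->ipif_ill */
    int stq;                    /* plays the role of ire_stq */
    int stq_ifindex;            /* plays the role of ire_stq_ifindex */
} toy_ire_t;

#define TOY_IRE_CACHE 1         /* hypothetical stand-in for IRE_CACHE */

/* Two ills in the same (imaginary) IPMP group. */
static const toy_ill_t toy_ills[] = { { 1, 100 }, { 2, 200 } };
#define TOY_NILLS (sizeof (toy_ills) / sizeof (toy_ills[0]))

static const toy_ill_t *
toy_lookup_by_wq(int wq)
{
    for (size_t i = 0; i < TOY_NILLS; i++)
        if (toy_ills[i].wq == wq)
            return (&toy_ills[i]);
    return (NULL);
}

static const toy_ill_t *
toy_lookup_on_ifindex(int ifindex)
{
    for (size_t i = 0; i < TOY_NILLS; i++)
        if (toy_ills[i].ifindex == ifindex)
            return (&toy_ills[i]);
    return (NULL);
}

/*
 * Assumed behavior: with the type set, return the ill that owns stq;
 * with type 0 (as in the report), fall back to ipif_ill.
 */
static const toy_ill_t *
toy_ire_to_ill(const toy_ire_t *ire)
{
    if (ire->type == TOY_IRE_CACHE)
        return (toy_lookup_by_wq(ire->stq));
    return (ire->ipif_ill);
}

/* Models the ire_freemblk() test: 1 if the ill is found, 0 if we bail. */
static int
toy_freemblk_finds_ill(const toy_ire_t *ire)
{
    const toy_ill_t *ill = toy_lookup_on_ifindex(ire->stq_ifindex);

    return (ill != NULL && ire->stq == ill->wq);
}

/* Build an ire whose ipif is on ill 1 but whose stq belongs to ill 2. */
int
toy_record_and_check(int type)
{
    toy_ire_t ire = { 0 };

    ire.ipif_ill = &toy_ills[0];
    ire.stq = toy_ills[1].wq;
    ire.type = type;
    ire.stq_ifindex = toy_ire_to_ill(&ire)->ifindex;
    return (toy_freemblk_finds_ill(&ire));
}
```

In this model, toy_record_and_check(0) returns 0 (the cleanup path bails out
as if the ill had gone away), while toy_record_and_check(TOY_IRE_CACHE)
returns 1 because the recorded index then matches the ill that owns the stq.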
|