OpenSolaris

Bug ID 6775811
Synopsis NCEs can get stuck in ND_INCOMPLETE if ARP fails when IPMP is in-use
State 10-Fix Delivered (Fix available in build)
Category:Subcategory kernel:tcp-ip
Keywords
Responsible Engineer Peter Memishian
Reported Against
Duplicate Of
Introduced In solaris_nevada
Commit to Fix snv_107
Fixed In snv_107
Release Fixed solaris_nevada(snv_107)
Related Bugs 6385609, 6782947, 6783149
Submit Date 24-November-2008
Last Update Date 28-January-2009
Description
While doing forwarding tests for Clearview IPMP, it seems I've stumbled on
another Surya/IPMP bug that also exists in Nevada.  This is a bit of a
corner-case, but I suspect a customer will hit it.  Here's the scenario:

   1. Suppose we're configured to forward out an IPMP group with two IP
      interfaces, and suppose both IP interfaces have failed.

   2. Suppose we're now asked to forward a packet that would go out
      through those interfaces.  As part of building ARP_REQ_MBLK to
      ask ARP to resolve the IP address of the next-hop,
      ire_arpresolve() records the ire_stq_ifindex:

        *ire = ire_null;
        ire->ire_u = in_ire->ire_u;
        ire->ire_ipif_seqid = in_ire->ire_ipif_seqid;
        ire->ire_ipif_ifindex = in_ire->ire_ipif_ifindex;
        ire->ire_ipif = in_ire->ire_ipif;
        ire->ire_stq = in_ire->ire_stq;
-->     ill = ire_to_ill(ire);
-->     ire->ire_stq_ifindex = ill->ill_phyint->phyint_ifindex;
        ire->ire_zoneid = in_ire->ire_zoneid;
        ire->ire_stackid = ipst->ips_netstack->netstack_stackid;
        ire->ire_ipst = ipst;

      However, the code above has a subtle bug: ire_type is still 0 when
      we call ire_to_ill(), since the template was just initialized from
      ire_null.  That means ire_to_ill() will return ire_ipif->ipif_ill,
      which is not necessarily the ill that ire_stq belongs to.  We then
      record this incorrect index in ire_stq_ifindex.

   3. The ARP resolution fails (because both IP interfaces are down).
      So we end up in ire_freemblk() which does:

        ill = ill_lookup_on_ifindex(ire_mp->ire_stq_ifindex,
            B_FALSE, NULL, NULL, NULL, NULL, ipst);
        if (ill == NULL || (ire_mp->ire_stq != ill->ill_wq) ||
            (ill->ill_state_flags & ILL_CONDEMNED)) {
                /*
                 * ill went away. no nce to clean up.
                 * Note that the ill_state_flags could be set to
                 * ILL_CONDEMNED after this point, but if we know
                 * that it is CONDEMNED now, we just bail out quickly.
                 */
                if (ill != NULL)
                        ill_refrele(ill);
                goto cleanup;
        }

      Since ire_stq_ifindex may name the other ill in the group,
      ill_lookup_on_ifindex() may return an ill whose ill_wq does not
      match ire_mp->ire_stq, so the ire_mp->ire_stq != ill->ill_wq check
      is true.  Thus we wrongly assume the ill has gone away, skip the
      cleanup, and leave the incomplete IRE rotting away.

   4. One of the IP interfaces comes back up.  However, we're still unable
      to forward packets to the destination(s) associated with the rotting
      incomplete IREs because IP thinks ARP is working on resolving them.
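
The interaction between steps 2 and 3 can be illustrated with a small
user-space model.  This is only a sketch under stated assumptions, not the
kernel code: all struct layouts, the model_* functions and MODEL_IRE_CACHE
are hypothetical stand-ins, ill_ifindex stands in for
ill_phyint->phyint_ifindex, and the q_ptr lookup for a non-zero ire_type is
an assumption about what ire_to_ill() does when the type is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
typedef struct queue { void *q_ptr; } queue_t;
typedef struct ill { unsigned ill_ifindex; queue_t *ill_wq; } ill_t;
typedef struct ipif { ill_t *ipif_ill; } ipif_t;
typedef struct ire {
	unsigned short ire_type;	/* 0 in the freshly reset template */
	ipif_t *ire_ipif;
	queue_t *ire_stq;
	unsigned ire_stq_ifindex;
} ire_t;

#define	MODEL_IRE_CACHE	1	/* hypothetical non-zero type value */

/* Assumed behavior: use ire_stq only when the type is set. */
static ill_t *
model_ire_to_ill(const ire_t *ire)
{
	if (ire->ire_type == MODEL_IRE_CACHE)
		return ((ill_t *)ire->ire_stq->q_ptr);
	return (ire->ire_ipif->ipif_ill);	/* fallback taken when type is 0 */
}

/* Linear search standing in for ill_lookup_on_ifindex(). */
static ill_t *
model_ill_lookup(ill_t **ills, size_t n, unsigned ifindex)
{
	for (size_t i = 0; i < n; i++) {
		if (ills[i]->ill_ifindex == ifindex)
			return (ills[i]);
	}
	return (NULL);
}

/*
 * Build a two-ill group where the ipif lives on ill_a but the send queue
 * belongs to ill_b, then replay the index recording (step 2) and the
 * failure-path check (step 3).  Returns true if the cleanup would run,
 * false if the model bails out as if the ill had gone away.
 */
static bool
model_scenario(unsigned short ire_type)
{
	queue_t wq_a, wq_b;
	ill_t ill_a = { 2, &wq_a }, ill_b = { 3, &wq_b };
	ill_t *group[] = { &ill_a, &ill_b };
	ipif_t ipif = { &ill_a };
	ire_t ire = { ire_type, &ipif, &wq_b, 0 };
	ill_t *ill;

	wq_a.q_ptr = &ill_a;
	wq_b.q_ptr = &ill_b;

	/* Step 2: record the ifindex of whatever ill the lookup yields. */
	ill = model_ire_to_ill(&ire);
	assert(ill != NULL);
	ire.ire_stq_ifindex = ill->ill_ifindex;

	/* Step 3: the check from the failure path. */
	ill = model_ill_lookup(group, 2, ire.ire_stq_ifindex);
	if (ill == NULL || ire.ire_stq != ill->ill_wq)
		return (false);		/* "ill went away": cleanup skipped */
	return (true);
}
```

In this model, model_scenario(0) records ill_a's index even though ire_stq
belongs to ill_b, so the queue comparison mismatches and the cleanup is
skipped.  With the type set before the lookup, the recorded index and
ire_stq agree and the cleanup runs.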
Work Around
N/A
Comments
N/A