OpenSolaris

Printable Version Enter a New Search
Bug ID 6651864
Synopsis asynchronous DLPI processing can induce panic in ill_move_to_new_ipsq()
State 10-Fix Delivered (Fix available in build)
Category:Subcategory kernel:tcp-ip
Keywords
Responsible Engineer Peter Memishian
Reported Against fw_26
Duplicate Of
Introduced In solaris_nevada
Commit to Fix snv_92
Fixed In snv_92
Release Fixed solaris_nevada(snv_92)
Related Bugs 4956997 , 6516277 , 6651866 , 6693562 , 6709252 , 6738235 , 6799776
Submit Date 17-January-2008
Last Update Date 17-July-2008
Description
PIT reported the following panic while running the IPMP test suite:

   assertion failed: old_ipsq->ipsq_mphead == 0 && old_ipsq->ipsq_mptail
   == 0, file : ../../common/inet/ip/ip_if.c, line: 14276

While they hit this testing Clearview UV project bits, the problem exists
in Nevada as well.  Specifically, the failed assertion is because a DLPI
message has been enqueued onto ipsq_mphead while we were in the middle of
processing a SIOSLIFGROUPNAME request:

  ill_move_to_new_ipsq+0xc8(cc864500, cc864800, ca30a0c0, cfd639c8)
  ill_merge_groups+0x11e(d200c054, d1eefdd4, 0, ca30a0c0, cfd639c8)
  ip_sioctl_move+0x183(cb0cb6a4, ccffe9a8, cfd639c8, ca30a0c0, ...
  ip_process_ioctl+0x220(cc864500, cfd639c8, ca30a0c0, fecc8ee0)
  ip_wput_nondata+0x113e(0, cfd639c8, ca30a0c0, 0)
  ip_output_options+0x36d(ce70e040, ca30a0c0, cfd639c8, 2, fecde898)
  ip_output+0x1c(ce70e040, ca30a0c0, cfd639c8, 2)
  udp_wput_iocdata+0x316(cfd639c8, ca30a0c0)
  udp_wput_other+0x3d7(cfd639c8, ca30a0c0)
  udp_wput+0x318(cfd639c8, ca30a0c0)
  putnext+0x2a5(ce10b140, ca30a0c0)
  strdoioctl+0x4c4(ce787078, cdde6ca0, 100003, 1, cbfa9a40, cdde6f70)
  strioctl+0x3830(cf722b80, 80786999, 8044d28, 100003, 1, cbfa9a40)
  socktpi_ioctl+0x392(cf722b80, 80786999, 8044d28, 100003, cbfa9a40, ...
  fop_ioctl+0x49(cf722b80, 80786999, 8044d28, 100003, cbfa9a40, ...
  ioctl+0x155()
  sys_sysenter+0x1a4()

  > cc864500::print ipsq_t ipsq_mphead->b_rptr | ::print dl_ok_ack_t
  {
       dl_primitive = 0x6
       dl_correct_primitive = 0x1d
  }

This is a DL_OK_ACK (0x6) for a DL_ENABMULTI_REQ (0x1d) previously sent by
IP.  Examining the ill_t, we see in fact that there is a DL_ENABMULTI_REQ
pending, and another waiting to be sent (queued in ill_dlpi_deferred):

  > cc864500::print ipsq_t ipsq_phyint_list->phyint_illv4->ill_dlpi_pending
  ipsq_phyint_list->phyint_illv4->ill_dlpi_pending = 0x1d
  > cc864500::print ipsq_t ipsq_phyint_list->phyint_illv4->ill_dlpi_deferred
  ipsq_phyint_list->phyint_illv4->ill_dlpi_deferred = 0xce163f00
  > 0xce163f00::print mblk_t b_rptr | ::print dl_enabmulti_req_t
  {
       dl_primitive = 0x1d
       dl_addr_length = 0x6
       dl_addr_offset = 0xc
  }

These messages aren't surprising since IP normally sends DL_ENABMULTI_REQs
when bringing an interface up via ipif_up_done() -> ill_recover_multicast()
-> ip_ll_addmulti_v6() -> ip_ll_send_enabmulti_req().  However, unlike
most DLPI messages, IP does not wait for the ACKs for DL_ENABMULTI_REQs
before completing the ioctl -- instead, they just complete asynchronously
and the results are recorded in ill_dlpi_multicast_state.

Because DL_ENABMULTI_REQs are handled asynchronously with respect to the
ongoing IPSQ operation, they are actually unrelated to it and thus need
to enter the IPSQ as a new operation, rather than as part of the current
operation.  However, the current code in ip_rput_dlpi() does not account
for this, and treats everything other than DL_NOTIFY_IND as part of the
current operation:

        /*
         * We know the message is one we're waiting for (or DL_NOTIFY_IND),
         * and we need to become writer to continue to process it. If it's not
         * a DL_NOTIFY_IND, we assume we're in the middle of an exclusive
         * operation and pass CUR_OP.  If this isn't true, we'll end up doing
         * some work as part of the current exclusive operation that actually
         * is not part of it -- which is wrong, but better than the
         * alternative of deadlock (if NEW_OP is always used).  Someday, we
         * should track which DLPI requests have ACKs that we wait on
         * synchronously so we can know whether to use CUR_OP or NEW_OP.
         *
         * As required by qwriter_ip(), we refhold the ill; it will refrele.
         * Since this is on the ill stream we unconditionally bump up the
         * refcount without doing ILL_CAN_LOOKUP().
         */
        ill_refhold(ill);
        if (dloa->dl_primitive == DL_NOTIFY_IND)
                qwriter_ip(ill, q, mp, ip_rput_dlpi_writer, NEW_OP, B_FALSE);
        else
                qwriter_ip(ill, q, mp, ip_rput_dlpi_writer, CUR_OP, B_FALSE);

As per the comment, the default of CUR_OP was chosen because it was viewed
as harmlessly wrong.  However, it turns out there *is* a harm: as seen
here, there are some operations that know that they do not send any DLPI
requests and thus expect that there are no DLPI replies.

The quick fix is to always treat DL_ENABMULTI_REQ (and DL_DISABMULTI_REQ)
as NEW_OP.  A more comprehensive fix would to (as the comment says)
determine which requests have ACKs that are waited for, and only wait for
those.  Since some requests may only want to be waited for in some
contexts, this would probably need to be done through an additional
argument to ill_dlpi_send().
Work Around
N/A
Comments
N/A