OpenSolaris

Bug ID 6789870
Synopsis ipif6_dup_recovery() may operate on a freed ipif, corrupting memory
State 10-Fix Delivered (Fix available in build)
Category:Subcategory kernel:tcp-ip
Keywords
Responsible Engineer Peter Memishian
Reported Against
Duplicate Of
Introduced In solaris_nevada
Commit to Fix snv_107
Fixed In snv_107
Release Fixed solaris_nevada(snv_107)
Related Bugs 4728609
Submit Date 3-January-2009
Last Update Date 28-January-2009
Description
While running Seb's Clearview IPMP stress tests, every now and then I've 
hit this panic: 
 
   kernel memory allocator:  
   buffer modified after being freed 
   modification occurred at offset 0xe0 (0xdeadbeefdeadbeef replaced by 0x0) 
   buffer=ffffff0215de7540  bufctl=ffffff0213703490  cache: kmem_alloc_320 
   previous transaction on buffer ffffff0215de7540: 
   thread=ffffff022aaca580  time=T-3.414914219  slab=ffffff0215dc60e8  cache: kmem_alloc_320
   kmem_cache_free_debug+12f 
   kmem_cache_free+98 
   kmem_free+1f7 
   segvn_free+a2 
   seg_free+3c 
   segvn_unmap+ec1 
   as_free+117 
   relvm+116 
   proc_exit+4b0 
   exit+15 
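
The "modified after being freed" check in the message above can be illustrated with a minimal user-space sketch. It assumes a debug allocator that fills freed buffers with the 0xdeadbeefdeadbeef pattern and verifies the pattern later; the function names here are hypothetical and this is not the kmem implementation. The 8-byte write at offset 0xe0 stands in for a stale store to a freed buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define	FREE_PATTERN	0xdeadbeefdeadbeefULL

/* Fill a freed buffer with the free pattern, one 64-bit word at a time. */
static void
poison_on_free(void *buf, size_t size)
{
	uint64_t *w = buf;

	assert(size % sizeof (uint64_t) == 0);
	for (size_t i = 0; i < size / sizeof (uint64_t); i++)
		w[i] = FREE_PATTERN;
}

/* Return the byte offset of the first modified word, or -1 if intact. */
static long
first_modified_offset(const void *buf, size_t size)
{
	const uint64_t *w = buf;

	for (size_t i = 0; i < size / sizeof (uint64_t); i++) {
		if (w[i] != FREE_PATTERN)
			return ((long)(i * sizeof (uint64_t)));
	}
	return (-1);
}

/* Hypothetical demo: a stale pointer zeroes 8 bytes of a freed buffer. */
long
demo_use_after_free_offset(void)
{
	static uint64_t buf[320 / sizeof (uint64_t)];

	poison_on_free(buf, sizeof (buf));
	memset((char *)buf + 0xe0, 0, sizeof (uint64_t));
	return (first_modified_offset(buf, sizeof (buf)));
}
```

In this sketch the check reports the offset of the overwritten word, which is how an offset such as 0xe0 can be mapped back to a field of whatever structure used to occupy the buffer.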
 
Digging through the kmem logs reveals that this buffer had previously been 
an ipif_t, and that offset 0xe0 corresponds to the ipif_recovery_id field 
in the ipif_t.  There are a number of places where ipif_recovery_id is 
zeroed, so I instrumented the kernel and found the one specifically 
causing the corruption is at the top of ipif6_dup_recovery(): 

  static void 
  ipif6_dup_recovery(void *arg) 
  { 
        ipif_t *ipif = arg; 
 
-->     ipif->ipif_recovery_id = 0; 
        if (!(ipif->ipif_flags & IPIF_DUPLICATE)) 
                return; 
 
This is happening because we end up scheduling a recovery timer when one 
is already running, via this code in ip_ndp_excl(): 
 
        if (!(ipif->ipif_flags & (IPIF_DHCPRUNNING|IPIF_TEMPORARY)) && 
            ill->ill_net_type == IRE_IF_RESOLVER && 
            !(ipif->ipif_state_flags & IPIF_CONDEMNED) && 
            ipst->ips_ip_dup_recovery > 0) { 
                ipif->ipif_recovery_id = timeout(ipif6_dup_recovery, 
                    ipif, MSEC_TO_TICK(ipst->ips_ip_dup_recovery)); 
        } 
 
That is, I found that ipif_recovery_id was already pointing to a live 
timeout when we set it above (through some ASSERTs I added, not shown). 
So, the original ipif_recovery_id is lost.  If the ipif is then freed, that 
first timer is never cancelled, and when it finally fires some 30 seconds 
later it corrupts an unrelated 320-byte buffer -- or worse. 
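
The lost-ID problem described above can be reproduced in a small user-space model. Everything in it (the lost_* names, the fixed-size timer table, the teardown step that cancels only the remembered ID) is an assumption made for illustration; it is not the Solaris timeout(9F) code or the real ipif teardown path.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model; NOT the Solaris timeout(9F) code. */
#define	LOST_MAX_TIMERS	8

typedef struct lost_ipif {
	int recovery_id;	/* stands in for ipif_recovery_id; 0 = none */
} lost_ipif_t;

/* Slot 0 is unused so that ID 0 can mean "no timer". */
static const void *lost_pending[LOST_MAX_TIMERS + 1];

/* Register a pending callback for arg and return its nonzero ID. */
static int
lost_timeout(const void *arg)
{
	for (int id = 1; id <= LOST_MAX_TIMERS; id++) {
		if (lost_pending[id] == NULL) {
			lost_pending[id] = arg;
			return (id);
		}
	}
	return (0);	/* table full */
}

/* Cancel the timer with the given ID; ID 0 is a no-op. */
static void
lost_untimeout(int id)
{
	if (id > 0 && id <= LOST_MAX_TIMERS)
		lost_pending[id] = NULL;
}

/* Count timers that would still fire with arg as their argument. */
static int
lost_pending_for(const void *arg)
{
	int n = 0;

	for (int id = 1; id <= LOST_MAX_TIMERS; id++) {
		if (lost_pending[id] == arg)
			n++;
	}
	return (n);
}

/* Mirrors the unconditional assignment quoted from ip_ndp_excl(). */
static void
lost_schedule_recovery(lost_ipif_t *ipif)
{
	ipif->recovery_id = lost_timeout(ipif);
	assert(ipif->recovery_id != 0);
}

/* Schedule twice, tear down once; return timers still aimed at the ipif. */
int
lost_orphaned_timers(void)
{
	lost_ipif_t ipif = { 0 };

	for (int id = 0; id <= LOST_MAX_TIMERS; id++)
		lost_pending[id] = NULL;

	lost_schedule_recovery(&ipif);	/* first timer */
	lost_schedule_recovery(&ipif);	/* second timer; first ID forgotten */

	/* Assumed teardown: cancel only the ID the ipif still remembers. */
	lost_untimeout(ipif.recovery_id);
	ipif.recovery_id = 0;

	return (lost_pending_for(&ipif));
}
```

In the model, one timer is still pending after teardown and still holds a pointer to the ipif, which is the situation that leads to the write into freed memory.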
 
It seems that it is "by design" that a recovery timer should not be 
running when ip_ndp_excl() is called (i.e., it's not a bug that we don't 
check "ipif_recovery_id != 0" above).  Assuming that's the case, the bug 
seems to be that for IPv6, if we go through ill_restart_dad() -> 
ndp_do_recovery() -> ip_ndp_recover(), we forget to stop the recovery 
timer.  As a result, if the ipif again becomes a duplicate, then 
ip_ndp_excl() will clobber ipif_recovery_id as previously described.  This 
doesn't happen for IPv4, since ill_restart_dad() goes through ARP and then 
comes back via ip_arp_excl(), which calls ipif_resolver_up(), which in turn 
cancels the recovery timer.
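
The sequence above (duplicate detected, recovery, duplicate detected again) can be sketched as a user-space model that compares a recovery path that leaves the timer running with one that stops it first. The seq_* names and the stop_timer switch are hypothetical, and this report does not show the change delivered in snv_107, so this only illustrates the analysis and is not the actual fix.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model of the sequence; not the actual fix. */
typedef struct seq_ipif {
	int recovery_id;	/* stands in for ipif_recovery_id; 0 = none */
	int duplicate;		/* stands in for IPIF_DUPLICATE */
} seq_ipif_t;

static int seq_next_id;	/* last timer ID handed out */
static int seq_live;	/* timers scheduled and not yet cancelled */

/* Schedule a timer and return its nonzero ID. */
static int
seq_timeout(void)
{
	seq_live++;
	return (++seq_next_id);
}

/* Cancel a live timer; ID 0 means there is nothing to cancel. */
static void
seq_untimeout(int id)
{
	if (id != 0)
		seq_live--;
}

/* Duplicate detected: schedule recovery, as in the quoted ip_ndp_excl(). */
static void
seq_excl(seq_ipif_t *ipif)
{
	ipif->duplicate = 1;
	ipif->recovery_id = seq_timeout();
}

/* Recovery path; stop_timer selects which behavior is being modeled. */
static void
seq_recover(seq_ipif_t *ipif, int stop_timer)
{
	if (stop_timer && ipif->recovery_id != 0) {
		seq_untimeout(ipif->recovery_id);
		ipif->recovery_id = 0;
	}
	ipif->duplicate = 0;
}

/* duplicate -> recover -> duplicate again; return the live timer count. */
int
seq_live_timers(int stop_timer)
{
	seq_ipif_t ipif = { 0, 0 };

	seq_next_id = 0;
	seq_live = 0;

	seq_excl(&ipif);
	seq_recover(&ipif, stop_timer);
	seq_excl(&ipif);
	assert(ipif.recovery_id != 0);

	return (seq_live);
}
```

With stop_timer set to 0 the model ends with two live timers but only one remembered ID; with stop_timer set to 1 it ends with exactly one live timer, matching the IPv4 behavior described above.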
Work Around
N/A
Comments
N/A