OpenSolaris

Bug ID 6573659
Synopsis removing a USDT provider can undermine pid probes
State 10-Fix Delivered (Fix available in build)
Category:Subcategory kernel:dtrace
Keywords
Responsible Engineer Adam Leventhal
Reported Against
Duplicate Of
Introduced In
Commit to Fix snv_72
Fixed In snv_72
Release Fixed solaris_nevada(snv_72)
Related Bugs 6319069
Submit Date 25-June-2007
Last Update Date 31-August-2007
Description
[ahl 7.25.2007]

An application traced with the pid provider crashed on an apparent unhandled breakpoint trap. The pid provider, of course, uses the breakpoint instruction on x86 for its instrumentation, but since the application was actively being traced, the trap should have been caught and handled.

After some investigation I determined that this loop was failing to find the appropriate tracepoint (in fasttrap_pid_probe()):

        /*
         * Lookup the tracepoint that the process just hit.
         */
        for (tp = bucket->ftb_data; tp != NULL; tp = tp->ftt_next) {
                if (pid == tp->ftt_pid && pc == tp->ftt_pc &&
                    !tp->ftt_proc->ftpc_defunct)
                        break;
        }

        /*
         * If we couldn't find a matching tracepoint, either a tracepoint has
         * been inserted without using the pid<pid> ioctl interface (see
         * fasttrap_ioctl), or somehow we have mislaid this tracepoint.
         */
        if (tp == NULL) {
                mutex_exit(pid_mtx);
                return (-1);
        }

Using DTrace, I confirmed that the tracepoint existed, but it was failing the !tp->ftt_proc->ftpc_defunct check.

All providers for a given process share a common fasttrap_proc_t structure. This is necessary because providers can share tracepoints for a given instruction (fasttrap_tracepoint_t) and therefore need a common structure that represents the process image that they are tracing. Note that a fasttrap_proc_t is 'retired' when the process exits or execs; old and new process images are demarcated with different fasttrap_proc_t instances.

When a process exits or execs, proc_exit() cleans up any helper providers (USDT) or pid provider probes:

        /*
         * Clean up any DTrace helper actions or probes for the process.
         */
        if (p->p_dtrace_helpers != NULL) {
                ASSERT(dtrace_helpers_cleanup != NULL);
                (*dtrace_helpers_cleanup)();
        }
...
        /*
         * Clean up any DTrace probes associated with this process.
         */
        if (p->p_dtrace_probes) {
                ASSERT(dtrace_fasttrap_exit_ptr != NULL);
                dtrace_fasttrap_exit_ptr(p);
        }

The problem arises when a USDT provider is removed not because of an exit or exec, but because the containing module is unloaded (causing an ioctl() from the module's _fini() that tells the kernel to remove the provider). This uses the same mechanism as the exec/exit case and has the unintended effect of marking the shared fasttrap_proc_t as defunct. This action, of course, applies to all providers for the process, including the pid provider. The upshot is that when we hit the probe point, the lookup for the fasttrap_tracepoint_t fails because the shared fasttrap_proc_t has been marked defunct, even though the pid provider itself is still active.
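The failure mode can be reproduced in a small user-space model. This is only a sketch: the structures are heavily simplified stand-ins for the kernel's, the field names follow the excerpts above where they appear there, and provider_retire_buggy() and pid_probe_survives_usdt_removal() are hypothetical names invented for illustration, not functions from the fasttrap source.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in: one per process image, shared by all its providers. */
typedef struct fasttrap_proc {
        int ftpc_defunct;               /* nonzero once the image is gone */
} fasttrap_proc_t;

/* Simplified stand-in for a provider (pid or USDT). Assumed layout. */
typedef struct fasttrap_provider {
        fasttrap_proc_t *ftp_proc;      /* the shared process structure */
        int ftp_retired;
} fasttrap_provider_t;

/* Simplified stand-in for a tracepoint in a hash bucket chain. */
typedef struct fasttrap_tracepoint {
        int ftt_pid;
        unsigned long ftt_pc;
        fasttrap_proc_t *ftt_proc;
        struct fasttrap_tracepoint *ftt_next;
} fasttrap_tracepoint_t;

/* Same predicate as the lookup loop quoted from fasttrap_pid_probe(). */
static fasttrap_tracepoint_t *
lookup_tracepoint(fasttrap_tracepoint_t *head, int pid, unsigned long pc)
{
        fasttrap_tracepoint_t *tp;

        for (tp = head; tp != NULL; tp = tp->ftt_next) {
                if (pid == tp->ftt_pid && pc == tp->ftt_pc &&
                    !tp->ftt_proc->ftpc_defunct)
                        break;
        }
        return (tp);
}

/*
 * Hypothetical model of the overloaded retire path: retiring any one
 * provider also marks the shared fasttrap_proc_t defunct.
 */
static void
provider_retire_buggy(fasttrap_provider_t *prov)
{
        assert(prov != NULL && prov->ftp_proc != NULL);
        prov->ftp_retired = 1;
        prov->ftp_proc->ftpc_defunct = 1;
}

/*
 * Returns 1 if the pid provider's tracepoint is still found after only
 * the USDT provider has been removed, 0 otherwise.
 */
static int
pid_probe_survives_usdt_removal(void)
{
        fasttrap_proc_t proc = { 0 };
        fasttrap_provider_t pid_prov = { &proc, 0 };
        fasttrap_provider_t usdt_prov = { &proc, 0 };
        fasttrap_tracepoint_t tp;

        tp.ftt_pid = 100;
        tp.ftt_pc = 0x1000UL;
        tp.ftt_proc = pid_prov.ftp_proc;
        tp.ftt_next = NULL;

        provider_retire_buggy(&usdt_prov);      /* module unload path */
        return (lookup_tracepoint(&tp, 100, 0x1000UL) != NULL);
}
```

In this model pid_probe_survives_usdt_removal() returns 0: the tracepoint is still on the chain and the pid provider was never retired, yet the lookup rejects it because of the shared defunct flag.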

Any solution will need to ensure that the single-provider removal case does not mark the fasttrap_proc_t as defunct. One option would be to give fasttrap_exec_exit() some way of explicitly marking the fasttrap_proc_t as defunct rather than leaving that task to fasttrap_provider_retire() (where that activity has been erroneously overloaded).
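The option described above might look roughly like the following user-space sketch. It is not the delivered fix, only an illustration of the separation of duties under assumed, simplified structures; sketch_provider_retire(), sketch_exec_exit() and the two demo helpers are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (assumed layouts). */
typedef struct fasttrap_proc {
        int ftpc_defunct;
} fasttrap_proc_t;

typedef struct fasttrap_provider {
        fasttrap_proc_t *ftp_proc;
        int ftp_retired;
} fasttrap_provider_t;

/* Retiring a provider touches only that provider. */
static void
sketch_provider_retire(fasttrap_provider_t *prov)
{
        assert(prov != NULL);
        prov->ftp_retired = 1;
}

/*
 * The exec/exit path is the only place that marks the shared process
 * structure defunct; it then retires every provider that refers to it.
 */
static void
sketch_exec_exit(fasttrap_proc_t *proc, fasttrap_provider_t **provs, size_t n)
{
        size_t i;

        proc->ftpc_defunct = 1;
        for (i = 0; i < n; i++) {
                if (provs[i]->ftp_proc == proc)
                        sketch_provider_retire(provs[i]);
        }
}

/* Demo: removing one USDT provider leaves the process image usable. */
static int
usable_after_single_removal(void)
{
        fasttrap_proc_t proc = { 0 };
        fasttrap_provider_t usdt_prov = { &proc, 0 };

        sketch_provider_retire(&usdt_prov);
        return (!proc.ftpc_defunct);
}

/* Demo: exec or exit still invalidates the whole process image. */
static int
usable_after_exec_exit(void)
{
        fasttrap_proc_t proc = { 0 };
        fasttrap_provider_t pid_prov = { &proc, 0 };
        fasttrap_provider_t usdt_prov = { &proc, 0 };
        fasttrap_provider_t *provs[2];

        provs[0] = &pid_prov;
        provs[1] = &usdt_prov;
        sketch_exec_exit(&proc, provs, 2);
        return (!proc.ftpc_defunct);
}
```

With this split, usable_after_single_removal() returns 1 and usable_after_exec_exit() returns 0, so the tracepoint lookup's defunct check would reject tracepoints only when the process image has actually gone away.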
Work Around
N/A
Comments
N/A