Description
When "6541014 ldmd fails to obtain the PRI via libpri and aborts" was fixed, I think the fix introduced a race condition on a new global variable, pri_fd. From the outside, the motive for the change seemed to be to stop pri_get() from rapidly open()ing and close()ing the device.
When I ran FMD under mdb and followed its execution through pri_init(), I saw subsequent threads read a pri_fd value other than -1 and consequently fail. My understanding of Solaris threads and shared objects is that, for a piece of global data in a shared object, all threads within a single process share one instance of the variable, while separate processes each get a private instance.
During an mdb session, the cpumem-retire agent appeared to call ldom_fmri_retire() with a NULL lhp. That implies the agent's global variable cma_lhp was NULL, which can only happen if ldom_init() returned NULL, which in turn happens if either pri_init() or *allocp() fails. pri_init() now fails if it cannot open() the device node, or if it is called after a previous successful pri_init() without an intervening pri_fini(), i.e., when pri_fd is already set. That was indeed the case here: pri_fd was 0x17.
Per discussion with Ash Saulsbury, this bug previously listed as:
libpri is not thread safe and mishandles pri_fd
is being recast as:
libldom should init libpri just once vs many times
The intended usage of libpri is that, within a given process, there should be only a single pri_init() call at startup and a single pri_fini() call before exiting. FMD is the only known multithreaded process using libpri, and all of its threads go through libldom to access the PRI (as an MD). libldom is to be changed so that pri_init() and pri_fini() are called only once for the life of the process.
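A minimal sketch of the libldom-side change, assuming POSIX threads are available: pthread_once() guarantees the initializer runs exactly once, no matter how many FMD threads reach it concurrently. stub_pri_init() and stub_pri_fini() are hypothetical stand-ins that mimic the behavior described above (a second pri_init() without an intervening pri_fini() fails); they are not libpri's real code, and ldom_pri_ensure_init() is an illustrative name, not an existing libldom function.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical libpri stand-ins: a second init without fini fails,
 * mirroring the pri_fd check described in this report. */
static int fake_pri_fd = -1;
static int fake_pri_init_calls = 0;   /* instrumentation for the sketch */

static int stub_pri_init(void)
{
    fake_pri_init_calls++;
    if (fake_pri_fd != -1)
        return -1;          /* already initialized: fail, as observed */
    fake_pri_fd = 0x17;     /* pretend the device node was opened */
    return 0;
}

static void stub_pri_fini(void)
{
    fake_pri_fd = -1;       /* pretend the device node was closed */
}

/* libldom side (hypothetical): initialize libpri once per process. */
static pthread_once_t ldom_pri_once = PTHREAD_ONCE_INIT;
static int ldom_pri_status = -1;

static void ldom_pri_init_impl(void)
{
    ldom_pri_status = stub_pri_init();
    if (ldom_pri_status == 0)
        (void) atexit(stub_pri_fini);   /* single fini at process exit */
}

/* Safe to call from every thread; pthread_once() also makes the
 * initializer's writes visible to all callers once it returns. */
static int ldom_pri_ensure_init(void)
{
    (void) pthread_once(&ldom_pri_once, ldom_pri_init_impl);
    return ldom_pri_status;
}
```

In this sketch, each ldom_init() call would go through ldom_pri_ensure_init() instead of calling pri_init() directly, and the per-handle teardown path would no longer call pri_fini(); the atexit() handler does that once.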
Why should libldom be changed to work around a libpri deficiency? libpri should be changed to recognize when it is being initialized multiple times in the same process and ignore all but the first call; otherwise, other libpri consumers will need the same sort of workaround.
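Making libpri itself tolerate repeated initialization could look something like the following. This is a reference-counted variant of "ignore all but the first": only the first init opens the device, and only the matching last fini closes it, so one caller's early pri_fini() cannot pull the descriptor out from under others. All names are hypothetical and the device open/close is stubbed so the sketch runs anywhere; real libpri internals may differ.

```c
#include <pthread.h>

static pthread_mutex_t pri_lock = PTHREAD_MUTEX_INITIALIZER;
static int pri_fd = -1;
static unsigned int pri_refcnt = 0;
static int device_opens = 0;    /* instrumentation for the sketch */

/* Hypothetical stand-ins for open()/close() of the PRI device node. */
static int fake_open_pri_device(void) { device_opens++; return 0x17; }
static void fake_close_pri_device(int fd) { (void) fd; }

/* Repeated calls succeed; only the first actually opens the device. */
int sketch_pri_init(void)
{
    int rc = 0;

    (void) pthread_mutex_lock(&pri_lock);
    if (pri_refcnt == 0) {
        pri_fd = fake_open_pri_device();
        if (pri_fd == -1)
            rc = -1;
    }
    if (rc == 0)
        pri_refcnt++;
    (void) pthread_mutex_unlock(&pri_lock);
    return rc;
}

/* Only the last matching fini closes the device; extra calls are no-ops. */
void sketch_pri_fini(void)
{
    (void) pthread_mutex_lock(&pri_lock);
    if (pri_refcnt > 0 && --pri_refcnt == 0) {
        fake_close_pri_device(pri_fd);
        pri_fd = -1;
    }
    (void) pthread_mutex_unlock(&pri_lock);
}
```

The mutex covers both the check of pri_refcnt and the assignment to pri_fd, which closes the window in which one thread could observe a half-initialized pri_fd.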