|
Description
|
The routine reboot_unode() in src/unode/unode_util.cc has the following code:
// Write in the extra argument (sleep value);
// XXX - This sleep value should be large enough
// to allow the node to be found unknown and then
// down!
//
// The reason for this is because of the quorum
// tests, some of which need the node to stay away
// from the cluster.
//
argv[argc] = new char[3];
os::sprintf(argv[argc], "30");
This sleep value should be made configurable, e.g. via an environment variable. Furthermore, it shouldn't be a hard-coded string literal (it took me a while to find it); use a const or a #define'd value defined in a header file instead.
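A minimal sketch of what the configurable value could look like. The environment variable name UNODE_REBOOT_SLEEP, the constant names, the helper functions, and the 3600-second upper bound are all hypothetical and do not exist in the source tree; only the default of 30 comes from the current code.

```cpp
#include <cerrno>
#include <cstdlib>
#include <string>

// Default taken from the hard-coded "30" in reboot_unode().
const int UNODE_REBOOT_SLEEP_DEFAULT_SECS = 30;

// Hypothetical environment variable name (an assumption, not in the source).
const char *const UNODE_REBOOT_SLEEP_ENV = "UNODE_REBOOT_SLEEP";

// Parse a positive number of seconds. Returns fallback if text is
// missing, malformed, non-positive, or above an arbitrary 1-hour cap.
int parse_sleep_secs(const char *text, int fallback)
{
    if (text == nullptr || *text == '\0')
        return fallback;
    errno = 0;
    char *end = nullptr;
    long value = std::strtol(text, &end, 10);
    if (errno != 0 || *end != '\0' || value <= 0 || value > 3600)
        return fallback;
    return static_cast<int>(value);
}

// Return the sleep argument as a string, ready to be copied into argv.
std::string reboot_sleep_arg()
{
    int secs = parse_sleep_secs(std::getenv(UNODE_REBOOT_SLEEP_ENV),
                                UNODE_REBOOT_SLEEP_DEFAULT_SECS);
    return std::to_string(secs);
}
```

With something like this, reboot_unode() could size the argv entry from the returned string's length instead of assuming a fixed 3-byte buffer.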
Better yet, if possible, the sleep should be replaced with a more deterministic way of waiting for a node to be declared unknown and then down, e.g. via signals or another synchronization method. This has a couple of advantages. First, it's more robust, especially when a node reboot takes longer than 30 seconds (e.g. because the system is heavily overloaded). Second, the reboot itself may be fast; sleeping for 30 seconds on every reboot slows down tests unnecessarily, so a more deterministic approach may make tests run faster. Also, fast reboots might uncover timing/synchronization bugs in CMM and the ORB.
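One possible shape for the synchronization-based approach, sketched with a standard condition variable. NodeState and NodeStateWaiter are hypothetical names; how unode actually learns about membership changes from CMM is not shown here and would need to be wired in by whoever calls set_state().

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical node states; the real CMM state names may differ.
enum class NodeState { Up, Unknown, Down };

// Hypothetical helper: tracks one node's state and lets a test block
// until the node reaches a given state, instead of sleeping blindly.
class NodeStateWaiter {
public:
    // Called by whatever observes membership changes.
    void set_state(NodeState s)
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            state_ = s;
        }
        changed_.notify_all();
    }

    // Block until the node is in `wanted` or the timeout expires.
    // Returns true if the state was reached, false on timeout.
    bool wait_for(NodeState wanted, std::chrono::milliseconds timeout)
    {
        std::unique_lock<std::mutex> lock(mutex_);
        return changed_.wait_for(lock, timeout,
                                 [&] { return state_ == wanted; });
    }

private:
    std::mutex mutex_;
    std::condition_variable changed_;
    NodeState state_ = NodeState::Up;
};
```

The timeout keeps a hung node from blocking a test forever, while a fast transition to down lets the test proceed immediately rather than waiting out a fixed interval.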
|