We recently upgraded our diskless server, only to find out that the
smdiskless(1M) command no longer works:
# /usr/sadm/bin/smdiskless add -- -i 10.13.22.153 -e 0:3:ba:50:bd:e9
-n ophel -x os=sparc.sun4u.Solaris_11 -x root=/export/root/ophel
-x swap=/export/swap/ophel
Authenticating as user: root
Type /? for help, pressing <enter> accepts the default denoted by [ ]
Please enter a string value for: password ::
Starting Solaris Management Console server version 2.1.0.
endpoint created: localhost/127.0.0.1:898
Solaris Management Console server is ready.
Loading Tool: com.sun.admin.osservermgr.cli.OsServerMgrCli from oversee
Login to oversee.PRC.Sun.COM as user root was successful.
Download of com.sun.admin.osservermgr.cli.OsServerMgrCli from oversee
-> EXM_RMIERROR
#
Some research reveals that this obscure error actually corresponds to the
Solaris Management Console crashing. Indeed, we find diagnostics from the
Java VM at /hs_err_pid24306.log (yes, in / itself; yikes!), which has the
offending stack trace:
C [libc.so.1+0x44818] strlen+0x18
C [libsmoss.so+0x17e50] setup_server_info+0x158
C [libsmoss.so+0xe438] smossdcadd+0x110
C [libsmoss.so+0x13cdc] smossdcadd_jni+0x74
[ ... ]
Looking at libsmoss.so, we can see that setup_server_info+0x158 is the
return point of a call not to strlen(), but to a function called sharefs():
> setup_server_info+0x158::dis -n2
setup_server_info+0x150: call +0x28ed8 <PLT:sharefs>
setup_server_info+0x154: mov 0x1, %o5
setup_server_info+0x158: tst %o0
setup_server_info+0x15c: bne,pn %icc, +0x14 <setup_server_info+0x170>
setup_server_info+0x160: add %fp, -0x4, %o1
Looking at the libsmoss.so source, we find the call to sharefs():
if (clientroot && *clientroot) {
	/*
	 * Invoke method for sharing client's root directory:
	 * Let DFSTYPE default to "nfs" and we don't want any description
	 */
-->	err = sharefs(NULL, workbuffer, NULL, clientroot,
	    NULL, 1, real_pathname, &mgmt_cntxt->log);
The sharefs() function itself *also* lives in libsmoss.so:
int
sharefs(char *dfstype, char *options,
char *description, char *pathname,
char *takeeffect, int mode,
char *real_pathname,
SM_log *log)
{
... and thus one might reasonably expect this function to be called from
the above call site. However, as part of 6371468, the sharefs() function
was moved from libshare.so.1 to libc.so.1. Since every application links
with libc.so.1, this means there's now another sharefs() afoot in the
symbol namespace. Further, because of the (longstanding, but illogical,
surprising and plain downright broken) way ld.so resolves symbols for
dlopen()'d objects, the sharefs() in libc will be used instead of the one
in libsmoss.so. Since libc`sharefs() has:
int
sharefs(enum sharefs_sys_op opcode, struct share *sh)
{
	uint32_t i, j;
	/*
	 * We need to know the total size of the share
	 * and also the largest element size. This is to
	 * get enough buffer space to transfer from
	 * userland to kernel.
	 */
->	i = (sh->sh_path ? strlen(sh->sh_path) : 0);
	sh->sh_size = i;
... and libsmoss.so passes a plain `char *' buffer (workbuffer) as its
second argument, libc interprets that buffer as a `struct share' and we
thus go down in a ball of flames in strlen() on the marked line.
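To make the failure mode concrete, here is a minimal sketch of what libc
ends up reading. It assumes a stand-in `struct demo_share' whose first
member is the path pointer; the real `struct share' layout is not shown
above, so both the struct and the helper name bogus_sh_path() are
hypothetical. The helper returns the value that would be loaded as
sh->sh_path if the "struct" were really a character buffer:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct share; only sh_path matters here. */
struct demo_share {
	char *sh_path;
	size_t sh_size;
};

/*
 * Return the bytes of buf that would be read as sh->sh_path if buf were
 * mistaken for a struct demo_share. memcpy() into an integer keeps the
 * demonstration itself well-defined. Returns 0 if buf is too short.
 */
uintptr_t
bogus_sh_path(const char *buf, size_t buflen)
{
	uintptr_t p = 0;
	size_t off = offsetof(struct demo_share, sh_path);

	if (buflen >= off + sizeof (p))
		(void) memcpy(&p, buf + off, sizeof (p));
	return (p);
}
```

Fed an option string such as "rw,anon=0,root=ophel" (an invented value;
the real contents of workbuffer are not shown), the result is a non-NULL
"pointer" built from ASCII text, which is what strlen() would then try to
dereference.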
Work Around
To work around this specific problem, compile the following as a shared
object with "/opt/SUNWspro/SS11/bin/cc -Kpic -D_REENTRANT -G -o sharefs.so
sharefs.c":
#include <dlfcn.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>
#include <string.h>
#include <stdio.h>

/* The real implementations: csharefs is libc's, ssharefs is libsmoss's. */
static int (*csharefs)();
static int (*ssharefs)();

#pragma init(sharefs_init)

/* Runs when this object is loaded; looks up both real sharefs() symbols. */
void
sharefs_init(void)
{
	void *ch, *sh;

	ch = dlopen("/lib/libc.so.1", RTLD_NOW);
	sh = dlopen("/usr/sadm/lib/wbem/libsmoss.so", RTLD_NOW);
	if (ch == NULL || sh == NULL)
		abort();
	csharefs = (int (*)())dlsym(ch, "sharefs");
	ssharefs = (int (*)())dlsym(sh, "sharefs");
	if (csharefs == NULL || ssharefs == NULL)
		abort();
}

/*
 * Interposed sharefs(): libsmoss always passes NULL as its third
 * argument, so treat that as the sign of a libsmoss-style call.
 */
int
sharefs(void *a, void *b, void *c, void *d, void *e, void *f, void *g,
    void *h)
{
	if (c == NULL)
		return (ssharefs(a, b, c, d, e, f, g, h));
	return (csharefs(a, b));
}
Then make it a system-wide preload:
# crle -e LD_PRELOAD=/path/to/sharefs.so
The above workaround is quite crude and fragile; it relies on the fact
that libsmoss always calls sharefs() with a third argument of NULL.
However, since libc`sharefs() only takes two arguments, whatever its
callers happen to leave in the third argument slot is junk, and that junk
may still end up being NULL, in which case the call is misrouted to the
sharefs() in libsmoss.so. YMMV.
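The ambiguity can be modeled in a few lines. This is only a sketch: the
names `enum target' and pick_by_third_arg() are hypothetical, and the
third argument is passed explicitly here because portable C cannot
demonstrate reading an argument the caller never supplied:

```c
#include <stddef.h>

enum target { TARGET_LIBC, TARGET_SMOSS };

/*
 * Model of the dispatch rule used by the crude workaround: a NULL third
 * argument is taken to mean "this call came from libsmoss".
 */
enum target
pick_by_third_arg(const void *third)
{
	return (third == NULL ? TARGET_SMOSS : TARGET_LIBC);
}
```

A genuine two-argument caller whose leftover third slot happens to be
zero is indistinguishable from libsmoss under this rule.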
A slightly more robust workaround figures out which sharefs() is
meant based on the caller's frame. Still lame, but not dependent
on stack junk.
#pragma ident "%Z%%M% %I% %E% SMI"

#include <dlfcn.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>
#include <string.h>
#include <stdio.h>
#include <ucontext.h>

/* The real implementations: csharefs is libc's, ssharefs is libsmoss's. */
static int (*csharefs)();
static int (*ssharefs)();

#pragma init(sharefs_init)

/* Runs when this object is loaded; looks up both real sharefs() symbols. */
static void
sharefs_init(void)
{
	void *ch, *sh;

	ch = dlopen("/lib/libc.so.1", RTLD_NOW);
	sh = dlopen("/usr/sadm/lib/wbem/libsmoss.so", RTLD_NOW);
	if (ch == NULL || sh == NULL)
		abort();
	csharefs = (int (*)())dlsym(ch, "sharefs");
	ssharefs = (int (*)())dlsym(sh, "sharefs");
	if (csharefs == NULL || ssharefs == NULL)
		abort();
}

/*
 * walkcontext() callback: record whether the frame's pc lies within
 * libsmoss.so. Returning non-zero stops the walk, so only the first
 * frame reported (intended to be the caller of sharefs()) is examined.
 */
/* ARGSUSED */
static int
gather_smoss(uintptr_t pc, int sig, void *arg)
{
	Dl_info dli;
	boolean_t *smossp = arg;

	*smossp = (dladdr((void *)pc, &dli) != 0 &&
	    strstr(dli.dli_fname, "libsmoss.so") != NULL);
	return (1);
}

int
sharefs(void *a, void *b, void *c, void *d, void *e, void *f, void *g,
    void *h)
{
	ucontext_t uc;
	/* Default to libc's version if the stack walk reports no frames. */
	boolean_t smoss = B_FALSE;

	(void) getcontext(&uc);
	(void) walkcontext(&uc, gather_smoss, &smoss);
	if (smoss)
		return (ssharefs(a, b, c, d, e, f, g, h));
	return (csharefs(a, b));
}
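The strstr() test in gather_smoss() matches "libsmoss.so" anywhere in the
reported pathname. As a hedged refinement, the match could be factored
into a small helper that compares only the last path component and
tolerates a NULL name. The helper name fname_is_lib() is hypothetical,
and whether dli_fname can actually be NULL after a successful dladdr() is
an assumption made purely for defensiveness:

```c
#include <string.h>

/*
 * Return 1 if the last component of fname begins with libname
 * (so "libsmoss.so" also matches "libsmoss.so.1"), else 0.
 * NULL arguments never match.
 */
int
fname_is_lib(const char *fname, const char *libname)
{
	const char *base;

	if (fname == NULL || libname == NULL)
		return (0);
	base = strrchr(fname, '/');
	base = (base != NULL) ? base + 1 : fname;
	return (strncmp(base, libname, strlen(libname)) == 0);
}
```

With this in place, the callback body would reduce to something like
"*smossp = (dladdr((void *)pc, &dli) != 0 &&
fname_is_lib(dli.dli_fname, "libsmoss.so"));".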