Background:
===========
Starting with Solaris 9, the runtime linker can load multiple objects that have the same file name but reside in different directories, depending on the search path used to locate each one. For example, when libA.so is looked up as a dependency of an application, the search path used can differ from the one used to find another libA.so needed by a library that the application loads (that library may have its own RPATH, for example).
See http://blogs.sun.com/rie/entry/loading_multiple_files_same_name for more details.
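The per-requester lookup can be illustrated with a small shell sketch. The function find_object is hypothetical and is not how ld.so.1 is implemented. It walks a colon-separated search path in order and prints the first file that exists, so the same basename can resolve to different files under different search paths. It assumes a POSIX shell and directory names without colons.

```sh
#!/bin/sh
# Hypothetical illustration, not ld.so.1 code: print the first match for
# basename $1 along the colon-separated search path $2.
# The body runs in a subshell ( ... ) so IFS and set -f do not leak.
find_object() (
    set -f                           # no glob expansion of path entries
    IFS=:                            # split the search path on colons only
    for dir in $2; do
        [ -n "$dir" ] || continue    # skip empty entries in this sketch
        if [ -f "$dir/$1" ]; then    # each test is one stat()-style probe
            printf '%s\n' "$dir/$1"
            exit 0
        fi
    done
    exit 1                           # not found anywhere on the path
)
```

For example, with a libA.so present in both of two hypothetical directories /x and /y, find_object libA.so /x:/y and find_object libA.so /y:/x print different files, much as an application's LD_LIBRARY_PATH and a library's RPATH can lead to two distinct libA.so objects.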
Issue:
======
The problem we are seeing is that every time an object is requested (even if it has already been loaded), a stat() call is now issued on every directory of the search path ahead of the one where the object was found, to determine whether another, more appropriate version exists.
--> This can have a very high performance impact when the search path (LD_LIBRARY_PATH, for example) contains many NFS-located directories. Every stat() call generates an NFS request over the network, which takes time, and in a configuration where most libraries are deliberately not found in the first directories, the cumulative time spent in stat() calls can be very high compared with Solaris 8, whose runtime linker does not have this feature.
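A rough cost model makes the difference concrete. This is a hypothetical sketch under stated assumptions, not a description of ld.so.1 internals: an object found in the k-th directory costs k probes when first loaded; in the Solaris 8 model later requests reuse the loaded object by name at no extra cost, while in the Solaris 9+ behaviour described above every later request re-stats the k-1 directories ahead of it. The function names stats_sol8 and stats_sol9 are illustrative only.

```sh
#!/bin/sh
# Hypothetical stat() cost model (assumptions, not ld.so.1 code).
#   $1 = k, 1-based position of the directory holding the object
#   $2 = r, number of times the object is requested (first load included)

# Solaris 8 model: probe k directories once; later requests match by name.
stats_sol8() {
    echo "$1"
}

# Solaris 9+ model: probe k directories on the first request, then
# re-stat the k-1 earlier directories on each of the r-1 later requests.
stats_sol9() {
    echo $(( $1 + ($2 - 1) * ($1 - 1) ))
}
```

For an object in the 14th directory that is requested a hypothetical 10 times, stats_sol8 14 10 prints 14 while stats_sol9 14 10 prints 14 + 9 × 13 = 131. When each of those probes is an NFS round trip, the gap adds up across every library in the process.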
A production example (an arguable configuration, but still a real-life one) has:
- an LD_LIBRARY_PATH pointing to 18 different directories, 15 of them located on NFS,
- about 250 libraries loaded,
- 80% of the objects located beyond the 13th directory of the LD_LIBRARY_PATH.
On Solaris 8:
- the program initializes in about 12 seconds,
- with around 300 stat() calls (initialization phase only, i.e. until the first getpid() call),
- and is fully running (after dlopen'ing additional libraries) in about 25 seconds.
On Solaris 10 (same NFS servers providing the same binaries, application libraries, etc.):
- the program initializes in about 50 seconds,
- with more than 20000 stat() calls (initialization phase only),
- and takes more than 150 seconds to be fully running (after dlopen'ing additional libraries).
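A back-of-the-envelope reading of these figures follows. The last line assumes, for illustration only, that the whole initialization slowdown is due to the additional stat() calls; since there were more than 20000 calls, that per-call figure is an upper bound under this assumption.

```latex
\begin{aligned}
\text{initialization slowdown} &= \frac{50\,\mathrm{s}}{12\,\mathrm{s}} \approx 4.2\\
\text{full startup slowdown} &> \frac{150\,\mathrm{s}}{25\,\mathrm{s}} = 6\\
\text{stat() call growth} &\gtrsim \frac{20000}{300} \approx 67\\
\text{time per extra stat() call} &\lesssim \frac{(50-12)\,\mathrm{s}}{20000-300} = \frac{38\,\mathrm{s}}{19700} \approx 1.9\,\mathrm{ms}
\end{aligned}
```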
Request For Enhancement:
========================
As described in the Workaround section, crle(1) can be used to avoid this issue, but it has drawbacks that did not exist in the Solaris 8 case:
- the need to create a pre-computed index,
- the need to update this index every time a new library is installed in one of the first directories of the LD_LIBRARY_PATH (or in any other directory, if the multiple-file loading functionality should pick it up).
We would therefore like a way to disable this multiple-file loading functionality (and the stat() calls that come with it).
Perhaps an LD_* environment variable could revert to the Solaris 8 behaviour with respect to loading multiple files with the same basename, something like LD_NOMULTIBASENAME?
Workaround:
===========
As described in the crle(1) man page, the Directory Cache for ELF Objects works around this issue.
In our case, we can create a dedicated configuration file with the following command, passing one -i option for each NFS-located LD_LIBRARY_PATH entry that is needed:
crle -i /net/.../orbix621/shlib -i /net/.../orbix621/shlib/default -i /net/.../lib/java/j2sdk1_3_1_06/jre/lib/sparc/server ... -c ./config.client
The resulting config file is a data file that can now be used when starting the binary.
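Typing one -i option per directory is error-prone when the path is long. The sketch below is a hypothetical helper (crle_cache_args is not part of crle(1)) that turns a colon-separated path into -i options, one word per line. It assumes a POSIX shell and path entries free of whitespace, and it emits every non-empty entry it is given, so pass it only the NFS-located directories if that is what should be cached.

```sh
#!/bin/sh
# Hypothetical helper: print "-i" and each non-empty entry of the
# colon-separated path $1, one word per line, for use as crle(1) options.
# The body runs in a subshell ( ... ) so IFS and set -f do not leak.
crle_cache_args() (
    set -f                   # no glob expansion of path entries
    IFS=:                    # split the path on colons only
    for dir in $1; do
        if [ -n "$dir" ]; then
            printf '%s\n' -i "$dir"
        fi
    done
)
```

A possible invocation, producing the same output file as above: crle $(crle_cache_args "$LD_LIBRARY_PATH") -c ./config.client. The command substitution is split with the caller's default IFS, which is why entries must not contain whitespace.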
--> LD_LIBRARY_PATH still has to be set as before, but LD_CONFIG=config.client must be set as well:
LD_CONFIG=config.client; export LD_CONFIG
LD_LIBRARY_PATH=....; export LD_LIBRARY_PATH
/net/.../bin/<my_program>