elf_rtbndr() does not save the SSE registers that the AMD64 ABI uses to
pass certain types of arguments. Functions that take SSE arguments can
have those arguments trashed, the first time the function is invoked, by
functions that the runtime linker calls while binding it.
When a dynamically linked function is called, code in the PLT is executed.
The first time through a PLT entry, the elf_rtbndr() assembly code is called.
elf_rtbndr() must save all registers that can possibly contain arguments
before calling any other function, such as elf_bndr(). elf_rtbndr() must
restore the saved registers before invoking the function it interposed on.
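The lazy binding sequence above can be modeled roughly in plain C. This is a sketch, not the ld.so.1 code: a PLT entry is treated as a function pointer that starts out pointing at a resolver, and the resolver patches the slot on the first call and then forwards the original argument. The names plt_slot, resolve_and_call, target_scale, call_through_plt and resolver_call_count are hypothetical. The point is that the caller's argument must survive everything the resolver does before the real target runs.

```c
/* Hypothetical model of one PLT/GOT slot. Real PLT entries are assembly
 * stubs; here the slot is just a function pointer. */
typedef double (*scale_fn)(double);

/* Stands in for the dynamically linked function being bound. */
static double target_scale(double x) { return x * 2.0; }

static int resolver_calls = 0;   /* number of times binding took place */

static double resolve_and_call(double x);

/* An unbound slot points at the resolver, as an unbound PLT entry does. */
static scale_fn plt_slot = resolve_and_call;

static double resolve_and_call(double x)
{
    resolver_calls++;
    plt_slot = target_scale;     /* bind: later calls skip the resolver */
    /* In the real runtime linker, x would still be sitting in xmm0 here;
     * the binding code must save it and restore it before the jump. */
    return plt_slot(x);
}

/* Every caller goes through the slot, like a call through the PLT. */
double call_through_plt(double x) { return plt_slot(x); }

int resolver_call_count(void) { return resolver_calls; }
```

Only the first call_through_plt() pays for binding; later calls reach target_scale() directly.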
elf_rtbndr() currently pushes every general purpose register that the
AMD64 ABI allows to contain arguments, but it does not save or restore
any SSE registers.
The AMD64 ABI defines floating point arguments (float and double) to be
passed in SSE registers XMM0 through XMM7. The ABI treats all SSE
registers as scratch registers that are not preserved across calls, so
any function called from elf_bndr() is free to use them, which would
corrupt the original function's SSE arguments.
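To make the register assignment concrete, here is a simplified model of how the AMD64 System V ABI places scalar arguments. It is a sketch that covers only INTEGER class arguments (int, long, pointers) and SSE class arguments (float, double); structs, long double and other cases are ignored. The names arg_kind, arg_loc and assign_arg_locs are hypothetical. Integer and floating point arguments draw from separate register sequences, so a double's position in the register list depends only on how many earlier floating point arguments there are.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { ARG_INT, ARG_FP } arg_kind;   /* INTEGER vs SSE class */

/* Possible argument locations in this simplified model. */
typedef enum {
    LOC_RDI, LOC_RSI, LOC_RDX, LOC_RCX, LOC_R8, LOC_R9,
    LOC_XMM0, LOC_XMM1, LOC_XMM2, LOC_XMM3,
    LOC_XMM4, LOC_XMM5, LOC_XMM6, LOC_XMM7,
    LOC_STACK
} arg_loc;

/* Assign each argument to the next free register of its class, or to the
 * stack once that class runs out. Returns how many arguments landed in
 * SSE registers (i.e. how many of XMM0..XMM7 hold arguments). */
size_t assign_arg_locs(const arg_kind *args, size_t n, arg_loc *out)
{
    size_t next_int = 0, next_sse = 0;

    assert(n == 0 || (args != NULL && out != NULL));
    for (size_t i = 0; i < n; i++) {
        if (args[i] == ARG_FP)
            out[i] = next_sse < 8 ? (arg_loc)(LOC_XMM0 + next_sse++)
                                  : LOC_STACK;
        else
            out[i] = next_int < 6 ? (arg_loc)(LOC_RDI + next_int++)
                                  : LOC_STACK;
    }
    return next_sse;
}
```

For a signature such as f(int, double, long, float) the model yields rdi, xmm0, rsi, xmm1. A signature with no float or double arguments returns 0: none of its arguments live in XMM registers.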
None of the user-land functions invoked from elf_bndr() currently use
SSE registers. There is no guarantee that this coincidence will hold
true in the future.
Only functions which take floating point argument(s) can possibly hit
this issue.
The fix is to have elf_rtbndr() save/restore SSE argument registers
XMM0 through XMM7 along with the general purpose registers.
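The difference between the current behavior and the fix can be sketched as a small C model. The arg_regs structure, binder_clobbers(), rtbndr_old() and rtbndr_fixed() are all hypothetical: the structure stands in for the live argument registers, and binder_clobbers() stands in for elf_bndr() and anything it calls, which the ABI allows to overwrite every scratch register. Only the low double of each XMM register is modeled.

```c
#include <string.h>

/* Hypothetical snapshot of the argument registers at a PLT call. */
typedef struct {
    unsigned long gpr[6];   /* rdi, rsi, rdx, rcx, r8, r9 */
    double xmm[8];          /* xmm0 .. xmm7 (low double only) */
} arg_regs;

/* Stands in for elf_bndr() and its callees: all of these registers are
 * scratch under the ABI, so any of them may be overwritten. */
static void binder_clobbers(arg_regs *r)
{
    for (int i = 0; i < 6; i++)
        r->gpr[i] = 0xdeadbeefUL;
    for (int i = 0; i < 8; i++)
        r->xmm[i] = -1.0;
}

/* Current behavior: only the general purpose argument registers are
 * saved and restored around the binder. */
void rtbndr_old(arg_regs *r)
{
    unsigned long saved_gpr[6];

    memcpy(saved_gpr, r->gpr, sizeof saved_gpr);
    binder_clobbers(r);
    memcpy(r->gpr, saved_gpr, sizeof saved_gpr);
}

/* Fixed behavior: XMM0 through XMM7 are saved and restored as well. */
void rtbndr_fixed(arg_regs *r)
{
    arg_regs saved = *r;

    binder_clobbers(r);
    *r = saved;
}
```

After rtbndr_old() the integer arguments are intact but xmm0 holds the binder's value; after rtbndr_fixed() every argument register matches what the caller passed.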
To reproduce:
1. Change libc's 64-bit strcmp to overwrite register xmm0 (allowed by the ABI).
2. Build and install a new 64-bit ld.so.1 with the new strcmp.
3. Every dynamically linked function that uses xmm0 to pass an argument
   will now fail the first time it is called.
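After the steps above, a check along the following lines could show the failure. This is a hypothetical sketch: first_fp_call_ok() is an illustrative name, and it assumes the program is lazily bound (not linked for immediate binding) and that nothing earlier in the program has called snprintf(). snprintf() is a dynamically linked libc function, and its double argument is passed in xmm0, so the first call goes through elf_rtbndr() with that argument live.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if the first call to snprintf() received its double intact.
 * The volatile source keeps the compiler from folding the value away.
 * Variadic calls also set %al to the number of vector registers used,
 * but that travels in a general purpose register, which is already saved. */
int first_fp_call_ok(void)
{
    volatile double value = 3.25;
    char buf[32];

    snprintf(buf, sizeof buf, "%.3f", value);
    /* With xmm0 trashed during binding, some other number is printed. */
    return strcmp(buf, "3.250") == 0;
}
```

On a system with a correct ld.so.1 the function returns 1; with the modified strcmp and without the fix, the first call would be expected to return 0.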
elf_plt_trace() in boot_elf.s uses mov instructions, rather than push/pop
instructions, to move all registers on and off the stack. The fix takes
the same approach. A micro benchmark indicates that the movs are faster
on both a Core 2 Duo and an Opteron 248.
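x86-64 has no push or pop instruction for XMM registers, so the SSE argument registers have to be stored with moves such as movdqa, movdqu or movaps in any case. The sketch below computes a hypothetical mov-based save area and prints the matching AT&T-syntax stores. It assumes seven general purpose registers (rax, which carries the vector register count for variadic calls, plus the six integer argument registers) in 8-byte slots, followed by xmm0 through xmm7 in 16-byte aligned slots. The register list, the layout and the helper names are assumptions, not the actual boot_elf.s frame. movdqa also requires %rsp itself to be 16-byte aligned after the subtraction; movdqu avoids that requirement.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical register list; the real elf_rtbndr() set may differ. */
static const char *const gprs[] = { "rax", "rdi", "rsi", "rdx", "rcx", "r8", "r9" };
enum { NGPR = sizeof gprs / sizeof gprs[0], NXMM = 8 };

/* Round n up to a multiple of 16 so the XMM slots suit movdqa. */
static size_t align16(size_t n) { return (n + 15) & ~(size_t)15; }

/* Offset of general purpose register slot i from %rsp. */
size_t gpr_offset(size_t i) { return 8 * i; }

/* Offset of xmm slot i: after the GPR area, 16-byte aligned. */
size_t xmm_offset(size_t i) { return align16(8 * NGPR) + 16 * i; }

/* Total bytes reserved with a single "subq $N, %rsp". */
size_t frame_size(void) { return xmm_offset(NXMM); }

/* Print the save sequence. The restore sequence mirrors it with the
 * operands swapped, followed by "addq $N, %rsp". */
void emit_save_sequence(FILE *out)
{
    fprintf(out, "\tsubq\t$%zu, %%rsp\n", frame_size());
    for (size_t i = 0; i < NGPR; i++)
        fprintf(out, "\tmovq\t%%%s, %zu(%%rsp)\n", gprs[i], gpr_offset(i));
    for (size_t i = 0; i < NXMM; i++)
        fprintf(out, "\tmovdqa\t%%xmm%zu, %zu(%%rsp)\n", i, xmm_offset(i));
}
```

With this layout one stack adjustment reserves 192 bytes; the seven GPRs occupy offsets 0 through 48 and xmm0 through xmm7 start at offset 64.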