Currently ssh does not benefit from hardware crypto acceleration on the Niagara platform because it cannot use the OpenSSL PKCS#11 engine. See PSARC/2004/681 for details and history.
Fixing ssh would make the Niagara platform more attractive to customers.
Note that the symmetric key crypto benefit here will be primarily
for Niagara _2_ platforms, not Niagara 1. Making ssh/sshd use
Niagara 1's asymmetric key crypto acceleration is quite different
in terms of code changes from what is needed to make ssh/sshd use
Niagara 2's symmetric key crypto capabilities, so there should be
two separate CRs to track this work.
SunSSH did use the OpenSSL PKCS#11 engine for a while during S10
development.
Two issues conspired to make us pull it out:
a) OpenSSH was using knowledge of OpenSSL EVP internals; this broke
spectacularly when using the RC4 cipher
b) the OpenSSL PKCS#11 engine was not fork-safe, but ssh and sshd
need to fork at several points after having started use of crypto.
I don't recall the details of (a), but the RC4 part may now be a non-
issue since IIRC it had to do with the OpenSSH privsep code, which we
don't use at all now.
(b) can be addressed as follows. In ssh, force a re-key when it comes
time to fork, but if the protocol is SSHv1 or the server is old enough
that it doesn't support re-keying, then don't use the OpenSSL PKCS#11
engine. The same approach will work in sshd; alternatively sshd could
pre-fork the monitor's child and do all the crypto there, with the
parent becoming the real monitor when authentication is done.
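The client-side rule in (b) can be sketched as a small decision function. This is only an illustration of the logic described above, not SunSSH code: the `Session` type, its field names, and the way re-key support would be detected are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Session:
    # Hypothetical fields, for illustration only.
    protocol_version: int      # 1 for SSHv1, 2 for SSHv2
    peer_supports_rekey: bool  # how this is detected is not specified here


def crypto_plan(session: Session) -> dict:
    """Decide whether the PKCS#11 engine may be used for this session.

    The engine is not fork-safe, so it is only used when a re-key can be
    forced at fork time. SSHv1, or a peer too old to re-key, falls back
    to software crypto.
    """
    can_rekey = session.protocol_version == 2 and session.peer_supports_rekey
    return {
        "use_pkcs11_engine": can_rekey,
        "rekey_on_fork": can_rekey,
    }
```

For example, `crypto_plan(Session(2, True))` enables the engine and re-keying on fork, while `crypto_plan(Session(1, False))` disables both.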
Also, maybe it's time to remove the OpenSSH privsep monitor code
from the SunSSH code base, though we should leave uses of the
PRIVSEP() macro in place to make resyncs easier.
Work Around
One way to achieve more speed on machines like Niagara (or any multi-CPU box) is to run the transfer in parallel, to make use of more (virtual) CPUs. See this blog entry for more information:
http://blogs.sun.com/janp/entry/speeding_up_ssh_data_transfer
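A minimal sketch of the parallel approach, assuming the data has already been split into several files and that one scp process per file is acceptable. It is not taken from the blog entry; the function names, the file names, and the destination string are hypothetical. Each scp runs as its own process, so the crypto work can be spread over several (virtual) CPUs.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_scp_commands(files, dest):
    """Build one scp command per file; dest is e.g. 'host:/tmp/' (hypothetical)."""
    return [["scp", path, dest] for path in files]


def run_parallel(commands, workers=4):
    """Run the commands concurrently and return their exit codes in input order."""
    def run(cmd):
        return subprocess.run(cmd).returncode

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run, commands))
```

Usage would look like `run_parallel(build_scp_commands(["part.00", "part.01"], "host:/tmp/"))`; a non-zero entry in the returned list means that transfer failed and should be retried.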
Also, some speed can be gained by using RC4. From my experiments, transferring 500MB of data over localhost on a T5220 takes ~53 seconds using the default aes128-ctr cipher mode, while it takes ~34 seconds using arcfour (RC4 is really fast in software).
To use RC4, run ssh like this: "ssh -o Ciphers=arcfour ...."
This is roughly 36% less time than the default aes128-ctr case.
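For reference, the arithmetic behind that comparison can be worked through as follows. The inputs are the approximate timings quoted above, so the results are approximate too; the helper names are just for this example.

```python
SIZE_MB = 500         # data transferred
T_AES128_CTR = 53.0   # seconds, default cipher (approximate)
T_ARCFOUR = 34.0      # seconds, RC4 (approximate)


def throughput(size_mb, seconds):
    """Average transfer rate in MB/s."""
    return size_mb / seconds


def time_saved_fraction(baseline, faster):
    """Fraction of the baseline time saved by the faster run."""
    return (baseline - faster) / baseline


print(f"aes128-ctr: {throughput(SIZE_MB, T_AES128_CTR):.1f} MB/s")           # 9.4 MB/s
print(f"arcfour:    {throughput(SIZE_MB, T_ARCFOUR):.1f} MB/s")              # 14.7 MB/s
print(f"time saved: {time_saved_fraction(T_AES128_CTR, T_ARCFOUR):.0%}")     # 36%
```

In other words, arcfour moves data about 1.56 times as fast as aes128-ctr in this test, which corresponds to about 36% less elapsed time.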