|
Description
|
The kernel bzero, uzero, etc. routines currently do not use sse or sse2
non-temporal instructions. The only exception is hwblkclear, which
is called only for single pages.
For sse2-enabled cpus, it is not difficult to use either the movntq
or movnti instructions to implement a bzero that doesn't allocate in
the cache. Use of the movntq instruction is a bit more problematic:
it stores from an MMX register, and the MMX registers alias the x87
fp registers, so care must be taken to properly save and restore
possible fp register state. For movnti, this is not an issue, since
it stores from a general-purpose register, but this instruction is
less efficient as it moves only 4 bytes at a time. A userland check
of the time needed to zero a 1M area NOT IN CACHE yields:
~680 usecs - existing rep, sstol loop
~383 usecs - movnti loop, unrolled to 64 bytes (44% savings)
~322 usecs - movntq loop, unrolled to 64 bytes (53% savings)
It seems worthwhile to investigate the use of these instructions to
avoid displacing cached data unnecessarily. The usual problems of
deciding when to use the non-allocating instructions present
themselves, since these instructions run slower than the allocating
ones if the data is already in the cache.
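
As an illustration of the movnti variant described above, here is a
minimal userland sketch in C. It is not the loop that produced the
timings and not a proposed kernel implementation. The name nt_bzero is
hypothetical, and the sketch assumes an x86 compiler that provides the
SSE2 intrinsics in <emmintrin.h>, using _mm_stream_si32 (which compiles
to movnti) in place of hand-written assembly.

```c
#include <emmintrin.h>	/* SSE2: _mm_stream_si32 (movnti), _mm_sfence */
#include <stddef.h>
#include <stdint.h>

/*
 * nt_bzero: hypothetical helper, not an existing kernel routine.
 * Zeroes len bytes at dst with non-temporal 4-byte stores so that the
 * destination lines are not allocated in the cache.
 */
static void
nt_bzero(void *dst, size_t len)
{
	unsigned char *p = dst;

	/* Head: ordinary byte stores until p is 4-byte aligned. */
	while (len > 0 && ((uintptr_t)p & 3) != 0) {
		*p++ = 0;
		len--;
	}

	/* Body: 16 movnti stores (64 bytes) per iteration. */
	while (len >= 64) {
		int *w = (int *)p;

		_mm_stream_si32(w + 0, 0);  _mm_stream_si32(w + 1, 0);
		_mm_stream_si32(w + 2, 0);  _mm_stream_si32(w + 3, 0);
		_mm_stream_si32(w + 4, 0);  _mm_stream_si32(w + 5, 0);
		_mm_stream_si32(w + 6, 0);  _mm_stream_si32(w + 7, 0);
		_mm_stream_si32(w + 8, 0);  _mm_stream_si32(w + 9, 0);
		_mm_stream_si32(w + 10, 0); _mm_stream_si32(w + 11, 0);
		_mm_stream_si32(w + 12, 0); _mm_stream_si32(w + 13, 0);
		_mm_stream_si32(w + 14, 0); _mm_stream_si32(w + 15, 0);
		p += 64;
		len -= 64;
	}

	/* Tail: remaining whole 4-byte words, then single bytes. */
	while (len >= 4) {
		_mm_stream_si32((int *)p, 0);
		p += 4;
		len -= 4;
	}
	while (len > 0) {
		*p++ = 0;
		len--;
	}

	/* Non-temporal stores are weakly ordered; fence before returning. */
	_mm_sfence();
}
```

A caller would use it like bzero, e.g. nt_bzero(buf, size). The sketch
always takes the non-temporal path; choosing between it and the
allocating loop is the open question noted above.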
|