OpenSolaris

Bug ID 5070897
Synopsis kernel bzero routine should use non-temporal instructions for larger areas
State 3-Accepted (Yes, that is a problem)
Category:Subcategory kernel:arch-x86
Keywords ssperf
Responsible Engineer Bill Holler
Reported Against s10_63
Duplicate Of
Introduced In
Commit to Fix
Fixed In
Release Fixed
Related Bugs
Submit Date 2-July-2004
Last Update Date 31-May-2007
Description
The kernel bzero, uzero, etc. routines currently do not use SSE or SSE2
non-temporal instructions.  The only exception is hwblkclear, which
is called only for single pages.

For SSE2-enabled CPUs, it is not difficult to use either the movntq
or movnti instructions to implement a bzero that doesn't allocate in
the cache.  Use of the movntq instruction is a bit more problematic,
in that it stores from an MMX register, so care must be taken to
properly save and restore possible fp register state.  For movnti,
this is not an issue, but this instruction is less efficient as it
moves only 4 bytes at a time.  A userland check of time needed to
zero a 1M area NOT IN CACHE yields:

 ~680 usecs - existing rep stosl loop
 ~383 usecs - movnti loop, unrolled to 64 bytes         44% savings
 ~322 usecs - movntq loop, unrolled to 64 bytes         53% savings
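
As an illustration of the movnti variant measured above, here is a minimal
userland sketch in C.  It is not the kernel routine and not the code that
produced these timings: the name nt_bzero_movnti, the 4-byte alignment
requirement, and the memset fallback for the tail (and for builds without
SSE2) are all assumptions made for this example.  The _mm_stream_si32
intrinsic normally compiles to a movnti store.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* _mm_stream_si32 (movnti), _mm_sfence */
#endif

/*
 * Hypothetical helper (not the kernel's bzero): zero `len` bytes at `dst`
 * using 4-byte non-temporal stores, unrolled to 64 bytes per iteration.
 * Assumes `dst` is 4-byte aligned.  Any tail shorter than 64 bytes is
 * cleared with ordinary (cache-allocating) stores via memset.
 */
static void nt_bzero_movnti(void *dst, size_t len)
{
    assert(((uintptr_t)dst & 3) == 0);
#if defined(__SSE2__)
    int *p = (int *)dst;

    while (len >= 64) {
        /* 16 non-temporal 4-byte stores cover one 64-byte block. */
        _mm_stream_si32(p + 0, 0);  _mm_stream_si32(p + 1, 0);
        _mm_stream_si32(p + 2, 0);  _mm_stream_si32(p + 3, 0);
        _mm_stream_si32(p + 4, 0);  _mm_stream_si32(p + 5, 0);
        _mm_stream_si32(p + 6, 0);  _mm_stream_si32(p + 7, 0);
        _mm_stream_si32(p + 8, 0);  _mm_stream_si32(p + 9, 0);
        _mm_stream_si32(p + 10, 0); _mm_stream_si32(p + 11, 0);
        _mm_stream_si32(p + 12, 0); _mm_stream_si32(p + 13, 0);
        _mm_stream_si32(p + 14, 0); _mm_stream_si32(p + 15, 0);
        p += 16;
        len -= 64;
    }
    /* Non-temporal stores are weakly ordered; fence before returning. */
    _mm_sfence();
    memset(p, 0, len);
#else
    /* No SSE2 at compile time: fall back to the ordinary path. */
    memset(dst, 0, len);
#endif
}
```

Because movnti stores from a general-purpose register, this loop touches
no MMX or fp state, which is the property that makes it easier to use in
the kernel than movntq.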

It seems worthwhile to investigate the use of these instructions to
avoid evicting useful data from the cache unnecessarily.  The usual
problems of deciding when to use the non-allocating instructions
present themselves, since these instructions run slower than the
allocating ones if the data is already in the cache.
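
One common way to make that decision is a simple size cutoff, on the
assumption that a large region is unlikely to be cache-resident already.
The sketch below only illustrates that policy.  The names
choose_zero_method and enum zero_method are hypothetical, and the report
does not propose any particular threshold, so the caller supplies one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical labels for the two zeroing strategies discussed above. */
enum zero_method {
    ZERO_ALLOCATING,    /* ordinary stores, e.g. a rep stosl loop */
    ZERO_NON_TEMPORAL   /* movnti or movntq loop, bypassing the cache */
};

/*
 * Choose a strategy from the request size alone.  `threshold` is an
 * assumed tuning parameter in bytes; the right value would have to be
 * measured on the target CPUs.
 */
static enum zero_method choose_zero_method(size_t len, size_t threshold)
{
    assert(threshold > 0);
    return (len >= threshold) ? ZERO_NON_TEMPORAL : ZERO_ALLOCATING;
}
```

Size is only a proxy here: a small buffer that is not in the cache would
still take the allocating path, and a large buffer that happens to be
cached would take the slower non-temporal one.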
Work Around
N/A
Comments
N/A