The non-temporal copy routines such as kcopy_nta() were written for the processor
cache sizes of early Opterons. These routines need to become aware of the actual
processor cache size to determine which algorithms to use. Future processors will
have different cache sizes and hierarchies. The routines' performance may suffer
if they are not made aware of the actual processor cache size.
This is an enhancement to the solution provided by CR 6226737.
Non-temporal reads bypass the cache, whereas temporal reads go through it, and the routines currently rely on a hard-coded value for the cache size. We need to determine the actual cache size at run time and use that value for temporal reads instead of the hard-coded one.
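
As a rough illustration only (not the actual kcopy_nta() code), the sketch below shows one way a copy routine could pick between the temporal and non-temporal paths from a detected cache size rather than a constant. All names here (FALLBACK_CACHE_BYTES, nta_threshold, choose_copy) are hypothetical, the 1 MB fallback is an assumed value, and the "half the cache" cutoff is an assumed policy chosen only to make the example concrete.

```c
#include <stddef.h>

/* Hypothetical fallback, used only when the cache size could not be detected. */
#define FALLBACK_CACHE_BYTES ((size_t)1024 * 1024)

typedef enum { COPY_TEMPORAL, COPY_NON_TEMPORAL } copy_kind_t;

/*
 * Return the copy length at or above which the cache should be bypassed.
 * detected_cache_bytes is the size reported by the processor, or 0 if it
 * is unknown. Using half the cache as the cutoff is an assumption.
 */
static size_t
nta_threshold(size_t detected_cache_bytes)
{
    size_t cache = (detected_cache_bytes != 0) ?
        detected_cache_bytes : FALLBACK_CACHE_BYTES;

    return (cache / 2);
}

/* Choose the copy path for a buffer of len bytes. */
static copy_kind_t
choose_copy(size_t len, size_t detected_cache_bytes)
{
    if (len >= nta_threshold(detected_cache_bytes))
        return (COPY_NON_TEMPORAL);  /* large copy: avoid polluting the cache */
    return (COPY_TEMPORAL);          /* small copy: keep the data cached */
}
```

With this shape, a processor reporting a 2 MB cache would keep a 512 KB copy on the temporal path, while the same copy would take the non-temporal path when only the assumed 1 MB fallback is available.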