Provides a method to control load latency and temporal locality at the variable level.
#pragma memref_control name1[:<locality>[:<latency>]][, name2...]
name1, name2
    Specifies the name of an array or pointer. You must specify at least one name; each name can optionally carry its own locality and latency values.
locality
    An optional integer value that indicates the desired cache level in which to store data for future access. This determines the load/store hint (or prefetch hint) to be used for this reference. The value can be one of the following:
    To use this argument, you must also specify name.
latency
    An optional integer value that indicates the load latency (or the latency that has to be overlapped if a prefetch is issued for this address). The value can be one of the following:
    To use this argument, you must also specify name and locality.
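To make the syntax concrete, here is a minimal sketch of one pragma that names two pointers, the first with only a locality and the second with both a locality and a latency. The function dot and the macro HAVE_MEMREF_CONTROL are hypothetical, and the preprocessor test is only an assumption about how to detect the Intel compiler targeting Itanium; the values l2 and l3_latency are taken from the examples in this section.

```c
#include <stddef.h>

/* Assumption: these macros identify the Intel compiler targeting Itanium.
   On any other compiler the pragma is compiled out entirely. */
#if defined(__INTEL_COMPILER) && (defined(__ia64__) || defined(_M_IA64))
#define HAVE_MEMREF_CONTROL 1
#else
#define HAVE_MEMREF_CONTROL 0
#endif

/* Hypothetical kernel: returns the sum of a[i] * b[i] over n elements. */
long dot(const long *a, const long *b, size_t n)
{
    long sum = 0;
    size_t i;

    /* One pragma, two names: a carries only a locality,
       b carries a locality and a latency. */
#if HAVE_MEMREF_CONTROL
#pragma memref_control a:l2, b:l2:l3_latency
#endif
    for (i = 0; i < n; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

The pragma affects only how the loads are hinted and scheduled, so the function returns the same result whether or not the pragma is honored.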
The memref_control pragma is supported on Itanium® processors only. It lets you control load latency and temporal locality at the variable level by specifying locality and latency for individual arrays or pointers. For example, you can use this pragma to control the following:
The location (cache level) to store data for future access.
The most appropriate latency value to be used for a load, or the latency that has to be overlapped if a prefetch is issued for this reference.
When you specify latency and data locality information at the source level for a particular data access, the compiler decides how best to use this information. If the compiler can prefetch profitably for the reference, it issues a prefetch with a distance that covers the specified latency and then schedules the corresponding load with a smaller latency. It also uses the hints on the prefetch and load appropriately to keep the data in the specified cache level.
If the compiler cannot compute the address in advance, or decides that the overhead of prefetching is too high, it uses the specified latency to separate the load and its use (in a pipelined loop or a Global Code Scheduler loop). The hint on the load/store will correspond to the cache level passed with the locality argument.
You can use this pragma with the prefetch and noprefetch pragmas to further tune the hints and prefetch strategies. When using memref_control with noprefetch, keep the following guidelines in mind:
Specifying noprefetch along with memref_control causes the compiler to not issue prefetches; instead, the latency values specified in memref_control are used to schedule the load.
There are no ordering requirements for using the two pragmas together. Specify the two pragmas in either order, as long as both are specified consecutively just before the loop where they are to be applied. Issuing a prefetch with one hint and loading the data later using a different hint can provide greater control over the hints used for specific architectures.
memref_control is handled differently from the prefetch and noprefetch pragmas. Even if the load cannot be prefetched, the reference can still be loaded using a non-default load latency passed to the latency argument.
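The guidelines above can be sketched as follows: noprefetch and memref_control are placed consecutively, immediately before the loop, so that no prefetch is issued for x and its loads are scheduled with the given latency. The function gather_sum and the macro HAVE_MEMREF_CONTROL are hypothetical, and the preprocessor test is an assumption about how to detect the Intel compiler targeting Itanium.

```c
#include <stddef.h>

/* Assumption: these macros identify the Intel compiler targeting Itanium.
   On any other compiler both pragmas are compiled out. */
#if defined(__INTEL_COMPILER) && (defined(__ia64__) || defined(_M_IA64))
#define HAVE_MEMREF_CONTROL 1
#else
#define HAVE_MEMREF_CONTROL 0
#endif

/* Hypothetical kernel: sums x[idx[i]], an indirect access pattern. */
long gather_sum(const long *x, const int *idx, size_t n)
{
    long sum = 0;
    size_t i;

    /* The two pragmas may appear in either order, but must be
       consecutive and directly precede the loop they apply to. */
#if HAVE_MEMREF_CONTROL
#pragma noprefetch x
#pragma memref_control x:l2:l3_latency
#endif
    for (i = 0; i < n; i++) {
        sum += x[idx[i]];
    }
    return sum;
}
```

Swapping the order of the two pragma lines should be equivalent, according to the ordering guideline above.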
Example 1: Using #pragma memref_control when prefetching is not possible
The following example illustrates a case where the address is not known in advance, so prefetching is not possible. The compiler, in this case, schedules the loads of the tab array with an L3 load latency of 15 cycles (inside a software pipelined loop or GCS loop).
#pragma memref_control tab : l2 : l3_latency
for (i=0; i<n; i++)
{
    x = <generate 64 random bits inline>;
    dum += tab[x&mask]; x>>=6;
    dum += tab[x&mask]; x>>=6;
    dum += tab[x&mask]; x>>=6;
}
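For readers who want to compile Example 1, the following is a self-contained sketch of the same loop. The xorshift64 generator is only a stand-in for the "generate 64 random bits inline" placeholder, not the generator used in the original; the function name sum_random_lookups, the seed parameter, and the macro HAVE_MEMREF_CONTROL are hypothetical, and the preprocessor test is an assumption about how to detect the Intel compiler targeting Itanium.

```c
#include <stdint.h>

/* Assumption: these macros identify the Intel compiler targeting Itanium. */
#if defined(__INTEL_COMPILER) && (defined(__ia64__) || defined(_M_IA64))
#define HAVE_MEMREF_CONTROL 1
#else
#define HAVE_MEMREF_CONTROL 0
#endif

/* Stand-in random source (xorshift64); the state must be nonzero. */
static uint64_t xorshift64(uint64_t *state)
{
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    return x;
}

/* Performs three data-dependent table lookups per iteration.
   The addresses depend on random bits, so they cannot be prefetched. */
long sum_random_lookups(const int *tab, uint64_t mask, int n, uint64_t seed)
{
    uint64_t state = seed ? seed : 1;  /* avoid the all-zero state */
    long dum = 0;
    int i;

#if HAVE_MEMREF_CONTROL
#pragma memref_control tab : l2 : l3_latency
#endif
    for (i = 0; i < n; i++) {
        uint64_t x = xorshift64(&state);
        dum += tab[x & mask]; x >>= 6;
        dum += tab[x & mask]; x >>= 6;
        dum += tab[x & mask]; x >>= 6;
    }
    return dum;
}
```

With a 64-entry table and mask set to 63, each 6-bit shift exposes a fresh index, which matches the shift amount used in the example.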
Example 2: Using #pragma memref_control with prefetch and noprefetch pragmas [sparse matrix]
The following example illustrates one way of using memref_control, prefetch, and noprefetch together.
if( size <= 1000 ) {
    /* Small size: suppress prefetching; schedule loads of x with L3 latency. */
    #pragma noprefetch cp, vp
    #pragma memref_control x:l2:l3_latency
    #pragma noprefetch yp, bp, rp
    #pragma noprefetch xp
    for (iii=0; iii<rag1m0; iii++) {
        if( ip < rag2 ) {
            sum -= vp[ip]*x[cp[ip]];
            ip++;
        } else {
            xp[i] = sum*yp[i];
            i++;
            sum = bp[i];
            rag2 = rp[i+1];
        }
    }
    xp[i] = sum*yp[i];
} else {
    /* Larger size: prefetch the streamed arrays; schedule loads of x with memory latency. */
    #pragma prefetch cp, vp
    #pragma memref_control x:l2:mem_latency
    #pragma prefetch yp, bp, rp
    #pragma noprefetch xp
    for (iii=0; iii<rag1m0; iii++) {
        if( ip < rag2 ) {
            sum -= vp[ip]*x[cp[ip]];
            ip++;
        } else {
            xp[i] = sum*yp[i];
            i++;
            sum = bp[i];
            rag2 = rp[i+1];
        }
    }
    xp[i] = sum*yp[i];
}