This topic presents the C++ language features that help the compiler vectorize code.
The SIMD vectorization feature is available for both Intel® microprocessors and non-Intel microprocessors. Vectorization may call library routines that can result in a greater performance gain on Intel® microprocessors than on non-Intel microprocessors.
The __declspec(align(n)) declaration enables you to overcome hardware alignment constraints. The auto-vectorization hints address the stylistic issues due to lexical scope, data dependency, and ambiguity resolution. The SIMD feature's pragma allows you to enforce vectorization of loops.
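As a minimal sketch of the alignment declaration, the following requests a 32-byte boundary for a global array and checks the resulting address. The ALIGN32 macro, the samples array, and the is_aligned helper are hypothetical names introduced for illustration, and the standard alignas specifier is assumed as a stand-in on compilers that do not accept the __declspec spelling.

```cpp
#include <cstddef>
#include <cstdint>

// ALIGN32 is a hypothetical convenience macro, not a compiler built-in.
// Compilers that define _MSC_VER accept __declspec(align(n)); elsewhere
// the standard C++11 alignas specifier is assumed to be equivalent here.
#if defined(_MSC_VER)
#  define ALIGN32 __declspec(align(32))
#else
#  define ALIGN32 alignas(32)
#endif

// Request a 32-byte boundary: the address satisfies address mod 32 == 0.
ALIGN32 float samples[1024];

// Returns true when p lies on an n-byte boundary.
inline bool is_aligned(const void* p, std::size_t n) {
    return reinterpret_cast<std::uintptr_t>(p) % n == 0;
}
```

A call such as is_aligned(samples, 32) is one way to confirm the boundary before relying on aligned vector loads.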
You can use the __declspec(vector) (Windows*) or __attribute__((vector)) (Linux*) declaration, and the __declspec(vector[clauses]) (Windows*) or __attribute__((vector(clauses))) (Linux*) declaration, to vectorize user-defined functions and loops. For SIMD usage, the vector function is called from a loop that is being vectorized.
The C/C++ extensions for Array Notations provide map operations with general data parallel semantics: you describe the operation without expressing the implementation strategy, so you write the same operation regardless of the size of the problem. The implementation realizes the construct by combining SIMD, loops, and tasking. With these semantics, you can also choose more elaborate programming and express a single-dimensional operation at two levels, using both task constructs and array operations to force a preferred parallel and vector execution.
In the usage model of the vector declaration, the code generated for the function operates on a small section (vectorlength) of the array and exploits SIMD parallelism. Task parallelism is implemented at the call site.
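A minimal sketch of a vector function called from a loop might look like the following. The VECTOR_FN macro and the function names scale_add and apply_scale_add are hypothetical; the macro is assumed to expand to the vector declaration only when the Intel compiler is detected, and to nothing elsewhere so the code still builds as ordinary scalar C++.

```cpp
#include <cstddef>

// VECTOR_FN is a hypothetical macro introduced for this sketch.
// It applies the vector declaration only under the Intel compiler;
// other compilers see a plain scalar function.
#if defined(__INTEL_COMPILER) && defined(_WIN32)
#  define VECTOR_FN __declspec(vector)
#elif defined(__INTEL_COMPILER)
#  define VECTOR_FN __attribute__((vector))
#else
#  define VECTOR_FN
#endif

// Element-wise function: one logical invocation per array element.
VECTOR_FN float scale_add(float x, float y) {
    return 2.0f * x + y;
}

// The call site: a loop that invokes the function once per element,
// which is where a vectorizing compiler can use a SIMD version of it.
void apply_scale_add(const float* x, const float* y, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = scale_add(x[i], y[i]);
    }
}
```

Keeping the function body free of side effects and cross-element state is what makes it reasonable to run several invocations at once.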
The following table summarizes the language features that help vectorize code.
| Language Feature | Description | 
|---|---|
| __declspec(align(n)) | Directs the compiler to align the variable to an n-byte boundary. The address of the variable satisfies address mod n = 0. | 
| __declspec(align(n,off)) | Directs the compiler to align the variable to an n-byte boundary with offset off within each n-byte boundary. The address of the variable satisfies address mod n = off. | 
| __declspec(vector) (Windows*) __attribute__((vector)) (Linux*) | Combines with the map operation at the call site to provide the data parallel semantics. When multiple instances of the vector declaration are invoked in a parallel context, the execution order among them is not sequenced. | 
| __declspec(vector[clauses]) (Windows*) __attribute__((vector(clauses))) (Linux*) | Combines with the map operation at the call site to provide the data parallel semantics, as qualified by the specified clauses. When multiple instances of the vector declaration are invoked in a parallel context, the execution order among them is not sequenced. | 
| restrict | Permits the disambiguator flexibility in alias assumptions, which enables more vectorization. | 
| __declspec(vector_variant(clauses)) (Windows*) __attribute__((vector_variant(clauses))) (Linux*) | Provides the ability to vectorize user-defined functions and loops, as specified by the clauses. | 
| __assume_aligned(a,n) | Instructs the compiler to assume that array a is aligned on an n-byte boundary; used in cases where the compiler has failed to obtain alignment information. | 
| __assume(cond) | Instructs the compiler to assume that the represented condition is true where the keyword appears. Typically used for conveying properties that the compiler can take advantage of for generating more efficient code, such as alignment information. | 
| Auto-vectorization Hints | |
| #pragma ivdep | Instructs the compiler to ignore assumed vector dependencies. | 
| #pragma vector | Specifies how to vectorize the loop and indicates that efficiency heuristics should be ignored. Using the assert keyword with the vector {always} pragma generates an error-level assertion message if the compiler efficiency heuristics indicate that the loop cannot be vectorized. Use #pragma ivdep to ignore the assumed dependencies. | 
| #pragma novector | Specifies that the loop should never be vectorized. | 
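
The restrict and #pragma ivdep entries above can be sketched as follows. This is an illustration under stated assumptions, not a definitive usage: __restrict is the extension spelling that common C++ compilers accept in place of the C99 restrict keyword, PRAGMA_IVDEP is a hypothetical macro that emits #pragma ivdep only when the Intel compiler is detected, and the function names are invented for the example.

```cpp
#include <cstddef>

// PRAGMA_IVDEP is a hypothetical macro for this sketch. It emits
// "#pragma ivdep" under the Intel compiler and nothing elsewhere.
#if defined(__INTEL_COMPILER)
#  define PRAGMA_IVDEP _Pragma("ivdep")
#else
#  define PRAGMA_IVDEP
#endif

// __restrict promises that dst and src do not overlap, so the compiler
// does not have to assume a store to dst[i] can change a later src[j].
void add_one(float* __restrict dst, const float* __restrict src, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = src[i] + 1.0f;
    }
}

// The compiler cannot know the sign of k, so it must assume a possible
// dependence between a[i] and a[i + k]. The caller guarantees k >= 0 and
// that a holds at least n + k elements; the hint conveys that the assumed
// dependence can be ignored.
void scale_shifted(float* a, std::size_t n, int k, float c) {
    PRAGMA_IVDEP
    for (std::size_t i = 0; i < n; ++i) {
        a[i] = a[i + k] * c;
    }
}
```

Both hints are promises made by the programmer: if the pointers do overlap, or if k is negative, the vectorized loop may produce results that differ from the scalar loop.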
Some pragmas are available for both Intel® microprocessors and non-Intel microprocessors, but may perform more optimizations for Intel® microprocessors than for non-Intel microprocessors.
| User-Mandated Pragma | |
|---|---|
| #pragma simd | Enforces vectorization of loops. | 
| #pragma omp simd | Transforms the loop into a loop that will be executed concurrently using SIMD instructions. |
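
As a short sketch of the user-mandated form, the following applies the OpenMP simd construct with a reduction clause to a dot product. The function name dot is hypothetical. When OpenMP SIMD support is not enabled, the pragma is assumed to be ignored and the loop runs as ordinary scalar code.

```cpp
#include <cstddef>

// Dot product with a user-mandated SIMD loop. The reduction clause tells
// the compiler that sum may be accumulated in per-lane partial sums that
// are combined after the loop.
float dot(const float* x, const float* y, std::size_t n) {
    float sum = 0.0f;
    #pragma omp simd reduction(+:sum)
    for (std::size_t i = 0; i < n; ++i) {
        sum += x[i] * y[i];
    }
    return sum;
}
```

Because the partial sums can be combined in a different order than the scalar loop uses, floating-point results may differ in the last bits for inputs that are not exactly representable.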