distribute_point

Instructs the compiler to prefer loop distribution at the location indicated.

Syntax

#pragma distribute_point

Arguments

None

Description

The distribute_point pragma suggests that the compiler split a large loop into smaller ones. This is particularly useful when optimizations such as software pipelining (SWP) or vectorization cannot take place because of excessive register usage.

Using the distribute_point pragma as a loop distribution strategy enables software pipelining for the new, smaller loops on the IA-64 architecture. Splitting a loop into smaller segments makes it possible for each of the smaller loops, or at least one of them, to be software pipelined or vectorized.

When the pragma is placed inside a loop, the compiler distributes the loop at that point. All loop-carried dependencies are ignored.
Inside a loop, the compiler supports multiple instances of the pragma, but the pragma cannot be placed within an if statement.
When the pragma is placed outside the loop, the compiler distributes the loop based on an internal heuristic. The compiler determines where to distribute the loop and observes data dependencies.
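
Because loop-carried dependencies are ignored when the pragma is placed inside a loop, it is worth checking that the two halves are independent before choosing a split point. The following is a minimal sketch in plain C, with no pragma and no compiler-specific behavior. It assumes the compiler would split exactly at the marked point. The names N, fused, split, and results_match are hypothetical and serve only as illustration.

```c
#define N 8

/* Original loop: a[i] reads b[i-1], which the previous iteration wrote.
   This is a loop-carried dependency across the marked split point. */
void fused(int a[N], int b[N])
{
    for (int i = 1; i < N; i++) {
        a[i] = b[i - 1] + 1;
        /* hypothetical split point */
        b[i] = a[i] * 2;
    }
}

/* Hand-written distribution at the marked point, ignoring the dependency.
   The first loop now reads b[i-1] before the second loop has updated it. */
void split(int a[N], int b[N])
{
    for (int i = 1; i < N; i++)
        a[i] = b[i - 1] + 1;
    for (int i = 1; i < N; i++)
        b[i] = a[i] * 2;
}

/* Returns 1 if both versions produce the same arrays from zeroed input,
   0 otherwise. */
int results_match(void)
{
    int a1[N] = {0}, b1[N] = {0};
    int a2[N] = {0}, b2[N] = {0};

    fused(a1, b1);
    split(a2, b2);
    for (int i = 0; i < N; i++) {
        if (a1[i] != a2[i] || b1[i] != b2[i])
            return 0;
    }
    return 1;
}
```

In this sketch the two versions disagree, so a split at that point would change the results of the program. A loop of this shape is a poor candidate for a distribute_point pragma placed inside the loop.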

Example

Example 1: Using distribute_point pragma outside the loop

The following example uses the distribute_point pragma outside the loop.

#define NUM 1024

void loop_distribution_pragma1(
    double a[NUM], double b[NUM], double c[NUM],
    double x[NUM], double y[NUM], double z[NUM] )
{
    int i;

    // Before distribution or splitting the loop
    #pragma distribute_point
    for (i = 0; i < NUM; i++) {
        a[i] = a[i] + i;
        b[i] = b[i] + i;
        c[i] = c[i] + i;
        x[i] = x[i] + i;
        y[i] = y[i] + i;
        z[i] = z[i] + i;
    }
}

Example 2: Using distribute_point pragma inside the loop

The following example uses the distribute_point pragma inside the loop.

#define NUM 1024

void loop_distribution_pragma2(
    double a[NUM], double b[NUM], double c[NUM],
    double x[NUM], double y[NUM], double z[NUM] )
{
    int i;

    // After distribution or splitting the loop.
    for (i = 0; i < NUM; i++) {
        a[i] = a[i] + i;
        b[i] = b[i] + i;
        c[i] = c[i] + i;
        #pragma distribute_point
        x[i] = x[i] + i;
        y[i] = y[i] + i;
        z[i] = z[i] + i;
    }
}
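
To make the effect of the pragma in Example 2 concrete, the following sketch writes out by hand the two loops that a split at the pragma position would correspond to. It is an illustration under the assumption that the compiler splits exactly there, not the compiler's actual output. The function names fused_loop and distributed_loop are hypothetical.

```c
#define NUM 1024

/* The loop from Example 2 with the pragma omitted: one loop, six updates. */
void fused_loop(double a[NUM], double b[NUM], double c[NUM],
                double x[NUM], double y[NUM], double z[NUM])
{
    for (int i = 0; i < NUM; i++) {
        a[i] = a[i] + i;
        b[i] = b[i] + i;
        c[i] = c[i] + i;
        x[i] = x[i] + i;
        y[i] = y[i] + i;
        z[i] = z[i] + i;
    }
}

/* Hand-written distribution at the pragma position: two smaller loops,
   each touching three arrays instead of six. */
void distributed_loop(double a[NUM], double b[NUM], double c[NUM],
                      double x[NUM], double y[NUM], double z[NUM])
{
    for (int i = 0; i < NUM; i++) {
        a[i] = a[i] + i;
        b[i] = b[i] + i;
        c[i] = c[i] + i;
    }
    for (int i = 0; i < NUM; i++) {
        x[i] = x[i] + i;
        y[i] = y[i] + i;
        z[i] = z[i] + i;
    }
}
```

Each statement reads and writes only its own array element, so, provided the six arrays do not overlap, both forms compute the same values. Each of the smaller loops keeps fewer values live at once, which is the property the description relies on for software pipelining or vectorization.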

Example 3: Using distribute_point pragma inside and outside the loop

The following example shows how to use the distribute_point pragma, first outside the loop and then inside the loop.

void dist1(int a[], int b[], int c[], int d[])
{
    #pragma distribute_point
    // Compiler will automatically decide where to
    // distribute. Data dependencies are observed.
    for (int i = 1; i < 1000; i++) {
        b[i] = a[i] + 1;
        c[i] = a[i] + b[i];
        d[i] = c[i] + 1;
    }
}

void dist2(int a[], int b[], int c[], int d[])
{
    for (int i = 1; i < 1000; i++) {
        b[i] = a[i] + 1;
        #pragma distribute_point
        // Distribution will start here,
        // ignoring all loop-carried dependencies.
        c[i] = a[i] + b[i];
        d[i] = c[i] + 1;
    }
}