Intel® Advisor Help

Model Offloading to a GPU

Find high-impact opportunities to offload your code to a target graphics processing unit (GPU), and identify potential performance bottlenecks on that GPU, by running the Offload Modeling perspective.

The Offload Modeling perspective can help you to do the following:

With the Offload Modeling perspective, the following workflows are available:

  1. CPU-to-GPU modeling
  2. GPU-to-GPU modeling

Note

You can model application performance only on Intel® GPUs.

How It Works

The Offload Modeling perspective runs the following steps:

  1. Get the baseline performance data for your application by running a Survey analysis.
  2. Identify how many times kernels are invoked and executed, count floating-point and integer operations, and estimate cache and memory traffic on the target device memory subsystem by running the Characterization analysis.
  3. Mark up loops of interest and identify loop-carried dependencies that might block parallel execution by running the Dependencies analysis (CPU-to-GPU modeling only).
  4. Estimate the total program speedup and other performance metrics on a target device by running Performance Modeling, which applies Amdahl's law and considers speedup only from the most profitable regions. A region is profitable if its estimated execution time on the target is less than its execution time on the host.
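
The steps above can be sketched as a toy model in Python. This is a minimal sketch, not how Intel Advisor computes its estimates: it assumes a simple roofline-style bound (the slower of compute time and memory time) for the target execution time, and it ignores data transfer, kernel launch overhead, and cache simulation. The names `Region`, `estimate_target_time`, and `model_offload`, and all numbers, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """A hypothetical loop or function profiled on the baseline (host) device."""
    name: str
    host_time_s: float   # measured time on the host (Survey step)
    ops: float           # operation count (Characterization step)
    bytes_moved: float   # memory traffic (Characterization step)


def estimate_target_time(r: Region, peak_ops_per_s: float, mem_bw_bytes_per_s: float) -> float:
    # Assumed roofline-style bound: the region is limited either by
    # compute throughput or by memory bandwidth, whichever is slower.
    return max(r.ops / peak_ops_per_s, r.bytes_moved / mem_bw_bytes_per_s)


def model_offload(regions, total_host_time_s, peak_ops_per_s, mem_bw_bytes_per_s):
    """Offload only profitable regions; return (offloaded names, program speedup)."""
    offloaded = []
    modeled_time_s = total_host_time_s
    for r in regions:
        t_target = estimate_target_time(r, peak_ops_per_s, mem_bw_bytes_per_s)
        if t_target < r.host_time_s:  # profitable: faster on the target than on the host
            offloaded.append(r.name)
            # Replace the region's host time with its estimated target time.
            modeled_time_s += t_target - r.host_time_s
    # Amdahl's law: code that is not offloaded keeps its host time,
    # so it bounds the achievable whole-program speedup.
    return offloaded, total_host_time_s / modeled_time_s


if __name__ == "__main__":
    regions = [
        Region("loop_a", host_time_s=4.0, ops=2e12, bytes_moved=4e11),
        Region("loop_b", host_time_s=0.3, ops=1e9, bytes_moved=2e11),
    ]
    names, speedup = model_offload(regions, total_host_time_s=10.0,
                                   peak_ops_per_s=1e13, mem_bw_bytes_per_s=5e11)
    print(names, f"{speedup:.2f}")  # ['loop_a'] 1.47
```

Here `loop_b` is memory-bound on the assumed target (0.4 s versus 0.3 s on the host), so it stays on the host, and the 6 s of code outside both regions limits the overall gain even though `loop_a` alone runs 5x faster.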

The CPU-to-GPU and GPU-to-GPU modeling workflows are based on different hardware configurations, compiler code-generation principles, and software implementation aspects, so that each provides accurate modeling results specific to the baseline device of your application. Review the following features of the workflows:

| CPU-to-GPU modeling | GPU-to-GPU modeling |
|---|---|
| Only loops/functions executed or offloaded to a CPU are analyzed. | Only GPU compute kernels are analyzed. |
| Loop/function characteristics are measured using the CPU profiling capabilities. | Compute kernel characteristics are measured using the GPU profiling capabilities. |
| Only profitable loops/functions are recommended for offloading to a target GPU. Profitability is based on the estimated speedup. | All kernels executed on the GPU are modeled one to one, even if their estimated speedup is low. |
| High-overhead features, such as call stack handling, cache and data transfer simulation, and the Dependencies analysis, can be enabled. You might need to run the Dependencies analysis to check whether loop-carried dependencies affect performance on a GPU. | High-overhead features, such as call stack handling, cache and data transfer simulation, and the Dependencies analysis, are disabled. You do not need to run the Dependencies analysis. |
| Data transfer between baseline and target devices can be simulated in two modes: footprint-based and memory object-based. | Memory objects transferred between host and device memory are traced. |
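
The key modeling difference in the third row of the table (only profitable regions versus every kernel one to one) can be illustrated with a hypothetical helper. This is a minimal sketch assuming each region already has a baseline time and an estimated target time; `modeled_speedup` and the numbers are illustrative and not part of Intel Advisor.

```python
def modeled_speedup(regions, total_baseline_time_s, only_profitable):
    """Return the modeled whole-program speedup.

    regions: list of (name, baseline_time_s, estimated_target_time_s).
    only_profitable=True mimics CPU-to-GPU modeling (unprofitable code stays
    on the host); False mimics GPU-to-GPU modeling (every kernel is modeled).
    """
    modeled_time_s = total_baseline_time_s
    for _name, t_base, t_target in regions:
        if only_profitable and t_target >= t_base:
            continue  # not profitable: keep the baseline time
        modeled_time_s += t_target - t_base
    return total_baseline_time_s / modeled_time_s


if __name__ == "__main__":
    regions = [("k1", 4.0, 1.0), ("k2", 1.0, 2.0)]  # k2 is slower on the target
    print(modeled_speedup(regions, 10.0, only_profitable=True))   # 10 / 7
    print(modeled_speedup(regions, 10.0, only_profitable=False))  # 1.25
```

With the GPU-to-GPU behavior, the slower kernel `k2` still counts toward the estimate and lowers it, which is why that workflow can report kernels with a low estimated speedup.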

Offload Modeling Summary

The Offload Modeling perspective measures the performance of your application and compares it with the modeled performance on a selected target GPU, so that you can decide which parts of your application to execute on the GPU and how to optimize them for better performance after offloading.

[Figure: Example of a Summary report of the Offload Modeling perspective]
