Intel® Advisor Help
If a loop has dependencies, it cannot be run in parallel and in most cases cannot be offloaded to the GPU. Intel Advisor can get information about loop-carried dependencies from the following sources:
Using Intel® Compiler diagnostics. For some loops, dependencies are found at compile time, and the diagnostics are passed to Intel Advisor through its integration with Intel Compilers.
Parsing the application call stack tree. If a loop is parallelized or vectorized on a CPU or is already offloaded to a GPU but executed on a CPU, Intel Advisor assumes that you resolved the loop-carried dependencies before parallelizing or offloading the loop.
Using the Dependencies analysis results. This analysis detects dependencies for most loops at run time, but the result might depend on the application workload. It also adds high overhead, making the application run 5 to 100 times slower during the analysis. To reduce the overhead, you can use various techniques, for example, marking up loops of interest.
For the Offload Modeling perspective, the Dependencies analysis is optional, but it might add important information about loop-carried dependencies that helps Intel® Advisor decide whether a loop can be profitable to run on a graphics processing unit (GPU).
This topic describes a workflow that you can follow to understand if there are potential loop-carried dependencies in your code that might affect its performance on a target GPU.
Note: In the commands below, replace myApplication with your application executable path and name before executing a command. If your application requires additional command line options, add them after the executable name.
If you do not know what dependency types are present in your application, first run the Offload Modeling without the Dependencies analysis to check whether potential dependencies affect the modeling results and to decide whether you need to run the Dependencies analysis:
advisor --collect=survey --project-dir=./advi_results --static-instruction-mix -- ./myApplication
advisor --collect=tripcounts --project-dir=./advi_results --flop --stacks --enable-cache-simulation --target-device=xehpg_512xve --data-transfer=light -- ./myApplication
advisor --collect=projection --project-dir=./advi_results
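If the result contains loops with the Dependency: Assumed type, rerun the Performance Modeling so that such loops are modeled as parallel, either for all loops with assumed dependencies:
advisor --collect=projection --project-dir=./advi_results --no-assume-dependencies
or only for specific loops, identified by source file and line number:
advisor --collect=projection --project-dir=./advi_results --set-parallel=foo.cpp:34,bar.cpp:192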
Loops that previously had the Dependency: Assumed type are now marked as Parallel: Assumed. Intel Advisor models their performance on the target GPU and checks potential offload profitability and speedup.
By default, the generic markup strategy is applied so that the Dependencies analysis runs only for potentially profitable loops.
To check for real dependencies in your code, run the Dependencies analysis and rerun the Performance Modeling to get more accurate estimations of your application performance on the GPU:
advisor --collect=dependencies --select markup=gpu_generic --loop-call-count-limit=16 --filter-reductions --project-dir=./advi_results -- ./myApplication
advisor --collect=projection --project-dir=./advi_results
Open the result in the Intel Advisor, view the interactive HTML report, or print it to the command line. Continue to investigate the results and identify code regions to offload.
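The command sequence in this topic can be collected into one script so that the application path and the project directory are set in a single place. The following is a minimal sketch, not an official Intel Advisor script. The variable names APP, PROJECT_DIR, DRY_RUN, and WITH_DEPENDENCIES and the function names are hypothetical and introduced only for this example. The advisor options are copied unchanged from the commands above. By default, the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of the workflow in this topic. APP, PROJECT_DIR, DRY_RUN, and
# WITH_DEPENDENCIES are hypothetical names introduced for this example.
set -eu

APP="${APP:-./myApplication}"                 # application executable
PROJECT_DIR="${PROJECT_DIR:-./advi_results}"  # Intel Advisor project directory
DRY_RUN="${DRY_RUN:-1}"                       # 1 = print commands only, 0 = also execute them
WITH_DEPENDENCIES="${WITH_DEPENDENCIES:-0}"   # 1 = add the Dependencies analysis pass

# Print a command; execute it only when DRY_RUN=0.
run_step() {
    echo "$*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Survey, Trip Counts, and Performance Modeling without the Dependencies analysis.
baseline_modeling() {
    run_step advisor --collect=survey --project-dir="$PROJECT_DIR" --static-instruction-mix -- "$APP"
    run_step advisor --collect=tripcounts --project-dir="$PROJECT_DIR" --flop --stacks \
        --enable-cache-simulation --target-device=xehpg_512xve --data-transfer=light -- "$APP"
    run_step advisor --collect=projection --project-dir="$PROJECT_DIR"
}

# Dependencies analysis on the marked-up loops, then Performance Modeling again.
refine_with_dependencies() {
    run_step advisor --collect=dependencies --select markup=gpu_generic \
        --loop-call-count-limit=16 --filter-reductions --project-dir="$PROJECT_DIR" -- "$APP"
    run_step advisor --collect=projection --project-dir="$PROJECT_DIR"
}

baseline_modeling
if [ "$WITH_DEPENDENCIES" = "1" ]; then
    refine_with_dependencies
fi
```

Run the script as is to review the commands. Set DRY_RUN=0 to execute them, and set WITH_DEPENDENCIES=1 once the first modeling result shows that assumed dependencies affect the estimations. Because the script uses set -e, a failed collection stops the sequence before the Performance Modeling runs on incomplete data.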