Intel® Advisor Help
Intel® Advisor provides several methods to run the Offload Modeling perspective from command line. Use one of the following:
After you run the Offload Modeling with any method above, you can view the results in Intel Advisor graphical user interface (GUI), command line interface (CLI), or an interactive HTML report. For example, the interactive HTML report is similar to the following:
The script enables the advisor command line interface (CLI), advisor-python command line tool, and the APM environment variable, which points to the directory with Offload Modeling scripts and simplifies their use.
With Intel Advisor, you can generate pre-configured command lines for your application and hardware. Use this feature if you want to:
The Offload Modeling perspective consists of multiple analysis steps executed for the same application and project. You can configure each step from scratch or use pre-configured command lines, which do not require you to manually provide the paths to the project directory and the application executable.
Option 1. Generate pre-configured command lines with --collect=offload and the --dry-run option. The option generates:
Note: In the commands below, replace myApplication with the path and name of your application executable before running a command. If your application requires additional command line options, add them after the executable name.
The workflow includes the following steps:
For example, to generate the low-accuracy commands for the myApplication application, run the following command:
On Linux OS:
advisor --collect=offload --accuracy=low --dry-run --project-dir=./advi_results -- ./myApplication
On Windows OS:
advisor --collect=offload --accuracy=low --dry-run --project-dir=.\advi_results -- .\myApplication.exe
You should see a list of commands for each analysis step to get the Offload Modeling result with the specified accuracy level (for the commands above, it is low).
Option 2. If the Intel Advisor graphical user interface (GUI) is available on your system and you want to analyze an MPI application from the command line, you can generate the pre-configured command lines from the GUI.
The GUI generates:
For detailed instructions, see Generate Pre-configured Command Lines.
For the Offload Modeling perspective, Intel Advisor has a special collection mode, --collect=offload, that allows you to run several analyses using only one Intel Advisor CLI command. When you run the collection, it sequentially runs the data collection and performance modeling steps. The specific analyses and options depend on the accuracy level you specify for the collection.
Note: In the commands below, replace myApplication with the path and name of your application executable before running a command. If your application requires additional command line options, add them after the executable name.
For example, to run the Offload Modeling perspective with the default (medium) accuracy level:
On Linux OS:
advisor --collect=offload --project-dir=./advi_results -- ./myApplication
On Windows OS:
advisor --collect=offload --project-dir=.\advi_results -- .\myApplication.exe
The collection progress and commands for each analysis executed will be printed to a terminal or a command prompt. By default, the performance is modeled for the Intel® Arc™ graphics code-named Alchemist (xehpg_512xve configuration). When the collection is finished, you will see the result summary.
Analysis Details
To change the analyses that run and their options, specify a different accuracy level with the --accuracy=<level> option. The default accuracy level is medium.
The following accuracy levels are available: low, medium (default), and high.
For CPU applications, the high accuracy level adds a high collection overhead because it includes the Dependencies analysis. This analysis is not required if your application is highly parallelized or vectorized on a CPU or if you know that key hotspots in your application do not have loop-carried dependencies. Otherwise, to learn how dependencies might affect your application performance on a GPU, see Check How Assumed Dependencies Affect Modeling.
For example, to run the low accuracy level:
advisor --collect=offload --accuracy=low --project-dir=./advi_results -- ./myApplication
To run the high accuracy level:
advisor --collect=offload --accuracy=high --project-dir=./advi_results -- ./myApplication
If you want to see the commands that are executed at each accuracy level, you can run the collection with the --dry-run option. The commands will be printed to a terminal or a command prompt.
For details about each accuracy level, see Offload Modeling Accuracy Levels in Command Line.
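The accuracy levels and the --dry-run option described above can be combined to compare what each level would run. The following is a minimal sketch, not an Intel Advisor feature: preview_accuracy_levels is a hypothetical helper name, and it assumes the advisor command is already available in your environment. It uses only the options shown in this section.

```shell
#!/usr/bin/env bash
# Sketch: print the commands behind each accuracy level without running them.
# preview_accuracy_levels is a hypothetical helper, not part of Intel Advisor.
# Usage: preview_accuracy_levels <project-dir> <application> [application options]
preview_accuracy_levels() {
    local project_dir="$1"
    shift
    local level
    for level in low medium high; do
        echo "=== accuracy level: ${level} ==="
        # --dry-run prints the commands for each analysis step instead of running them.
        advisor --collect=offload --accuracy="${level}" --dry-run \
            --project-dir="${project_dir}" -- "$@" || return 1
    done
}
```

For example, calling preview_accuracy_levels ./advi_results ./myApplication issues three dry runs, one per level, so you can review the generated commands side by side before choosing a level.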
Customize Collection
You can also specify additional options if you want to run the Offload Modeling with a custom configuration. This collection accepts most options of the Performance Modeling analysis (--collect=projection) and some options of the Survey, Trip Counts, and Dependencies analyses that can be useful for the Offload Modeling.
Consider the following action options:
Option | Description |
---|---|
--accuracy=<level> | Set an accuracy level for a collection preset. Available accuracy levels: low, medium (default), and high. For details, see Offload Modeling Accuracy Levels in Command Line. |
--config | Select a target GPU configuration to model performance for. For example, xehpg_512xve (default), gen12_dg1, or gen9_gt3. See config for a full list of possible values and mapping to device names. |
--gpu | Analyze a Data Parallel C++ (DPC++), OpenCL™, or OpenMP* target application on a graphics processing unit (GPU) device. This option automatically adds all related options to each analysis included in the preset. If you use this option, the high accuracy does not include the Dependencies analysis. For details about this workflow, see Run GPU-to-GPU Performance Modeling from Command Line. |
--data-reuse-analysis | Analyze potential data reuse between code regions. This option automatically adds all related options to each analysis included in the preset. |
--enforce-fallback | Emulate data distribution over stacks if stacks collection is disabled. This option automatically adds all related options to each analysis included in the preset. |
For details about other available options, see collect.
You can collect data and model performance for your application by running each Offload Modeling analysis in a separate command using Intel Advisor CLI. This option allows you to:
Consider the following workflow example. Using this example, you can run the Survey, Trip Counts, and FLOP analyses to profile an application and the Performance Modeling to model its performance on a selected target device.
Note: In the commands below, replace myApplication with the path and name of your application executable before running a command. If your application requires additional command line options, add them after the executable name.
On Linux OS:
advisor --collect=survey --static-instruction-mix --project-dir=./advi_results -- ./myApplication
advisor --collect=tripcounts --flop --stacks --cache-simulation=single --target-device=xehpg_512xve --data-transfer=light --project-dir=./advi_results -- ./myApplication
advisor --collect=dependencies --loop-call-count-limit=16 --select markup=gpu_generic --filter-reductions --project-dir=./advi_results -- ./myApplication
The Dependencies analysis adds a high collection overhead. You can skip it if your application is highly parallelized or vectorized on a CPU or if you know that key hotspots in your application do not have loop-carried dependencies. If you are not sure, see Check How Assumed Dependencies Affect Modeling to learn how dependencies affect your application performance on a GPU.
advisor --collect=projection --project-dir=./advi_results
You will see the result summary printed to the command prompt.
Tip: If you already have an analysis result saved as a snapshot or a result for an MPI rank, you can use the --exp-dir option instead of --project-dir to model performance for the result.
On Windows OS:
advisor --collect=survey --static-instruction-mix --project-dir=.\advi_results -- .\myApplication.exe
advisor --collect=tripcounts --flop --stacks --cache-simulation=single --target-device=xehpg_512xve --data-transfer=light --project-dir=.\advi_results -- .\myApplication.exe
advisor --collect=dependencies --loop-call-count-limit=16 --select markup=gpu_generic --filter-reductions --project-dir=.\advi_results -- .\myApplication.exe
The Dependencies analysis adds a high collection overhead. You can skip it if your application is highly parallelized or vectorized on a CPU or if you know that key hotspots in your application do not have loop-carried dependencies. If you are not sure, see Check How Assumed Dependencies Affect Modeling to learn how dependencies affect your application performance on a GPU.
advisor --collect=projection --project-dir=.\advi_results
You will see the result summary printed to the command prompt.
Tip: If you already have an analysis result saved as a snapshot or a result for an MPI rank, you can use the --exp-dir option instead of --project-dir to model performance for the result.
For more useful options, see the Analysis Details section below.
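The Linux workflow above can be wrapped in a small script that stops at the first failing step. This is a sketch under stated assumptions rather than an official recipe: run_offload_steps is a hypothetical function name, and RUN_DEPENDENCIES is a hypothetical environment variable introduced here only to make the optional Dependencies step switchable. The advisor options are the ones used in the commands above.

```shell
#!/usr/bin/env bash
# Sketch: run the Offload Modeling analyses one by one, stopping on failure.
# run_offload_steps and RUN_DEPENDENCIES are hypothetical names for illustration.
# Usage: run_offload_steps <project-dir> <application> [application options]
run_offload_steps() {
    local project_dir="$1"
    shift
    # Step 1: Survey with static instruction mix data.
    advisor --collect=survey --static-instruction-mix \
        --project-dir="${project_dir}" -- "$@" || return 1
    # Step 2: Trip Counts and FLOP with cache simulation for the target device.
    advisor --collect=tripcounts --flop --stacks --cache-simulation=single \
        --target-device=xehpg_512xve --data-transfer=light \
        --project-dir="${project_dir}" -- "$@" || return 1
    # Step 3 (optional, high overhead): Dependencies, only when requested.
    if [ "${RUN_DEPENDENCIES:-0}" = "1" ]; then
        advisor --collect=dependencies --loop-call-count-limit=16 \
            --select markup=gpu_generic --filter-reductions \
            --project-dir="${project_dir}" -- "$@" || return 1
    fi
    # Step 4: Performance Modeling on the collected data.
    advisor --collect=projection --project-dir="${project_dir}"
}
```

Calling run_offload_steps ./advi_results ./myApplication runs the Survey, Trip Counts, and Performance Modeling steps. Setting RUN_DEPENDENCIES=1 first adds the Dependencies step between them.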
Analysis Details
The Offload Modeling workflow includes the following analyses:
Each analysis has a set of additional options that modify its behavior and collect additional performance data. The more analyses you run and options you use, the higher the modeling accuracy.
Consider the following options:
Survey Options
To run the Survey analysis, use the following command line action: --collect=survey.
Recommended action options:
Options | Description |
---|---|
--static-instruction-mix | Collect static instruction mix data. This option is recommended for the Offload Modeling perspective. |
--profile-gpu | Analyze a DPC++, OpenCL, or OpenMP target application on a GPU device. If you use this option, skip the Dependencies analysis. For details about this workflow, see Run GPU-to-GPU Performance Modeling from Command Line. |
Characterization Options
To run the Characterization analysis, use the following command line action: --collect=tripcounts.
Recommended action options:
Options | Description |
---|---|
--flop | Collect data about floating-point and integer operations, memory traffic, and mask utilization metrics for AVX-512 platforms. |
--stacks | Enable advanced collection of call stack data. |
--cache-simulation=<mode> | Simulate cache behavior for a target device in the specified mode, for example, single. |
--target-device=<target> | Specify a target graphics processing unit (GPU) to model cache for. For example, xehpg_512xve (default), gen12_dg1, or gen9_gt3. See target-device for a full list of possible values and mapping to device names. Use with the --cache-simulation=single option. Important: Make sure to specify the same target device as for the --collect=projection --config=<config>. |
--data-transfer=<mode> | Enable modeling data transfers between host and target devices in the specified mode, for example, light or full. |
--profile-gpu | Analyze a DPC++, OpenCL, or OpenMP target application on a GPU device. If you use this option, skip the Dependencies analysis. For details about this workflow, see Run GPU-to-GPU Performance Modeling from Command Line. |
Dependencies Options
The Dependencies analysis is optional because it adds a high overhead and is mostly necessary if you have scalar loops/functions in your application or if you do not know about loop-carried dependencies in key hotspots. For details about when you need to run the Dependencies analysis, see Check How Assumed Dependencies Affect Modeling.
To run the Dependencies analysis, use the following command line action: --collect=dependencies.
Recommended action options:
Options | Description |
---|---|
--select=<string> | Select loops to run the analysis for. For the Offload Modeling, the recommended value is --select markup=gpu_generic, which selects only loops/functions profitable for offloading to a target device to reduce the analysis overhead. For more information about markup options, see Loop Markup to Minimize Analysis Overhead. Note: The generic markup strategy is recommended if you want to run the Dependencies analysis for an application that does not use DPC++, C++/Fortran with OpenMP target, or OpenCL. |
--loop-call-count-limit=<num> | Set the maximum number of call instances to analyze assuming similar runtime properties over different call instances. The recommended value is 16. |
--filter-reductions | Mark all potential reductions with a specific diagnostic. |
Performance Modeling Options
To run the Performance Modeling analysis, use the following command line action: --collect=projection.
Recommended action options:
Options | Description |
---|---|
--exp-dir=<path> | Specify a path to an unpacked result snapshot or an MPI rank result to model performance. Use this option instead of --project-dir if you already have an analysis result ready. |
--config=<config> | Select a target GPU configuration to model performance for. For example, xehpg_512xve (default), gen12_dg1, or gen9_gt3. Important: Make sure to specify the same target device as for the --collect=tripcounts --target-device=<target>. For details about configuration files, see config. |
--no-assume-dependencies | Assume that a loop does not have dependencies if a loop dependency type is unknown. Use this option if your application contains parallel and/or vectorized loops and you did not run the Dependencies analysis. |
--data-reuse-analysis | Analyze potential data reuse between code regions when offloaded to a target GPU. Important: Make sure to use --data-transfer=full with --collect=tripcounts for this option to work correctly. |
--assume-hide-taxes | Assume that an invocation tax is paid only for the first time a kernel is launched. |
--set-parameter | Specify a single-line configuration parameter to modify in a format "<group>.<parameter>=<new-value>". For example, "min_required_speed_up=0". For details about the option, see set-parameter. For details about some of the possible modifications, see Advanced Modeling Configuration. |
--profile-gpu | Analyze a DPC++, OpenCL, or OpenMP target application on a GPU device. If you use this option, skip the Dependencies analysis. For details about this workflow, see Run GPU-to-GPU Performance Modeling from Command Line. |
See advisor Command Option Reference for more options.
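Two of the requirements in the tables above are easy to miss when typing commands by hand: --target-device and --config must name the same device, and --data-reuse-analysis needs --data-transfer=full in the Trip Counts step. The sketch below illustrates one way to keep them consistent by passing the device name once. model_with_data_reuse is a hypothetical helper name, and the set of steps is an assumption chosen for illustration, not a prescribed workflow.

```shell
#!/usr/bin/env bash
# Sketch: keep the target device consistent across steps and enable data reuse analysis.
# model_with_data_reuse is a hypothetical helper, not part of Intel Advisor.
# Usage: model_with_data_reuse <project-dir> <device> <application> [application options]
model_with_data_reuse() {
    local project_dir="$1"
    local device="$2"
    shift 2
    advisor --collect=survey --static-instruction-mix \
        --project-dir="${project_dir}" -- "$@" || return 1
    # --data-transfer=full is required for --data-reuse-analysis in the modeling step.
    advisor --collect=tripcounts --flop --stacks --cache-simulation=single \
        --target-device="${device}" --data-transfer=full \
        --project-dir="${project_dir}" -- "$@" || return 1
    # Reuse the same device name for --config so both steps model the same target.
    advisor --collect=projection --config="${device}" --data-reuse-analysis \
        --project-dir="${project_dir}"
}
```

For example, model_with_data_reuse ./advi_results gen12_dg1 ./myApplication models performance for the gen12_dg1 configuration in both the cache simulation and the projection.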
Intel Advisor has three scripts that use the Intel Advisor Python* API to run the Offload Modeling. You can run the scripts with the advisor-python command line tool or with your local Python 3.6 or 3.7.
The scripts vary in functionality and run different sets of Intel Advisor analyses. Depending on what you want to run, use one or several of the following scripts:
You can run the Offload Modeling using different combinations of the scripts and/or the Intel Advisor CLI. For example:
Consider the following examples of some typical scenarios with Python scripts.
Note: In the commands below, replace myApplication with the path and name of your application executable before running a command. If your application requires additional command line options, add them after the executable name.
Example 1. Run the run_oa.py script to profile an application and model its performance for the Intel® Arc™ graphics code-named Alchemist (xehpg_512xve configuration).
On Linux OS:
advisor-python $APM/run_oa.py ./advi_results --collect=basic --config=xehpg_512xve -- ./myApplication
On Windows OS:
advisor-python %APM%\run_oa.py .\advi_results --collect=basic --config=xehpg_512xve -- .\myApplication.exe
You will see the result summary printed to the command prompt.
For more useful options, see the Analysis Details section below.
Example 2. Run the collect.py to profile an application and run the analyze.py to model its performance for the Intel® Arc™ graphics code-named Alchemist (xehpg_512xve configuration).
On Linux OS:
advisor-python $APM/collect.py ./advi_results --collect=basic --config=xehpg_512xve -- ./myApplication
advisor-python $APM/analyze.py ./advi_results --config=xehpg_512xve
On Windows OS:
advisor-python %APM%\collect.py .\advi_results --collect=basic --config=xehpg_512xve -- .\myApplication.exe
advisor-python %APM%\analyze.py .\advi_results --config=xehpg_512xve
You will see the result summary printed to the command prompt.
For more useful options, see the Analysis Details section below.
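Because collect.py and analyze.py should receive the same --config value, Example 2 can be scripted so that the configuration is given only once. The following is a minimal sketch for Linux: run_collect_then_analyze is a hypothetical function name, and it assumes the APM environment variable and the advisor-python tool are already set up as described earlier.

```shell
#!/usr/bin/env bash
# Sketch: run collect.py and analyze.py with one shared --config value.
# run_collect_then_analyze is a hypothetical helper, not part of Intel Advisor.
# Usage: run_collect_then_analyze <project-dir> <config> <application> [application options]
run_collect_then_analyze() {
    local project_dir="$1"
    local config="$2"
    shift 2
    # APM must point to the directory with the Offload Modeling scripts.
    if [ -z "${APM:-}" ]; then
        echo "APM is not set; set up the Intel Advisor environment first" >&2
        return 1
    fi
    # Profile the application, then model performance only if profiling succeeded.
    advisor-python "${APM}/collect.py" "${project_dir}" --collect=basic \
        --config="${config}" -- "$@" || return 1
    advisor-python "${APM}/analyze.py" "${project_dir}" --config="${config}"
}
```

For example, run_collect_then_analyze ./advi_results xehpg_512xve ./myApplication issues the same two commands as Example 2, with the configuration name passed to both scripts.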
Analysis Details
Each script has a set of additional options that modify its behavior and collect additional performance data. The more analyses you run and options you use, the higher the modeling accuracy.
Collection Options
The following options are applicable to the run_oa.py and collect.py scripts.
Option | Description |
---|---|
--collect=<mode> | Specify data to collect for your application, for example, basic or full. See Check How Assumed Dependencies Affect Modeling to learn when you need to collect dependency data. |
--config=<config> | Select a target GPU configuration to model performance for. For example, xehpg_512xve (default), gen12_dg1, or gen9_gt3. Important: For collect.py, make sure to specify the same value of the --config option for the analyze.py. For details about configuration files, see config. |
--markup=<markup-mode> | Select the loops for which to collect Trip Counts and FLOP and/or Dependencies data using a pre-defined markup algorithm. This option decreases collection overhead. By default, it is set to generic to analyze all loops profitable for offloading. |
--gpu | Analyze a DPC++, OpenCL, or OpenMP target application on a GPU device. For details about this workflow, see Run GPU-to-GPU Performance Modeling from Command Line. |
For a full list of available options, see:
Performance Modeling Options
The following options are applicable to the run_oa.py and analyze.py scripts.
Option | Description |
---|---|
--config=<config> | Select a target GPU configuration to model performance for. For example, xehpg_512xve (default), gen12_dg1, or gen9_gt3. Important: For analyze.py, make sure to specify the same value of the --config option for the collect.py. For details about configuration files, see config. |
--assume-parallel | Assume that a loop does not have dependencies if there is no information about the loop dependency type and you did not run the Dependencies analysis. |
--data-reuse-analysis | Analyze potential data reuse between code regions when offloaded to a target GPU. Important: Make sure to use --collect=full when running the analyses with collect.py or use the --data-transfer=full when running the Trip Counts analysis with Intel Advisor CLI. |
--gpu | Analyze a DPC++, OpenCL, or OpenMP target application on a GPU device. For details about this workflow, see Run GPU-to-GPU Performance Modeling from Command Line. |
For a full list of available options, see:
Continue to explore the Offload Modeling results with a preferred method. For details about the metrics reported, see Accelerator Metrics.