Intel® Advisor Help
The advisor command currently supports the options shown below.
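Each option is combined with an action such as --collect, --report, or --snapshot, with the target application given after a standalone --. As a hedged sketch (the project directory and application name are placeholders), a Survey run followed by a Trip Counts & FLOP refinement might look like this:

```sh
# Survey analysis: find the hottest loops and functions (paths are illustrative).
advisor --collect=survey --project-dir=./advi_results -- ./myApplication

# Refine the same result with trip counts, FLOP, and call stack data.
advisor --collect=tripcounts --flop --stacks --project-dir=./advi_results -- ./myApplication
```

Run advisor --help for the full, version-specific syntax of each action and option.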
| Option | Description |
|---|---|
|  | Set an accuracy level for the Offload Modeling collection preset. |
|  | Add loops (by file and line number) to the loops selected for deeper analysis. |
|  | Specify the directory where the target application runs during analysis, if it is different from the current working directory. |
|  | Assume that a loop has dependencies if the loop dependency type is unknown. |
|  | Estimate invocation taxes assuming the invocation tax is paid only for the first kernel launch. |
|  | When searching for an optimal N-dimensional offload, assume there are dependencies between inner and outer loops. |
|  | Assume data is only transferred once for each offload, and all instances share that data. |
|  | Finalize Survey and Trip Counts & FLOP analysis data after collection is complete. |
|  | Emulate the execution of more than one instance simultaneously for a top-level offload. |
|  | Run benchmarks on only one concurrently executing Intel Advisor instance to avoid concurrency issues with regard to platform limits. |
|  | Generate a Survey report in bottom-up view. |
|  | Enable binary visibility in a read-only snapshot you can view any time. |
|  | Select what binary files will be added to a read-only snapshot. |
|  | Set the cache hierarchy to collect modeling data for CPU cache behavior during Trip Counts & FLOP analysis. |
|  | Simulate device cache behavior for your application. |
|  | Enable source code visibility in a read-only snapshot you can view any time (with the --snapshot action). Enable keeping source code cache within a project (with the --collect action). |
|  | Enable cache simulation for Performance Modeling. |
|  | Set the cache associativity for modeling CPU cache behavior during the Memory Access Patterns analysis. |
|  | Set the cache line size (in bytes) for modeling CPU cache behavior during Memory Access Patterns analysis. |
|  | Set the focus for modeling CPU cache behavior during Memory Access Patterns analysis. |
|  | Specify what percentage of total memory accesses should be processed during cache simulation. |
|  | Set the cache set size (in bytes) for modeling CPU cache behavior during Memory Access Patterns analysis. |
|  | Check the profitability of offload regions and add only profitable regions to a report. |
|  | Clear all loops previously selected for deeper analysis. |
|  | Specify a device configuration to model your application performance for. |
|  | Use the projection of x86 logical instructions to GPU logical instructions. |
|  | Project x86 memory instructions to GPU SEND/SENDS instructions. |
|  | Count the number of accesses to memory objects created by code regions. |
|  | Project x86 MOV instructions to GPU MOV instructions. |
|  | Select how to model SEND instruction latency. |
|  | Specify a scale factor to approximate a host CPU that is faster than the baseline CPU by this factor. |
|  | Set the delimiter for a report in CSV format. |
|  | Specify the absolute path or name for a custom TOML configuration file with additional modeling parameters. |
|  | Limit the maximum amount (in MB) of raw data collected during Survey analysis. |
|  | Analyze potential data reuse between code regions. |
|  | Set the level of detail for modeling data transfers during Characterization. |
|  | Estimate data transfers in detail, including latencies for each transferred object. |
|  | Specify memory page size to set the traffic measurement granularity for the data transfer simulator. |
|  | Show only floating-point data, only integer data, or data for the sum of both data types in a Roofline interactive HTML report. |
|  | Remove previously collected trip counts data when re-running a Survey analysis with changed binaries. |
|  | Do not account for optimized traffic for transcendentals on a GPU. |
|  | Show a callstack for each loop/function call in a report. |
|  | List all steps included in Offload Modeling batch collection at a specified accuracy level without running them. |
|  | Specify the maximum amount of time (in seconds) an analysis runs. |
|  | Show (in a Survey report) how many instructions of a given type actually executed during Trip Counts & FLOP analysis. |
| enable-batching | Deprecated. |
|  | Model CPU cache behavior on your target application. |
|  | Model data transfer between host memory and device memory. |
|  | Enable a simulator to model GRF. |
| enable-slm | Deprecated. SLM is modeled by default if available. |
|  | Examine specified annotated sites for opportunities to perform task-chunking modeling in a Suitability report. |
|  | Use the same local size and SIMD width as measured on a baseline device. |
|  | Emulate data distribution over stacks if stacks collection is disabled. |
|  | Offload all selected code regions even if offloading their child loops/functions is more profitable. |
|  | Estimate region speedup with relaxed constraints. |
|  | Consider loops recommended for offloading only if they reach the minimum estimated speedup specified in a configuration file. |
|  | Exclude the specified files or directories from annotation scanning during analysis. |
|  | Specify an application for analysis that is not the starting application. |
|  | Specify a path to an unpacked result snapshot or an MPI rank result to generate a report or model performance. |
|  | Filter data by the specified column name and value in a Survey and Trip Counts & FLOP report. |
|  | Enable filtering detected stack variables by scope (warning vs. error) in a Dependencies analysis. |
|  | Mark all potential reductions by specific diagnostic during Dependencies analysis. |
|  | Enable flexible cache simulation to change cache configuration without re-running collection. |
|  | Collect data about floating-point and integer operations, memory traffic, and mask utilization metrics for AVX-512 platforms during Trip Counts & FLOP analysis. |
|  | Consider all arithmetic operations as single-precision floating-point or int32 operations. |
|  | Consider all arithmetic operations as double-precision floating-point or int64 operations. |
|  | Set a report output format. |
|  | With the Offload Modeling perspective, analyze OpenCL™ and oneAPI Level Zero programs running on Intel® Graphics. With the GPU Roofline Insights perspective, create a Roofline interactive HTML report for data collected on GPUs. |
|  | Collect memory traffic generated by OpenCL™ and Intel® Media SDK programs executed on Intel® Processor Graphics. |
| gpu-kernels | Deprecated. Use --profile-gpu or --gpu instead. |
|  | Specify time interval, in milliseconds, between GPU samples during Survey analysis. |
|  | Disable data transfer tax estimation. |
|  | Specify runtimes or libraries to ignore time spent in these regions when calculating per-program speedup. |
|  | Ignore mismatched target or application parameter errors before starting analysis. |
|  | Ignore mismatched module checksums before starting analysis. |
|  | Analyze the Nth child process during Memory Access Patterns and Dependencies analysis. |
|  | Model traffic on all levels of the memory hierarchy for a Roofline report. |
|  | Set the length of time (in milliseconds) to wait before collecting each sample during Survey analysis. |
|  | Set the maximum number of top items to show in a report. |
|  | Set the maximum number of instances to analyze for all marked loops. |
|  | Specify total time, in milliseconds, to filter out loops that fall below this value. |
|  | Select loops (by criteria instead of human input) for deeper analysis. |
|  | Enable/disable user selection as a way to control loops/functions identified for deeper analysis. |
|  | After running a Survey analysis and identifying loops of interest, select loops (by file and line number or ID) for deeper analysis. |
|  | Model specific memory level(s) in a Roofline interactive HTML report, including L1, L2, L3, and DRAM. |
|  | Model only load memory operations, store memory operations, or both, in a Roofline interactive HTML report. |
|  | Show dynamic or static instruction mix data in a Survey report. |
|  | Collect Intel® oneAPI Math Kernel Library (oneMKL) loops and functions data during the Survey analysis. |
|  | Use the baseline GPU configuration as a target device for modeling. |
|  | Analyze child loops of the region head to find if some of the child loops provide more profitable offload. |
|  | Model calls to math functions such as EXP, LOG, SIN, and COS as extended math instructions, if possible. |
|  | Analyze code regions with system calls, assuming they are separated from the offloaded code and executed on a host device. |
|  | Specify application (or child application) module(s) to include in or exclude from analysis. |
|  | Limit, by inclusion or exclusion, application (or child application) module(s) for analysis. |
|  | Specify MPI process data to import. |
|  | Set the Microsoft* runtime environment mode for analysis. |
|  | When searching for an optimal N-dimensional offload, limit the maximum loop depth that can be converted to one offload. |
|  | Specify a text file containing command line arguments. |
|  | Enable asynchronous execution to overlap offload overhead with execution time. |
|  | Pack a snapshot into an archive. |
|  | Analyze OpenCL™ and oneAPI Level Zero programs running on Intel® Processor Graphics. |
|  | Show Intel® performance libraries loops and functions in Intel® Advisor reports. |
|  | Collect metrics about Just-In-Time (JIT) generated code regions during the Trip Counts and FLOP analysis. |
|  | Collect Python* loop and function data during Survey analysis. |
|  | Collect metrics for stripped binaries. |
|  | Specify the top-level directory where a result is saved if you want to save the collection somewhere other than the current working directory. |
|  | Minimize status messages during command execution. |
|  | Recalculate total time after filtering a report. |
|  | Enable heap allocation tracking to identify heap-allocated variables for which access strides are detected during Memory Access Patterns analysis. |
|  | Capture stack frame pointers to identify stack variables for which access strides are detected during Memory Access Patterns analysis. |
|  | Examine specified annotated sites for opportunities to reduce lock contention or find deadlocks in a Suitability report. |
|  | Examine specified annotated sites for opportunities to reduce lock overhead in a Suitability report. |
|  | Examine specified annotated sites for opportunities to reduce site overhead in a Suitability report. |
|  | Examine specified annotated sites for opportunities to reduce task overhead in a Suitability report. |
|  | Refinalize a survey result if it was collected with a previous Intel® Advisor version or if you need to correct or update source and binary search paths. |
|  | Remove loops (by file and line number) from the loops selected for deeper analysis. |
|  | Redirect report output from stdout to another location. |
|  | Specify the PATH/name of a custom report template file. |
|  | Specify a directory to identify the running analysis. |
|  | Resume collection after the specified number of milliseconds. |
|  | Return the target exit code instead of the command line interface exit code. |
|  | Specify the location(s) for finding target support files. |
|  | Enable searching for an optimal N-dimensional offload. |
|  | Select loops (by file and line number, ID, or criteria) for deeper analysis. |
|  | Assume loops with specified IDs or source locations have a dependency. |
|  | Assume loops with specified IDs or source locations are parallel. |
|  | Specify a single-line parameter to modify in a target device configuration. |
|  | Show data for all available columns in a Survey report. |
|  | Show data for all available rows, including data for child loops, in a Survey report. |
|  | Show only functions in a report. |
|  | Show only loops in a report. |
|  | Show not-executed child loops in a Survey report. |
|  | Generate a Survey report for data collected for GPU kernels. |
|  | Specify the total time threshold, in milliseconds, to filter out nodes that fall below this value from PDF and DOT Offload Modeling reports. |
|  | Sort data in ascending order (by specified column name) in a report. |
|  | Sort data in descending order (by specified column name) in a report. |
|  | Enable register flow analysis to calculate the number of consecutive load/store operations in registers and related memory traffic in bytes during Survey analysis. |
|  | Specify stack access size to set stack memory access measurement granularity for the data transfer simulation. |
|  | Restructure the call flow during Survey analysis to attach stacks to a point introducing a parallel workload. |
|  | Set stack size limit for analyzing stacks after collection. |
|  | Perform advanced collection of callstack data during Roofline and Trip Counts & FLOP analysis. |
|  | Choose between online and offline modes to analyze stacks during Survey analysis. |
|  | Start executing the target application for analysis purposes, but delay data collection. |
|  | Statically calculate the number of specific instructions present in the binary during Survey analysis. |
|  | Specify processes and/or children for instrumentation during Survey analysis. |
|  | Collect a variety of data during Survey analysis for loops that reside in non-executed code paths. |
|  | Specify a device configuration to model cache for during Trip Counts collection. |
|  | Specify a target GPU to collect data for if you have multiple GPUs connected to your system. |
|  | Attach Survey or Trip Counts & FLOP collection to a running process specified by the process ID. |
|  | Attach Survey or Trip Counts & FLOP collection to a running process specified by the process name. |
|  | Specify the hardware configuration to use for modeling purposes in a Suitability report. |
|  | Specify the threading model to use for modeling purposes in a Suitability report. |
|  | Specify the number of parallel threads to use for offload heads. |
|  | Generate a Survey report in top-down view. |
|  | Set how to trace loop iterations during Memory Access Patterns analysis. |
|  | Configure collectors to trace MPI code and determine MPI rank IDs for non-Intel® MPI library implementations. |
|  | Attribute memory objects to the analyzed loops that accessed the objects. |
|  | Track accesses to stack memory. |
|  | Enable parallel data sharing analysis for stack variables during Dependencies analysis. |
|  | Collect loop trip counts data during Trip Counts & FLOP analysis. |
| use-collect-configs | Deprecated. |
| user-data-dir | Deprecated. |
|  | Maximize status messages during command execution. |
|  | Show call stack data in a Roofline interactive HTML report (if call stack data is collected). |
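Several of the reporting and snapshot options above can be applied after a collection finishes. The following hedged sketch (output paths and the snapshot name are illustrative) writes a Survey report in CSV format and then packs a read-only snapshot that keeps source and binary visibility:

```sh
# Write the Survey report to a CSV file instead of stdout (paths are illustrative).
advisor --report=survey --project-dir=./advi_results --format=csv --report-output=./reports/survey.csv

# Pack a portable, read-only snapshot with cached sources and binaries.
advisor --snapshot --project-dir=./advi_results --pack --cache-sources --cache-binaries -- ./my_snapshot
```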