Intel® Advisor Help

Examine Kernel Details

After identifying hotspots, use the GPU Roofline Insights perspective to analyze their performance in more depth. Select a dot on the chart and use the GPU Details and Recommendations tabs in the right-side pane to examine code analytics for that kernel and view actionable recommendations for code optimization.

Note

Families of Intel® Xe graphics products starting with Intel® Arc™ Alchemist (formerly DG2) and newer generations use GPU architecture terminology that differs from the legacy terms. For more information on the terminology changes and how they map to legacy content, see GPU Architecture Terminology for Intel® Xe Graphics.

Get Recommendations

Check the Performance Issues column of the GPU pane to see if Intel® Advisor identifies any recommendations for a kernel.

Select a kernel on a Roofline chart and switch to the Recommendations tab to view actionable recommendations that help you optimize your code for compute-bound and memory-bound applications running on a GPU. Expand a recommendation to see its full description and a code sample with a possible solution to the problem.

Review Compute and Memory Bandwidth Utilization

Review how well your kernel uses the compute and memory bandwidth of your hardware in the OP/S and Bandwidth pane. It indicates the following metrics:

For example, in the screenshot below, the dominant data type is FLOP and the kernel utilizes 19% of the L3 bandwidth. The Roofline chart weighs this against the utilization metrics for the other memory levels and for compute capacity, and it identifies L3 bandwidth as the main factor limiting the kernel's performance.
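The comparison above can be sketched as one utilization ratio per resource, computed as measured throughput divided by peak. This minimal Python sketch rests on two assumptions. First, the limiting factor is taken to be the resource closest to its own peak. Second, the resource names and all numbers are made up for illustration. Intel Advisor's actual roof-selection logic may differ.

```python
# Hypothetical measured throughput and hardware peaks. The values are
# illustrative only and do not come from a real Intel Advisor report.
measured = {"Compute GFLOP/s": 120.0, "L3 GB/s": 95.0, "SLM GB/s": 10.0, "DRAM GB/s": 12.0}
peak = {"Compute GFLOP/s": 4000.0, "L3 GB/s": 500.0, "SLM GB/s": 800.0, "DRAM GB/s": 120.0}


def utilization(measured, peak):
    """Return the fraction of its peak that each resource achieves."""
    return {name: measured[name] / peak[name] for name in measured}


def limiting_resource(measured, peak):
    """Assumption: the limiter is the resource with the highest utilization."""
    util = utilization(measured, peak)
    name = max(util, key=util.get)
    return name, util[name]


name, share = limiting_resource(measured, peak)
print(f"{name}: {share:.0%} of peak")  # L3 GB/s: 19% of peak
```

With these invented numbers, L3 reaches 19% of its peak while compute reaches 3%, SLM about 1%, and DRAM 10%. The sketch therefore reports L3 as the limiter, even though its absolute utilization is low.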

Review how your application uses each memory level in the Memory Metrics pane:

Note

Data in the Memory Metrics pane is based on the dominant type of operations in your code (FLOAT or INT).
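Here is a minimal sketch of how a dominant operation type could feed into per-level arithmetic intensity (operations per byte). The function names are hypothetical, and so is the rule that ties go to FLOAT. The counts are illustrative, not Advisor output.

```python
def dominant_type(flop_count, intop_count):
    """Return "FLOAT" or "INT", whichever kind of operation occurs more often.

    Assumption (hypothetical): ties go to FLOAT.
    """
    return "FLOAT" if flop_count >= intop_count else "INT"


def arithmetic_intensity(flop_count, intop_count, bytes_by_level):
    """Operations per byte at each memory level, counting only the dominant type."""
    kind = dominant_type(flop_count, intop_count)
    ops = flop_count if kind == "FLOAT" else intop_count
    return kind, {level: ops / nbytes for level, nbytes in bytes_by_level.items()}


# Hypothetical counts for one kernel: operations executed and bytes moved per level.
kind, ai = arithmetic_intensity(
    flop_count=2_000_000_000,
    intop_count=500_000_000,
    bytes_by_level={"L3": 1_000_000_000, "DRAM": 4_000_000_000},
)
print(kind, ai)  # FLOAT {'L3': 2.0, 'DRAM': 0.5}
```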

Explore Operation Types Used During Application Execution

Examine the instruction types that the kernel executes in the Instruction Mix pane. For example, in the screenshot below, the kernel mostly executes compute instructions with integer operations, which means that the kernel is mostly compute bound.
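To see which category dominates the mix, divide each category's instruction count by the total. This is a minimal sketch. The category counts are hypothetical, and `category_shares` is an illustrative helper, not part of Intel Advisor.

```python
def category_shares(counts):
    """Fraction of all executed instructions that falls into each category."""
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


# Hypothetical per-category instruction counts for one kernel.
counts = {"Compute": 6_500, "Memory": 1_500, "Other": 1_800, "Atomic": 200}
shares = category_shares(counts)
top = max(shares, key=shares.get)
print(top, f"{shares[top]:.0%}")  # Compute 65%
```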

Intel Advisor automatically determines the data type used in operations and groups the instructions collected during the Characterization analysis into the following categories:

Compute (FLOP and INTOP):
  • BASIC COMPUTE: add, addc, mul, rndu, rndd, rnde, rndz, subb, avg, frc, lzd, fbh, fbl, cbit
  • BIT: and, not, or, xor, asr, shr, shl, bfrev, bfe, bfi1, bfi2, ror, rol
  • FMA: mac, mach, mad, madm (weight 2)
  • DIV: INT_DIV_BOTH, INT_DIV_QUOTIENT, INT_DIV_REMAINDER, and FDIV types of the extended math function
  • POW extended math function
  • MATH: other function types performed by the math instruction
  • VECTOR: add3 (weight 2), line (weight 2), sad2 (weight 3), dp2 (weight 3), sada2 (weight 4), lrp (weight 4), pln (weight 4), dp3 (weight 5), dph (weight 6), dp4 (weight 7), dp4a (weight 8)

Memory:
  • LOAD, STORE, SLM_LOAD, SLM_STORE: send, sendc, sends, and sendsc instructions, classified by their argument

Other:
  • MOVE: mov, sel, movi, smov, csel
  • CONTROL FLOW: if, else, endif, while, break, cont, call, calla, ret, goto, jmpi, brd, brc, join, halt
  • SYNC: wait, sync
  • OTHER: cmp, cmpn, nop, f32to16, f16to32, dim

Atomic:
  • SEND
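This sketch turns per-instruction execution counts into an operation count using the weights listed for the Compute category. It assumes that a weight is the number of operations an instruction contributes, and that any compute mnemonic without a listed weight counts as one operation. It ignores SIMD width and any other scaling Intel Advisor may apply. `weighted_ops` is a hypothetical helper.

```python
# Weights copied from the Compute category above. Compute mnemonics that are
# not listed here are assumed to count as a single operation (weight 1).
WEIGHTS = {
    "mac": 2, "mach": 2, "mad": 2, "madm": 2,
    "add3": 2, "line": 2, "sad2": 3, "dp2": 3, "sada2": 4, "lrp": 4,
    "pln": 4, "dp3": 5, "dph": 6, "dp4": 7, "dp4a": 8,
}


def weighted_ops(executed):
    """Sum executed compute instructions, each scaled by its operation weight.

    Pass only compute mnemonics; memory, move, and control-flow instructions
    are not operations in this sketch.
    """
    return sum(count * WEIGHTS.get(mnemonic, 1) for mnemonic, count in executed.items())


# Hypothetical execution counts for a few compute instructions.
executed = {"add": 100, "mad": 50, "dp4": 10}
print(weighted_ops(executed))  # 100*1 + 50*2 + 10*7 = 270
```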

Get more insights into the instructions used in your kernel in the Instruction Mix Details pane:

In the Performance Characteristics pane, review how effectively the kernel uses the GPU resources: the activity of all execution units, the percentage of time when both FPUs are used, and the percentage of cycles with a thread scheduled. Ideally, these percentages should be high, which indicates that the kernel keeps more of the GPU resources busy.
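Each of these metrics is a ratio of cycle counts. The sketch below uses hypothetical counters: the `EuCycles` fields are invented for illustration and are not Intel Advisor metric names. It also assumes every percentage is relative to total cycles, while Advisor's exact denominators may differ.

```python
from dataclasses import dataclass


@dataclass
class EuCycles:
    """Hypothetical cycle counters for one kernel, summed over all execution units."""
    total: int             # all execution-unit cycles in the measured interval
    active: int            # cycles in which the unit executed instructions
    both_fpus: int         # cycles in which both FPUs were busy
    thread_scheduled: int  # cycles with at least one thread scheduled


def characteristics(c: EuCycles) -> dict:
    """Express each metric as a percentage of total cycles (assumed denominator)."""
    return {
        "EU active %": 100 * c.active / c.total,
        "both FPUs active %": 100 * c.both_fpus / c.total,
        "thread scheduled %": 100 * c.thread_scheduled / c.total,
    }


print(characteristics(EuCycles(total=1_000, active=800, both_fpus=200, thread_scheduled=900)))
```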
