Intel® Advisor Help

Examine Relationships Between Memory Levels

Accuracy Level

Medium

Enabled Analyses

Survey + Characterization (Trip Counts and FLOP, Call Stacks, Memory-Level) + Memory Access Patterns

Result Interpretation

In the Medium accuracy preset, Intel® Advisor extends the basic Roofline capability and collects metrics for all memory levels as well as call stack data, which lets you analyze your application in more detail. The Roofline chart uses the results of the Memory Access Patterns analysis to determine what bounds each loop and to build recommendations in Roofline Guidance.

For information about Memory Access Patterns data interpretation, refer to Investigate Memory Usage and Traffic.

Note

This topic describes data as it is shown in the CPU Roofline report in the Intel Advisor GUI. You can also view the result in an HTML report, but data arrangement and panes may vary.

Memory-Level Roofline

The Memory-Level Roofline allows you to examine each loop at different cache levels and arithmetic intensities and provides precise insights into which cache level causes the performance bottlenecks.

The Memory-Level Roofline can help you to:

Memory-level CPU Roofline chart with dots expanded for all memory levels

To configure the Memory-Level Roofline chart:

  1. Expand the filter pane in the Roofline chart toolbar.
  2. In the Memory Level section, select the memory levels you want to see metrics for.

    Configure CPU Memory-level Roofline chart

  3. Click Apply.
  4. In the Roofline chart, double-click a loop to examine the relationships between the displayed memory levels and roofs. Labeled dots appear, each representing a memory level at its arithmetic intensity for the selected loop/function; lines connect the dots to indicate that they belong to the same loop/function.

    See dots for all memory levels on memory-level CPU roofline chart

Tip

By default, the Memory-Level Roofline chart is generated for the system cache configuration. You can also generate the chart for a custom cache configuration:
  1. Go to Project Properties > Trip Count and FLOP.
  2. In the Cache simulator field, click Modify.
  3. Click Add and enter/select the desired cache configurations.
  4. Re-run the Roofline analysis with Medium accuracy.

Memory-Level Roofline Data

Intel® Advisor uses cache simulation to collect integrated traffic data for all traffic types between the CPU and the different memory subsystems. With this data, Intel® Advisor counts the number of data transfers for a given cache level and computes the arithmetic intensity (AI) for each loop at each memory level.
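
As a rough illustration of the per-level computation, the sketch below divides a loop's floating-point operation count by the bytes transferred at each memory level. The numbers and the names `flop`, `bytes_by_level`, and `arithmetic_intensity` are hypothetical and are not part of any Intel Advisor API; the sketch only shows the arithmetic behind one AI value per memory level.

```python
# Hypothetical per-loop measurements; all numbers are illustrative only.
flop = 2.0e9  # floating-point operations executed by the loop

# Bytes transferred at each memory level, as a cache simulation might report.
bytes_by_level = {
    "L1": 1.6e10,
    "L2": 4.0e9,
    "L3": 2.0e9,
    "DRAM": 1.0e9,
}


def arithmetic_intensity(flop, bytes_by_level):
    """Return FLOP/byte for each memory level."""
    return {level: flop / traffic for level, traffic in bytes_by_level.items()}


ai = arithmetic_intensity(flop, bytes_by_level)
for level, value in ai.items():
    print(f"{level}: {value:.3f} FLOP/byte")
```

In this made-up case, traffic shrinks at each level farther from the core, so the arithmetic intensity grows from L1 to DRAM and the dots would be plotted from left to right in that order.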

Review how the traffic changes from one memory level to another and compare each level to its respective roof to identify the memory hierarchy bottleneck for the kernel and determine optimization steps based on this information.

  • The vertical distance between memory dots and their respective roofs shows how much you are limited by a given memory subsystem. If a dot is close to its roof, the kernel is limited by the performance of this memory level.
  • The horizontal distance between memory dots indicates how efficiently the loop/function uses cache. For example, if the L3 and DRAM dots are very close on the horizontal axis for a single loop, the loop/function uses L3 and DRAM similarly. This means that it does not use L3 and DRAM efficiently. You can try to improve data reuse in the code to change the arithmetic intensity for all loops/functions and improve application performance. For more precise advice, see the Roofline Guidance in the Code Analytics tab.
  • Arithmetic intensity determines the order in which dots are plotted, which can provide some insight into your code's performance. For example, the L1 dot should be the largest and the first plotted dot on the chart from left to right. However, memory access type, latency, or technical issues can change the order of the dots. Continue to run the Memory Access Patterns analysis to investigate this issue.
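
The first two readings above can be sketched numerically. The example below assumes a simple memory roof of bandwidth × arithmetic intensity and uses invented bandwidths, intensities, and a hypothetical `CLOSE_RATIO` cutoff; none of these values or names come from Intel Advisor. Because Roofline charts use logarithmic axes, distances on the chart correspond to ratios, which is what the sketch computes.

```python
# Illustrative inputs only; none of these values come from Intel Advisor.
achieved_gflops = 20.0  # loop performance, shared by every dot of the loop

# Arithmetic intensity (FLOP/byte) of the loop at each memory level.
ai = {"L1": 0.125, "L2": 0.5, "L3": 1.0, "DRAM": 1.0625}

# Assumed peak bandwidth (GB/s) of each memory level.
bandwidth_gbs = {"L1": 500.0, "L2": 200.0, "L3": 100.0, "DRAM": 25.0}

LEVELS = ["L1", "L2", "L3", "DRAM"]
CLOSE_RATIO = 1.2  # hypothetical cutoff for "dots are very close"


def headroom(level):
    """Ratio of the memory roof at this dot's intensity to achieved performance."""
    roof_gflops = bandwidth_gbs[level] * ai[level]
    return roof_gflops / achieved_gflops


# Vertical reading: the level with the least headroom is closest to its roof.
headroom_by_level = {level: headroom(level) for level in LEVELS}
likely_bottleneck = min(headroom_by_level, key=headroom_by_level.get)

# Horizontal reading: adjacent levels whose intensities barely differ.
close_pairs = [
    (inner, outer)
    for inner, outer in zip(LEVELS, LEVELS[1:])
    if ai[outer] / ai[inner] < CLOSE_RATIO
]

print(headroom_by_level)
print("Closest to its roof:", likely_bottleneck)
print("Adjacent levels with similar traffic:", close_pairs)
```

With these inputs, the DRAM dot has the least headroom, and the L3 and DRAM dots are nearly at the same intensity, which matches the case described above where L3 adds little data reuse over DRAM.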

To examine a specific loop in more detail, select a dot on the chart and open the Code Analytics tab below the chart:

Roofline with Callstacks

The basic Intel® Advisor Roofline model, the Cache-Aware Roofline Model (CARM), offers self-data capability. The Intel® Advisor Roofline with Callstacks feature extends the basic model with total-data capability:

The total-data capability in the Roofline with Callstacks feature can help you:

To view the callstacks, enable the With Callstacks checkbox in the Roofline chart.

CPU Roofline chart with callstacks

To show/hide dot descendants:

Roofline with Callstacks Chart Data

The following Roofline chart representation shows some of the added benefits of the Roofline with Callstacks feature, including:

  • A navigable, color-coded Callstack pane that shows the entire call chain for the selected loop/function, but excludes its callees

  • Visual indicators (caller and callee arrows) that show the relationship among loops and functions

  • The ability to simplify dot-heavy charts by collapsing several small loops into one overall representation

    Loops/functions with no self data are grayed out when expanded and in color when collapsed. Loops/functions with self data display at the coordinates, size, and color appropriate to the data when expanded, but have a gray halo of the size associated with their total time. When such loops/functions are collapsed, they change to the size and color appropriate to their total time and, if applicable, move to reflect the total performance and total arithmetic intensity.
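
To make the self/total distinction concrete, the sketch below aggregates a small call tree. It assumes that total data for a loop/function is its own (self) time, FLOP, and bytes plus the totals of everything it calls; the `Node` class, the `totals` helper, and all numbers are hypothetical and are not Intel Advisor data structures.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A loop/function with its self metrics and the loops/functions it calls."""
    name: str
    self_time_s: float
    self_flop: float
    self_bytes: float
    callees: list = field(default_factory=list)


def totals(node):
    """Return (time, flop, bytes) for the node plus everything it calls."""
    time_s, flop, nbytes = node.self_time_s, node.self_flop, node.self_bytes
    for callee in node.callees:
        t, f, b = totals(callee)
        time_s += t
        flop += f
        nbytes += b
    return time_s, flop, nbytes


# Illustrative call tree: a function with no self data that calls two loops.
loop_a = Node("loop_a", self_time_s=1.0, self_flop=4.0e9, self_bytes=8.0e9)
loop_b = Node("loop_b", self_time_s=1.0, self_flop=2.0e9, self_bytes=4.0e9)
outer_fn = Node("outer_fn", 0.0, 0.0, 0.0, callees=[loop_a, loop_b])

time_s, flop, nbytes = totals(outer_fn)
total_gflops = flop / time_s / 1e9  # total performance
total_ai = flop / nbytes            # total arithmetic intensity (FLOP/byte)

print(f"{outer_fn.name}: {total_gflops:.2f} GFLOPS at {total_ai:.2f} FLOP/byte")
```

Here `outer_fn` has no self data, so it would be grayed out when expanded; when collapsed, its dot would sit at the total performance and total arithmetic intensity computed from its two loops.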


Intel Advisor: Roofline with Callstacks

See Also