Intel® Advisor Help

Examine Bottlenecks on CPU Roofline Chart

Accuracy Level: Low

Enabled Analyses: Survey + FLOP (Characterization)

Result Interpretation

The farther a dot is from the topmost roofs, the more room for improvement there is. In accordance with Amdahl's Law, optimizing the loops that take the largest portion of the program's total run time will lead to greater speedups than optimizing the loops that take a smaller portion of the run time.
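As a rough illustration of how distance from a roof and share of run time combine, the following sketch applies Amdahl's Law to a few made-up loops. The loop names, time shares, and GFLOPS values are hypothetical and do not come from an Advisor report. The sketch also assumes, optimistically, that a loop could be lifted all the way to the roof above it, so the results are upper bounds rather than predictions.

```python
def amdahl_speedup(time_fraction, loop_speedup):
    """Whole-program speedup when a loop that takes `time_fraction` of the
    run time becomes `loop_speedup` times faster (Amdahl's Law)."""
    if not 0.0 <= time_fraction <= 1.0:
        raise ValueError("time_fraction must be between 0 and 1")
    if loop_speedup <= 0.0:
        raise ValueError("loop_speedup must be positive")
    return 1.0 / ((1.0 - time_fraction) + time_fraction / loop_speedup)


def best_case_gain(time_fraction, measured_gflops, roof_gflops):
    """Upper bound on program speedup, assuming the loop reaches the roof."""
    return amdahl_speedup(time_fraction, roof_gflops / measured_gflops)


# Hypothetical loops: (name, share of run time, measured GFLOPS, roof GFLOPS)
loops = [
    ("big_far", 0.60, 2.0, 8.0),     # large share, far below its roof
    ("small_far", 0.05, 1.0, 10.0),  # far below its roof, but tiny share
    ("big_near", 0.30, 9.0, 10.0),   # large share, already near its roof
]

# Rank loops by the best-case whole-program speedup they could deliver.
ranked = sorted(loops, key=lambda loop: best_case_gain(*loop[1:]), reverse=True)

for name, share, measured, roof in ranked:
    print(f"{name}: at most {best_case_gain(share, measured, roof):.2f}x overall")
```

In this made-up data set, only the loop that is both large and far from its roof offers a substantial overall gain. The loop with ten times headroom but a 5% share barely moves the total run time.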

Example of a Cache-aware CPU Roofline chart

Note

This topic describes data as it is shown in the CPU Roofline report in the Intel Advisor GUI. You can also view the result in an HTML report, but data arrangement and panes may vary.

The roofs above a dot represent the restrictions preventing it from achieving higher performance, although the roofs below it can contribute somewhat. Each roof represents the maximum performance achievable without taking advantage of a particular optimization, which is associated with the next roof up. Depending on a dot's position, you can try the following optimizations.

Note

For more precise optimization recommendations, see Roofline Guidance in the Code Analytics tab and Roofline Conclusions in the Recommendations tab.

Dot Position: Below a memory roof (DRAM Bandwidth, L1 Bandwidth, and so on)

Reason: The loop/function uses memory inefficiently.

To Optimize: Run a Memory Access Patterns (MAP) analysis for this loop.

  • If the MAP analysis suggests cache optimization, make any appropriate optimizations.
  • If cache optimization is impossible, try reworking the algorithm to have a higher arithmetic intensity (AI).

Dot Position: Below Vector Add Peak

Reason: The loop/function under-utilizes available instruction sets.

To Optimize: Check the Traits column in the Survey report to see if FMAs are used.

  • If FMAs are not used, try altering your code or compiler flags to induce FMA usage.

Dot Position: Just above Scalar Add Peak

Reason: The loop/function is undervectorized.

To Optimize: Check vectorization efficiency and performance issues in the Survey report. If the efficiency is low, follow the recommendations to improve it.

Dot Position: Below Scalar Add Peak

Reason: The loop/function is scalar.

To Optimize: Check the Survey report to see if the loop is vectorized. If it is not, try to get it to vectorize if possible. This may involve running a Dependencies analysis to check whether it is safe to force vectorization.
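To make the idea of "roofs above a dot" concrete, here is a minimal sketch that lists which roofs sit above a given dot. The roof names are the ones mentioned in this topic, but the bandwidth and peak values are invented for illustration. In practice, Advisor measures them for your system. The sketch assumes the usual Roofline geometry: memory roofs are diagonal (bandwidth multiplied by arithmetic intensity) and compute roofs are horizontal.

```python
# Hypothetical roof values, not measurements from any real machine.
MEMORY_ROOFS_GB_PER_S = {
    "L1 Bandwidth": 300.0,
    "DRAM Bandwidth": 20.0,
}
COMPUTE_ROOFS_GFLOPS = {
    "Vector Add Peak": 30.0,
    "Scalar Add Peak": 4.0,
}


def roofs_above(ai, gflops):
    """Return (roof name, roof height in GFLOPS) pairs for every roof that is
    above a dot at arithmetic intensity `ai` (FLOP/byte) and performance
    `gflops`, ordered from the nearest roof to the farthest."""
    # A memory roof's height depends on where the dot sits horizontally.
    heights = {name: bw * ai for name, bw in MEMORY_ROOFS_GB_PER_S.items()}
    # A compute roof has the same height at every arithmetic intensity.
    heights.update(COMPUTE_ROOFS_GFLOPS)
    above = [(name, h) for name, h in heights.items() if h > gflops]
    return sorted(above, key=lambda item: item[1])


# A made-up dot: 0.125 FLOP/byte at 3.0 GFLOPS.
for name, height in roofs_above(0.125, 3.0):
    print(f"{name}: {height:.1f} GFLOPS")
```

With these invented numbers, the dot is already above the DRAM Bandwidth roof (2.5 GFLOPS at this intensity), so the nearest restrictions are Scalar Add Peak, then Vector Add Peak, then L1 Bandwidth.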

In the following Roofline chart representation, loops A and G (large red dots), and to a lesser extent B (yellow dot far below the roofs), are the best candidates for optimization. Loops C, D, and E (small green dots) and H (yellow dot) are poor candidates because they do not have much room to improve or are too small to have significant impact on performance.
[Figure: a visual model, not an actual screenshot, of the Roofline chart]

Some algorithms are incapable of breaking certain roofs. For instance, if Loop A in the example above cannot be vectorized due to dependencies, it cannot break the Scalar Add Peak.

Tip

If you cannot break a memory roof, try to rework your algorithm for higher arithmetic intensity. This moves the dot to the right on the chart and gives it more room to increase performance before hitting the memory bandwidth roof. This would be the appropriate approach to optimizing loop F in the example, as well as loop G if its cache usage cannot be improved.
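The effect of moving right can be sketched with the basic Roofline bound: attainable performance is the smaller of the compute peak and the memory bandwidth multiplied by arithmetic intensity. The bandwidth, peak, and FLOP/byte counts below are hypothetical values chosen for illustration, not figures from the example chart.

```python
def arithmetic_intensity(flop_count, bytes_moved):
    """Arithmetic intensity in FLOP per byte."""
    return flop_count / bytes_moved


def attainable_gflops(ai, bandwidth_gb_per_s, compute_peak_gflops):
    """Basic Roofline bound: the lower of the memory roof and compute roof."""
    return min(compute_peak_gflops, bandwidth_gb_per_s * ai)


def ridge_point(bandwidth_gb_per_s, compute_peak_gflops):
    """Arithmetic intensity at which the memory roof meets the compute roof."""
    return compute_peak_gflops / bandwidth_gb_per_s


# Hypothetical machine: 20 GB/s memory roof, 30 GFLOPS compute roof.
BANDWIDTH = 20.0
PEAK = 30.0

# Hypothetical loop: 2 FLOP per 16 bytes before a rework, 8 FLOP per 16 bytes
# after it (for example, by reusing loaded data for more computation).
before = arithmetic_intensity(2, 16)
after = arithmetic_intensity(8, 16)

print(f"before: AI={before}, ceiling={attainable_gflops(before, BANDWIDTH, PEAK)} GFLOPS")
print(f"after:  AI={after}, ceiling={attainable_gflops(after, BANDWIDTH, PEAK)} GFLOPS")
print(f"memory roof stops limiting at AI={ridge_point(BANDWIDTH, PEAK)}")
```

Under these assumptions, raising the arithmetic intensity from 0.125 to 0.5 FLOP/byte lifts the memory-imposed ceiling from 2.5 to 10 GFLOPS. Past the ridge point, further increases no longer help because the compute roof becomes the limit.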

Analyze Specific Loops

Select a dot on the chart, then open the Code Analytics tab to view detailed information about the selected loop.

Next Steps