Automatic Chunking

A parallel loop construct incurs overhead cost for every chunk of work that it schedules. Intel® oneAPI Threading Building Blocks (oneTBB) chooses chunk sizes automatically, depending upon load balancing needs. The heuristic attempts to limit overheads while still providing ample opportunities for load balancing.

Caution

Typically a loop needs to take at least a million clock cycles to make it worth using parallel_for. For example, a loop that takes at least 500 microseconds on a 2 GHz processor might benefit from parallel_for.

The default automatic chunking is recommended for most uses. As with most heuristics, however, there are situations where controlling the chunk size more precisely might yield better performance.