.. _debug-a-dpcpp-application-on-a-gpu:

*Tutorial: Debugging with Intel® Distribution for GDB\**

Debug a SYCL\* Application on a GPU
===================================

This section describes a basic scenario of debugging a SYCL\* program
with the kernel offloaded to the GPU.

Before you proceed, make sure you have completed all necessary setup
steps described in the `Get Started Guide `__.

.. _basic_debugging_gpu:

Basic Debugging
---------------

.. note::
   For your convenience, all common Intel® Distribution for GDB\*
   commands used in the examples below are provided in the
   `reference sheet `__.

Consider the ``array-transform.cpp`` example again:

.. code-block::

   54        h.parallel_for(data_range, [=](id<1> index) {
   55          size_t id0 = GetDim(index, 0);
   56          int element = in[index];  // breakpoint-here
   57          int result = element + 50;
   58          if (id0 % 2 == 0) {
   59            result = result + 50;  // then-branch
   60          } else {
   61            result = -1;  // else-branch
   62          }
   63          out[index] = result;
   64        });

If you have not already done so, start the debugger:

.. code-block::

   gdb-oneapi array-transform

Set two breakpoints inside the kernel, one for each conditional branch:

#. .. container::
      :name: LI_24BEE1FC2E5442FA8203998E30A9A326

      .. code-block::

         break 59

      Expected output:

      .. code-block::

         Breakpoint 1 at 0x40583c: file /path/to/array-transform.cpp, line 59.

#. .. container::
      :name: LI_47F2FBA6C48045D5873247C57F540632

      .. code-block::

         break 61

      Expected output:

      .. code-block::

         Breakpoint 2 at 0x40584a: file /path/to/array-transform.cpp, line 61.

.. note::
   Do not expect your output to match the tutorial exactly. The output
   may vary due to the nature of parallelism and different machine
   properties. The ellipsis *[...]* denotes output omitted for brevity.

To start the program, execute:

.. code-block::

   run gpu

You should see the following output:

.. code-block::

   Starting program: /path/to/array-transform gpu
   [Thread debugging using libthread_db enabled]
   Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
   [New Thread 0x7ffff37dc700 (LWP 9479)]
   intelgt: gdbserver-gt started for process 28837. Will listen for an attached process
   [New Thread 0x7fffe21e9700 (LWP 9599)]
   [SYCL] Using device: [Intel® Iris® Plus Graphics 650 [0x5927]] from [Intel® Level-Zero]
   intelgt: attached to device 1 of 1; id 0x5927 (Gen9)
   [New inferior 2]
   [New Thread 1.1073741824]
   [New Thread 1.1073741888]
   [New Thread 1.1073742080]
   [New Thread 1.1073742144]
   [New Thread 1.1073742336]
   [New Thread 1.1073745920]
   [New Thread 1.1073746176]
   [New Thread 1.1073746432]
   [Switching to Thread 1.1073741824 lane 1]

   Thread 2.1 hit Breakpoint 2, with SIMD lanes [1 3 5 7], main::$_1::operator()[...] at array-transform.cpp:61
   61            result = -1; // else-branch

The debugger has a mechanism called `"auto-attach" `__ that spawns an
instance of ``gdbserver-gt`` to listen to and control the GPU for
debugging. In the example above, the auto-attach mechanism is triggered
and ``gdbserver-gt`` is added to the debugger as an inferior. Check the
presence of ``gdbserver-gt`` as follows:

.. code-block::

   info inferiors

Expected output:

.. code-block::

     Num  Description   Connection                                                 Executable
     1    process 9463  1 (native)
   * 2    device 1      2 (extended-remote gdbserver-gt --multi --hostpid=9463 -)

.. note::
   The auto-attach feature sets ``schedule-multiple`` to ``on``, which
   allows all threads of all processes to run during the same session.
   For example, when you run the ``continue`` command, all inferiors
   continue.

The breakpoint event is received from the ``gdbserver-gt`` process. The
thread ID 2.1:1 refers to SIMD lane 1 of thread 1 in inferior 2. Lane 1
is the first active SIMD lane of that thread, so it is now in focus.

The breakpoint at line 61 is hit first. The order of branch execution is
defined by the Intel® Graphics Compiler.
Check which SIMD lanes are currently active with the following command:

.. code-block::

   info threads

In the example, thread 2.1 has 4 active SIMD lanes: 1, 3, 5, and 7. The
asterisk \* marks the current SIMD lane. See the expected output below.

.. note::
   SIMD lane enumeration starts from 0.

.. code-block::

   Id             Target Id            Frame
   1.1            Thread [...]         [...]
   1.2            Thread [...]         [...]
   2.1:1          Thread 1.1073741824  at array-transform.cpp:61
   2.1:[3 5 7]    Thread 1.1073741824  at array-transform.cpp:61
   2.2:[1 3 5 7]  Thread 1.1073741888  at array-transform.cpp:61
   2.4:[1 3 5 7]  Thread 1.1073742080  at array-transform.cpp:61
   2.5:[1 3 5 7]  Thread 1.1073742144  at array-transform.cpp:61
   2.6:[1 3 5 7]  Thread 1.1073742336  at array-transform.cpp:61
   2.7:[1 3 5 7]  Thread 1.1073745920  at array-transform.cpp:61
   2.8:[1 3 5 7]  Thread 1.1073746176  at array-transform.cpp:61
   2.9:[1 3 5 7]  Thread 1.1073746432  at array-transform.cpp:61

To switch the focus to a different SIMD lane, use the ``thread`` command
followed by a thread ID. A thread ID is specified by a triple:
``inferior.thread:lane``. The following examples work with particular
lanes:

-  .. container::
      :name: LI_74E287D371CD44F9B3F32F164E85746A

      #. .. container::
            :name: LI_3B059DC34198496ABF9261F721C10A05

            .. code-block::

               thread 2.1:3

            Example output:

            .. code-block::

               [Switching to thread 2.1:3 (Thread 1.1073741824 lane 3)]
               #0  main::$_1::operator()[...] at array-transform.cpp:61
               61            result = -1; // else-branch

      #. .. container::
            :name: LI_0B84D48C3024446BA356EE62E50FAFFB

            .. code-block::

               print element

            Example output:

            .. code-block::

               $1 = 103

-  .. container::
      :name: LI_A614C4B1D59F47339005C2DB19B59353

      #. .. container::
            :name: LI_C3607D9F91FB47E38891DDE4320117BF

            .. code-block::

               thread 2.1:5

            Example output:

            .. code-block::

               [Switching to thread 2.1:5 (Thread 1.1073741824 lane 5)]
               #0  main::$_1::operator()[...] at array-transform.cpp:61
               61            result = -1; // else-branch

      #. .. container::
            :name: LI_CA999408C32046328327ACFD3445FD56

            .. code-block::

               print element

            Example output:

            .. code-block::

               $2 = 105

.. note::
   The inferior number can be omitted from the thread ID. In this case,
   the current inferior is used. The thread number can also be omitted
   when switching to a lane in the current thread. For example, the
   command below switches to lane 5 of the current thread:

   .. code-block::

      thread :5

   Expected output:

   .. code-block::

      [Switching to thread 2.1:5 (Thread 1.1073741824 lane 5)]
      #0  main::$_1::operator()[...] at array-transform.cpp:61
      61            result = -1; // else-branch

Because you are now inside the kernel running on the GPU, you can look
into the assembly code and GPU registers, for example, to understand the
cause of unexpected application behavior.

To inspect the generated GPU instructions, execute the following
command:

.. code-block::

   disassemble

See an example output below:

.. code-block::

   Dump of assembler code for function _ZTSN2cl4sycl6kernelE(...):
      0x00000000fffad000 <+0>:           mov (1|M0)   null<1>:ud     0xC72C169A:ud
      0x00000000fffad010 <+16>:   (W)    mov (8|M0)   r22.0<1>:ud    r0.0<1;1,0>:ud
      0x00000000fffad020 <+32>:   (W)    or (1|M0)    cr0.0<1>:ud    cr0.0<0;1,0>:ud   0x4C0:uw   {Switch}
      0x00000000fffad030 <+48>:   (W)    mov (8|M0)   r9.0<1>:w      0x76543210:v
      0x00000000fffad040 <+64>:   (W)    and (1|M0)   r8.1<1>:d      r22.5<0;1,0>:d    511:w
      0x00000000fffad050 <+80>:   (W)    mul (1|M0)   r8.1<1>:d      r8.1<0;1,0>:d     0xC440:uw
      0x00000000fffad060 <+96>:   (W)    add (1|M0)   r8.2<1>:d      r8.1<0;1,0>:d     0x8440:uw
      0x00000000fffad070 <+112>:         mov (8|M0)   r9.0<1>:d      r9.0<8;8,1>:uw
      0x00000000fffad080 <+128>:         mul (8|M0)   r10.0<1>:d     r9.0<8;8,1>:d     8:w

To learn more about GEN assembly and registers, refer to the
`"Introduction to GEN assembly" `__ article.

To display a list of GPU registers, run the following command:

.. code-block::

   info reg

You can use registers to see the state of the application or to inspect
arithmetic instructions: which operands are used and where the result is
located. Additionally, you can inspect the execution mask (``$emask``
register), which shows active lanes.
To print ``$emask`` in binary format, use the ``/t`` format flag as
follows:

.. code-block::

   print/t $emask

Example output:

.. code-block::

   $3 = 10101010

Recall that you have stopped at line 61: the *else*-branch of the
condition that checks evenness of the work item index. Hence, every
other SIMD lane is inactive, as indicated by the ``$emask`` bit pattern.
The rightmost bit corresponds to lane 0, so the set bits are lanes 1, 3,
5, and 7.

To move forward and stop at the *then*-branch, set the
scheduler-locking mode to *step* and execute the ``next`` command. The
``set scheduler-locking step`` command keeps the other threads stopped
while the current thread is stepping:

.. code-block::

   set scheduler-locking step

.. code-block::

   next

You should see the following output:

.. code-block::

   [Switching to SIMD lane 0]

   Thread 2.1 hit Breakpoint 1, with SIMD lanes [0 2 4 6], main::$_1::operator()[...] at array-transform.cpp:59
   59            result = result + 50; // then-branch

Due to the breakpoint event, the SIMD lane focus switches to the first
active lane in the *then*-branch, which is SIMD lane 0.

The other threads of inferior 2 remain at line 61:

.. code-block::

   info threads 2.*

Example output:

.. code-block::

     Id             Target Id            Frame
   * 2.1:0          Thread 1.1073741824  at array-transform.cpp:59
     2.1:[2 4 6]    Thread 1.1073741824  at array-transform.cpp:59
     2.2:[1 3 5 7]  Thread 1.1073741888  at array-transform.cpp:61
     2.3:[1 3 5 7]  Thread 1.1073742080  at array-transform.cpp:61
     2.4:[1 3 5 7]  Thread 1.1073742144  at array-transform.cpp:61
     2.5:[1 3 5 7]  Thread 1.1073742336  at array-transform.cpp:61
     2.6:[1 3 5 7]  Thread 1.1073745920  at array-transform.cpp:61
     2.7:[1 3 5 7]  Thread 1.1073746176  at array-transform.cpp:61
     2.8:[1 3 5 7]  Thread 1.1073746432  at array-transform.cpp:61

Since the thread is vectorized, you can also inspect the vector of a
local variable:

.. code-block::

   x /8dw &result

Example output:

.. code-block::

   0x7fffe3f972c0: 150     -1      152     -1
   0x7fffe3f972d0: 154     -1      156     -1

SIMD Lanes
----------

To run a command in the context of one or more SIMD lanes, use the
``thread apply`` command. With only a thread ID, as the example output
shows, the command runs for a single lane of that thread (here lane 0,
which is in focus):

.. code-block::

   thread apply 2.1 print element

Example output:

.. code-block::

   Thread 2.1:0 (Thread 1.1073741824 lane 0):
   $4 = 100

You can specify a SIMD lane as a number:

.. code-block::

   thread apply 2.1:2 print element

Example output:

.. code-block::

   Thread 2.1:2 (Thread 1.1073741824 lane 2):
   $5 = 102

You can also specify a range of SIMD lanes. In this case, only the
active SIMD lanes in the range are considered:

.. code-block::

   thread apply 2.1:2-5 print element

Example output:

.. code-block::

   Thread 2.1:2 (Thread 1.1073741824 lane 2):
   $6 = 102
   warning: SIMD lane 3 is inactive in thread 1.2

   Thread 2.1:4 (Thread 1.1073741824 lane 4):
   $7 = 104
   warning: SIMD lane 5 is inactive in thread 1.2

To denote all active SIMD lanes, use the wildcard:

.. code-block::

   thread apply 2.1:* print element

Example output:

.. code-block::

   Thread 2.1:0 (Thread 1.1073741824 lane 0):
   $8 = 100

   Thread 2.1:2 (Thread 1.1073741824 lane 2):
   $9 = 102

   Thread 2.1:4 (Thread 1.1073741824 lane 4):
   $10 = 104

   Thread 2.1:6 (Thread 1.1073741824 lane 6):
   $11 = 106

To apply the command to all active SIMD lanes of all threads, use the
``all-lanes`` parameter:

.. code-block::

   thread apply all-lanes print element

Example output:

.. code-block::

   Thread 2.8:7 (Thread 1.1073741888 lane 7):
   $12 = 155

   Thread 2.8:5 (Thread 1.1073741888 lane 5):
   $13 = 153

   [...]

   Thread 2.1:2 (Thread 1.1073741824 lane 2):
   $42 = 102

   Thread 2.1:0 (Thread 1.1073741824 lane 0):
   $43 = 100

   Thread 1.2 (Thread 0x7ffff26dc700 (LWP 30173) "array-transform"):
   No symbol "element" in current context.

You can mix SIMD lane ranges with thread ranges and the thread wildcard.
For example, to apply a command to all active lanes of all threads of
inferior 2, start the command with either of the following:

-  .. container::
      :name: LI_2C74DC53D4A046639307FFD051AF4400

      .. code-block::

         thread apply 2.1-8:*

-  .. container::
      :name: LI_DA6060DACA8547548C9FA1FEC45917F2

      .. code-block::

         thread apply 2.*:*

If the current inferior is 2, the inferior number can be omitted:

-  .. container::
      :name: LI_2D313E446D76413EAE6A9FF01CB050C0

      .. code-block::

         thread apply 1-8:*

-  .. container::
      :name: LI_1E625D9D54894FB5BB0AA9877A19AFE2

      .. code-block::

         thread apply *:*

Breakpoint Actions
------------------

You can define a set of actions for a breakpoint to be executed when the
breakpoint is hit. By default, the actions are executed in the context
of the SIMD lane selected after the hit.

#. Quit the current debugging session and start a new one:

   .. code-block::

      quit

   .. code-block::

      gdb-oneapi array-transform

#. Define two temporary breakpoints with actions for the *then* and
   *else* branches:

   a. .. container::
         :name: LI_C244219C5BC94B47A8BB3D81FF18873D

         i. Set a temporary breakpoint:

            .. code-block::

               tbreak 61

            Example output:

            .. code-block::

               Temporary breakpoint 1 at 0x40584a: file /path/to/array-transform.cpp, line 61.

         ii. Define an action:

             .. code-block::

                commands

             When you are asked to type commands, enter the following:

             .. code-block::

                print element
                end

   b. .. container::
         :name: LI_A419F5E2BAA148D69233E0D259DC73EE

         i. Set another temporary breakpoint:

            .. code-block::

               tbreak 59

            Example output:

            .. code-block::

               Temporary breakpoint 2 at 0x40583c: file /path/to/array-transform.cpp, line 59.

         ii. Define an action to be executed for all SIMD lanes by
             adding the ``/a`` modifier:

             .. code-block::

                commands /a

             When you are asked to type commands, enter the following:

             .. code-block::

                print element
                end

Start the program:

.. code-block::

   run gpu

Example output:

.. code-block::

   [...]
   Thread 2.1 hit Temporary breakpoint 1, with SIMD lanes [1 3 5 7], main::$_1::operator()[...] at array-transform.cpp:61
   61            result = -1; // else-branch
   $1 = 101

Continue to hit the second breakpoint:

.. code-block::

   continue

Example output:

.. code-block::

   Continuing.
   [Switching to SIMD lane 0]

   Thread 2.1 hit Temporary breakpoint 2, with SIMD lanes [0 2 4 6], main::$_1::operator()[...] at array-transform.cpp:59
   59            result = result + 50; // then-branch
   $2 = 100
   $3 = 102
   $4 = 104
   $5 = 106

The action for the breakpoint at the *else* branch was executed only for
SIMD lane 1, while the action at the *then* branch was executed for all
active SIMD lanes.

.. note::
   For conditional breakpoints, the actions are executed only for SIMD
   lanes that meet the condition.

Conditional Breakpoints
-----------------------

Quit the debugging session and start the program from the beginning:

.. code-block::

   quit

.. code-block::

   gdb-oneapi array-transform

This time, set a breakpoint at line 59 with the condition
``element == 106``:

.. code-block::

   break 59 if element == 106

Example output:

.. code-block::

   Breakpoint 1 at 0x40583c: file /path/to/array-transform.cpp, line 59.

Run the program with the ``run gpu`` command and check that the output
resembles the following:

.. code-block::

   Starting program: gpu
   [...]
   [Switching to Thread 1.1073741824 lane 6]

   Thread 2.1 hit Breakpoint 1, with SIMD lane 6, main::$_1::operator()[...] at array-transform.cpp:59
   59            result = result + 50; // then-branch

The condition is true for lane 6 in thread 2.1.

.. note::
   A breakpoint condition is evaluated only for active SIMD lanes. For
   example, ``(gdb) break 59 if element == 107`` does not cause a stop:
   ``element == 107`` is true only for lane 7 in thread 2.1, and that
   lane is inactive at line 59.

.. toctree::
   :maxdepth: 4

   multi-gpu-debugging
   known-issues