.. _omatadd_batch:

omatadd_batch
=============

Computes a group of out-of-place scaled matrix additions using general matrices.

Description
***********

The ``omatadd_batch`` routines perform a series of out-of-place scaled matrix
additions. They are similar to the ``omatadd`` routines, but operate on a group
of matrices rather than a single pair. The matrices are always in a strided
format for this API.

The operation is defined as:

.. code-block::

   for i = 0 ... batch_size - 1
       A is a matrix at offset i * stride_a in a
       B is a matrix at offset i * stride_b in b
       C is a matrix at offset i * stride_c in c
       C = alpha * op(A) + beta * op(B)
   end for

where:

- ``op(X)`` is one of ``op(X) = X``, ``op(X) = X'``, or ``op(X) = conjg(X')``
- ``alpha`` and ``beta`` are scalars
- A, B, and C are matrices

The API is available with USM pointers or buffer arguments for the input and
output arrays.

The input buffers or arrays ``a`` and ``b`` contain all the input matrices, and
the single output buffer or array ``c`` contains all the output matrices. The
locations of the individual matrices within each buffer or array are given by
the stride lengths, while the number of matrices is given by the ``batch_size``
parameter.

API
***

Syntax
------

USM arrays:

.. code-block::

   event omatadd_batch(queue &queue,
                       transpose transa,
                       transpose transb,
                       std::int64_t m,
                       std::int64_t n,
                       T alpha,
                       const T *a,
                       std::int64_t lda,
                       std::int64_t stride_a,
                       T beta,
                       T *b,
                       std::int64_t ldb,
                       std::int64_t stride_b,
                       T *c,
                       std::int64_t ldc,
                       std::int64_t stride_c,
                       std::int64_t batch_size,
                       const std::vector<event> &dependencies = {});

Buffer arrays:

.. code-block::

   void omatadd_batch(queue &queue,
                      transpose transa,
                      transpose transb,
                      std::int64_t m,
                      std::int64_t n,
                      T alpha,
                      cl::sycl::buffer<T, 1> &a,
                      std::int64_t lda,
                      std::int64_t stride_a,
                      T beta,
                      cl::sycl::buffer<T, 1> &b,
                      std::int64_t ldb,
                      std::int64_t stride_b,
                      cl::sycl::buffer<T, 1> &c,
                      std::int64_t ldc,
                      std::int64_t stride_c,
                      std::int64_t batch_size);

``omatadd_batch`` supports the following precisions and devices:

.. list-table::
   :header-rows: 1

   * - T
     - Devices Supported
   * - ``float``
     - Host, CPU, and GPU
   * - ``double``
     - Host, CPU, and GPU
   * - ``std::complex<float>``
     - Host, CPU, and GPU
   * - ``std::complex<double>``
     - Host, CPU, and GPU

Input Parameters
----------------

transa
   Specifies ``op(A)``, the transposition operation applied to the matrices A.

transb
   Specifies ``op(B)``, the transposition operation applied to the matrices B.

m
   Number of rows of the result matrix C. Must be at least zero.

n
   Number of columns of the result matrix C. Must be at least zero.

alpha
   Scaling factor for the matrices A.

a
   Buffer or array holding the input matrices A. Must have size at least
   ``stride_a*batch_size``.

lda
   Leading dimension of the A matrices. If matrices are stored using column
   major layout, ``lda`` must be at least ``m`` if A is not transposed or at
   least ``n`` if A is transposed. If matrices are stored using row major
   layout, ``lda`` must be at least ``n`` if A is not transposed or at least
   ``m`` if A is transposed. Must be positive.

stride_a
   Stride between the different A matrices. If matrices are stored using column
   major layout, ``stride_a`` must be at least ``lda*n`` if A is not transposed
   or at least ``lda*m`` if A is transposed. If matrices are stored using row
   major layout, ``stride_a`` must be at least ``lda*m`` if A is not transposed
   or at least ``lda*n`` if A is transposed.

beta
   Scaling factor for the matrices B.

b
   Buffer or array holding the input matrices B. Must have size at least
   ``stride_b*batch_size``.

ldb
   Leading dimension of the B matrices. If matrices are stored using column
   major layout, ``ldb`` must be at least ``m`` if B is not transposed or at
   least ``n`` if B is transposed. If matrices are stored using row major
   layout, ``ldb`` must be at least ``n`` if B is not transposed or at least
   ``m`` if B is transposed. Must be positive.

stride_b
   Stride between the different B matrices. If matrices are stored using column
   major layout, ``stride_b`` must be at least ``ldb*n`` if B is not transposed
   or at least ``ldb*m`` if B is transposed. If matrices are stored using row
   major layout, ``stride_b`` must be at least ``ldb*m`` if B is not transposed
   or at least ``ldb*n`` if B is transposed.

ldc
   Leading dimension of the C matrices. If matrices are stored using column
   major layout, ``ldc`` must be at least ``m``. If matrices are stored using
   row major layout, ``ldc`` must be at least ``n``. Must be positive.

stride_c
   Stride between the different C matrices. If matrices are stored using column
   major layout, ``stride_c`` must be at least ``ldc*n``. If matrices are stored
   using row major layout, ``stride_c`` must be at least ``ldc*m``.

batch_size
   Specifies the number of input and output matrices to add.

dependencies
   List of events to wait for before starting computation, if any. If omitted,
   defaults to no dependencies.

Output Parameters
-----------------

c
   Output buffer or array, overwritten by ``batch_size`` matrix addition
   operations of the form ``alpha*op(A) + beta*op(B)``. Must have size at least
   ``stride_c*batch_size``.
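The strided semantics described above can be illustrated with a plain, host-only C++ sketch. This is not the library implementation and does not use SYCL: the names ``Op``, ``op_elem``, ``min_stride_col_major``, ``omatadd_batch_ref``, and ``demo`` are hypothetical helpers introduced only for this example. The sketch assumes column major layout and models only the non-transposed and transposed cases (no conjugate transpose).

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the library's transpose argument (assumption:
// only the non-transposed and transposed cases are modelled here).
enum class Op { NonTrans, Trans };

// Element (i, j) of op(X) for a column-major matrix X with leading dimension ld.
template <typename T>
T op_elem(const T* x, std::int64_t ld, Op op, std::int64_t i, std::int64_t j) {
    return op == Op::NonTrans ? x[i + j * ld] : x[j + i * ld];
}

// Minimum stride of an input matrix in column major layout, following the
// stride_a / stride_b rules: ld*n if not transposed, ld*m if transposed.
std::int64_t min_stride_col_major(std::int64_t ld, std::int64_t m,
                                  std::int64_t n, Op op) {
    return op == Op::NonTrans ? ld * n : ld * m;
}

// Reference loop: C_k = alpha * op(A_k) + beta * op(B_k) for each batch entry k.
template <typename T>
void omatadd_batch_ref(Op transa, Op transb, std::int64_t m, std::int64_t n,
                       T alpha, const T* a, std::int64_t lda, std::int64_t stride_a,
                       T beta, const T* b, std::int64_t ldb, std::int64_t stride_b,
                       T* c, std::int64_t ldc, std::int64_t stride_c,
                       std::int64_t batch_size) {
    for (std::int64_t k = 0; k < batch_size; ++k) {
        const T* A = a + k * stride_a;  // k-th input matrix A
        const T* B = b + k * stride_b;  // k-th input matrix B
        T* C = c + k * stride_c;        // k-th output matrix C
        for (std::int64_t j = 0; j < n; ++j) {
            for (std::int64_t i = 0; i < m; ++i) {
                C[i + j * ldc] = alpha * op_elem(A, lda, transa, i, j)
                               + beta * op_elem(B, ldb, transb, i, j);
            }
        }
    }
}

// Two 2-by-3 additions: A is used as stored, B is stored 3-by-2 and transposed.
std::vector<float> demo() {
    const std::int64_t m = 2, n = 3, batch_size = 2;
    const std::int64_t lda = 2, ldb = 3, ldc = 2;
    const std::int64_t stride_a = min_stride_col_major(lda, m, n, Op::NonTrans);
    const std::int64_t stride_b = min_stride_col_major(ldb, m, n, Op::Trans);
    const std::int64_t stride_c = ldc * n;

    std::vector<float> a = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12};
    std::vector<float> b = {10, 20, 30, 40, 50, 60, 0, 0, 0, 0, 0, 0};
    std::vector<float> c(stride_c * batch_size, 0.0f);

    omatadd_batch_ref(Op::NonTrans, Op::Trans, m, n,
                      2.0f, a.data(), lda, stride_a,
                      1.0f, b.data(), ldb, stride_b,
                      c.data(), ldc, stride_c, batch_size);
    return c;
}
```

Calling ``demo()`` fills ``c`` with two consecutive 2-by-3 column-major results. The first is ``2*A + B'`` for the first pair of matrices, and the second is ``2*A`` because the second B matrix is all zeros. The stride helper covers only the input matrices in column major layout; the row major rules listed for ``stride_a`` and ``stride_b`` swap ``m`` and ``n``.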