FFTW to Intel® Math Kernel Library Wrappers Technical User Notes
for FFTW3.x

Contents

Introduction
Wrappers Reference
     1. Wrappers for Using Plans
     2. Basic Interface Wrappers
     3. Advanced Interface Wrappers
     4. Guru Interface Wrappers
     5. Wisdom Wrappers
     6. Memory Allocation
Parallel Mode
Calling Wrappers from Fortran
Installation
     Creating a Wrapper Library
     Application Assembling
     Running Examples
Technical Support
Disclaimer and Legal Information

  

Introduction

This document describes a collection of wrappers that form an FFTW-interface layer (superstructure) for calling functions of the Intel® Math Kernel Library (Intel® MKL) Fourier transform (DFTI) or Trigonometric Transform (TT) interface. These wrappers correspond to FFTW version 3.x and Intel MKL versions 7.0 and later.

The purpose of this set of wrappers is to enable developers whose programs currently use FFTW to gain performance with the Intel MKL Fourier transforms without changing the program source code. The only required change is to modify the header file fftw3.h (see Creating a Wrapper Library). Because FFTW and the Intel MKL DFTI/TT interfaces differ in functionality, a number of restrictions apply when the wrappers are used in place of FFTW functions, and some FFTW functions have empty wrappers. Nevertheless, many typical DFTs can be computed using these wrappers.

Please refer to the Intel MKL DFTI/TT documentation to better understand the effects of using the wrappers.

Additional wrappers may be added in the future to extend FFTW functionality available with Intel MKL.

Wrappers Reference

This section provides a reference for the FFTW C interface.

Each FFTW function has its own wrapper. Wrappers that are not explicitly listed below are empty and do nothing; they are still needed to satisfy the function calls and avoid link errors.

Note that Intel MKL DFTI operates on single-precision (float) and double-precision data types and does not support the long-double data type used by the FFTW functions.

1. Wrappers for Using Plans

void fftw_execute(const fftw_plan plan);

void fftw_destroy_plan(const fftw_plan plan);

void fftwf_execute(const fftwf_plan plan);

void fftwf_destroy_plan(const fftwf_plan plan);

2. Basic Interface Wrappers

Wrappers for execution and plan destruction functions are listed in Wrappers for Using Plans.

2.1 Complex DFTs

fftw_plan fftw_plan_dft_1d(int n, fftw_complex *in, fftw_complex *out, int sign, unsigned flags);
fftw_plan fftw_plan_dft_2d(int nx, int ny, fftw_complex *in, fftw_complex *out, int sign, unsigned flags);
fftw_plan fftw_plan_dft_3d(int nx, int ny, int nz, fftw_complex *in, fftw_complex *out, int sign, unsigned flags);
fftw_plan fftw_plan_dft(int rank, const int *n, fftw_complex *in, fftw_complex *out, int sign, unsigned flags);
fftwf_plan fftwf_plan_dft_1d(int n, fftwf_complex *in, fftwf_complex *out, int sign, unsigned flags);
fftwf_plan fftwf_plan_dft_2d(int nx, int ny, fftwf_complex *in, fftwf_complex *out, int sign, unsigned flags);
fftwf_plan fftwf_plan_dft_3d(int nx, int ny, int nz, fftwf_complex *in, fftwf_complex *out, int sign, unsigned flags);
fftwf_plan fftwf_plan_dft(int rank, const int *n, fftwf_complex *in, fftwf_complex *out, int sign, unsigned flags);

Argument restrictions. The same algorithm corresponds to all values of the flags parameter.

2.2 Real-Data DFTs

fftw_plan fftw_plan_dft_r2c(int rank, const int *n, double *in, fftw_complex *out, unsigned flags);
fftw_plan fftw_plan_dft_r2c_1d(int n, double *in, fftw_complex *out, unsigned flags);
fftw_plan fftw_plan_dft_r2c_2d(int nx, int ny, double *in, fftw_complex *out, unsigned flags);
fftw_plan fftw_plan_dft_r2c_3d(int nx, int ny, int nz, double *in, fftw_complex *out, unsigned flags);

fftw_plan fftw_plan_dft_c2r(int rank, const int *n, fftw_complex *in, double *out, unsigned flags);
fftw_plan fftw_plan_dft_c2r_1d(int n, fftw_complex *in, double *out, unsigned flags);
fftw_plan fftw_plan_dft_c2r_2d(int nx, int ny, fftw_complex *in, double *out, unsigned flags);
fftw_plan fftw_plan_dft_c2r_3d(int nx, int ny, int nz, fftw_complex *in, double *out, unsigned flags);

fftwf_plan fftwf_plan_dft_r2c(int rank, const int *n, float *in, fftwf_complex *out, unsigned flags);
fftwf_plan fftwf_plan_dft_r2c_1d(int n, float *in, fftwf_complex *out, unsigned flags);
fftwf_plan fftwf_plan_dft_r2c_2d(int nx, int ny, float *in, fftwf_complex *out, unsigned flags);
fftwf_plan fftwf_plan_dft_r2c_3d(int nx, int ny, int nz, float *in, fftwf_complex *out, unsigned flags);

fftwf_plan fftwf_plan_dft_c2r(int rank, const int *n, fftwf_complex *in, float *out, unsigned flags);
fftwf_plan fftwf_plan_dft_c2r_1d(int n, fftwf_complex *in, float *out, unsigned flags);
fftwf_plan fftwf_plan_dft_c2r_2d(int nx, int ny, fftwf_complex *in, float *out, unsigned flags);
fftwf_plan fftwf_plan_dft_c2r_3d(int nx, int ny, int nz, fftwf_complex *in, float *out, unsigned flags);

Argument restrictions. The same algorithm corresponds to all values of the flags parameter.
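
The buffer sizes for the real-data wrappers are not spelled out above. As a minimal sketch, assuming the wrappers keep the standard FFTW real-data layout (the last dimension of the complex array holds n/2 + 1 elements), the helper below computes how many complex elements an r2c plan writes and a c2r plan reads. The name r2c_complex_count is hypothetical and is not part of FFTW or Intel MKL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an FFTW or Intel MKL function).
   Returns the number of complex elements in the output of an r2c
   transform (equivalently, the input of a c2r transform) whose logical
   real size is n[0] x n[1] x ... x n[rank-1], row-major.
   Assumes the standard FFTW layout: only the last dimension is
   shortened, to n[rank-1]/2 + 1 (integer division). */
size_t r2c_complex_count(int rank, const int *n)
{
    size_t count = 1;
    int i;

    assert(rank >= 1 && n != NULL);

    /* All leading dimensions keep their full length. */
    for (i = 0; i < rank - 1; ++i)
        count *= (size_t)n[i];

    /* The last dimension stores only the non-redundant half. */
    return count * (size_t)(n[rank - 1] / 2 + 1);
}
```

For example, a 1D real transform of length 8 needs room for 5 complex values, and a 4 x 6 transform needs 4 * 4 = 16.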

2.3 Real-to-Real Transforms

Currently, only real 1D even/odd DFTs (cosine/sine transforms) are supported.

fftw_plan fftw_plan_r2r_1d(int n, double *in, double *out, fftw_r2r_kind kind, unsigned flags);
fftw_plan fftw_plan_r2r(int rank, const int *n, double *in, double *out, const fftw_r2r_kind *kind, unsigned flags);

Argument restrictions and extension: 

3. Advanced Interface Wrappers

Wrappers for execution and plan destruction functions are listed in Wrappers for Using Plans.

3.1 Advanced Complex DFTs

fftw_plan fftw_plan_many_dft(int rank, const int *n, int howmany, fftw_complex *in, const int *inembed, int istride, int idist, fftw_complex *out, const int *onembed, int ostride, int odist, int sign, unsigned flags);
fftwf_plan fftwf_plan_many_dft(int rank, const int *n, int howmany, fftwf_complex *in, const int *inembed, int istride, int idist, fftwf_complex *out, const int *onembed, int ostride, int odist, int sign, unsigned flags);

Argument restrictions. The same algorithm corresponds to all values of the flags parameter.
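
The istride and idist arguments (and their output counterparts ostride and odist) describe where each element of each transform is stored. A small sketch for the rank-1 case, assuming the standard FFTW convention that element j of transform k is located at index k*dist + j*stride; many_dft_index is a hypothetical helper, not a library function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an FFTW or Intel MKL function).
   For a batch of 1D transforms described by (stride, dist), returns the
   array index of element j of transform k, following the FFTW
   convention index = k*dist + j*stride. */
size_t many_dft_index(int k, int j, int stride, int dist)
{
    assert(k >= 0 && j >= 0 && stride >= 0 && dist >= 0);
    return (size_t)k * (size_t)dist + (size_t)j * (size_t)stride;
}
```

With transforms of length 8 stored one after another (stride 1, dist 8), element 3 of transform 2 is at index 19. With 4 transforms interleaved element by element (stride 4, dist 1), the same element is at index 14.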

3.2 Advanced Real-Data DFTs

fftw_plan fftw_plan_many_dft_r2c(int rank, const int *n, int howmany, double* in, const int *inembed, int istride, int idist, fftw_complex *out, const int *onembed, int ostride, int odist, unsigned flags);

fftwf_plan fftwf_plan_many_dft_r2c(int rank, const int *n, int howmany, float* in, const int *inembed, int istride, int idist, fftwf_complex *out, const int *onembed, int ostride, int odist, unsigned flags);

fftw_plan fftw_plan_many_dft_c2r(int rank, const int *n, int howmany, fftw_complex * in, const int *inembed, int istride, int idist, double *out, const int *onembed, int ostride, int odist, unsigned flags);

fftwf_plan fftwf_plan_many_dft_c2r(int rank, const int *n, int howmany, fftwf_complex* in, const int *inembed, int istride, int idist, float *out, const int *onembed, int ostride, int odist, unsigned flags);

3.3 Advanced Real-to-Real Transforms

All wrappers are empty and do nothing. The wrappers may be added in later versions of Intel MKL.

4. Guru Interface Wrappers

4.1 Guru Complex DFTs

fftw_plan fftw_plan_guru_dft(int rank, const fftw_iodim *dims, int howmany_rank, const fftw_iodim *howmany_dims, fftw_complex *in, fftw_complex *out, int sign, unsigned flags);
fftwf_plan fftwf_plan_guru_dft(int rank, const fftwf_iodim *dims, int howmany_rank, const fftwf_iodim *howmany_dims, fftwf_complex *in, fftwf_complex *out, int sign, unsigned flags);

Argument restrictions. The same algorithm corresponds to all values of the flags parameter. The only supported value of howmany_rank is 1.
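
To illustrate a request that stays within the howmany_rank restriction, the sketch below fills the dimension descriptors for a batch of contiguous, row-major transforms using a single batch dimension. The iodim struct is a local stand-in that mirrors the three fields of fftw_iodim (n, is, os), and fill_batched_dims is a hypothetical helper; neither is taken from the wrapper sources.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in with the same fields as fftw_iodim:
   n = length, is = input stride, os = output stride. */
typedef struct {
    int n;
    int is;
    int os;
} iodim;

/* Hypothetical helper. Describes `howmany` contiguous row-major
   transforms of size n[0] x ... x n[rank-1], stored back to back,
   with identical input and output layouts.
   dims must have room for `rank` entries; howmany_dim is the single
   batch dimension (howmany_rank == 1).
   Returns the number of elements in one transform. */
int fill_batched_dims(int rank, const int *n, int howmany,
                      iodim *dims, iodim *howmany_dim)
{
    int stride = 1;
    int i;

    assert(rank >= 1 && n != NULL && dims != NULL && howmany_dim != NULL);

    /* Row-major: the last dimension is contiguous. */
    for (i = rank - 1; i >= 0; --i) {
        dims[i].n = n[i];
        dims[i].is = stride;
        dims[i].os = stride;
        stride *= n[i];
    }

    /* Consecutive transforms are separated by one full transform. */
    howmany_dim->n = howmany;
    howmany_dim->is = stride;
    howmany_dim->os = stride;
    return stride;
}
```

The filled descriptors correspond to the dims and howmany_dims arguments of fftw_plan_guru_dft, with howmany_rank set to 1.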

The rest of the wrappers are empty and do nothing, as the Intel MKL DFTI currently does not support split arrays.

4.2 Guru Real-Data DFTs

All wrappers are empty and do nothing.

Real-data wrappers (without support of split arrays) may be added in later versions of Intel MKL. 

4.3 Guru Real-to-Real Transforms

All wrappers are empty and do nothing. The wrappers may be added in later versions of Intel MKL.

4.4 Guru Execution of Plans

void fftw_execute_dft(const fftw_plan p, fftw_complex *in, fftw_complex *out);
void fftw_execute_dft_r2c(const fftw_plan p, double *in, fftw_complex *out);
void fftw_execute_dft_c2r(const fftw_plan p, fftw_complex *in, double *out);
void fftwf_execute_dft(const fftwf_plan p, fftwf_complex *in, fftwf_complex *out);
void fftwf_execute_dft_r2c(const fftwf_plan p, float *in, fftwf_complex *out);
void fftwf_execute_dft_c2r(const fftwf_plan p, fftwf_complex *in, float *out);

The rest of the wrappers are empty and do nothing.

Real-data wrappers (without support of split arrays) will be added in later versions of Intel MKL. 

Wrappers for more execution and plan destruction functions are listed in Wrappers for Using Plans.

5. Wisdom Wrappers

All wrappers are empty and do nothing, as the Intel MKL DFTI currently does not support these functionalities.

6. Memory Allocation

void* fftw_malloc(size_t n);
void fftw_free(void* x);
void* fftwf_malloc(size_t n);
void fftwf_free(void* x);

Unlike the original FFTW fftw_malloc and fftwf_malloc functions, the corresponding wrappers do not align the allocated array. To obtain aligned data, it is necessary to allocate extra memory and shift the array address used for the DFT data. See also the Managing Performance and Memory chapter in the Intel MKL User's Guide (file userguide.pdf).
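
The over-allocate-and-shift approach mentioned above can be sketched as follows. This is an illustration only: the names aligned_dft_malloc and aligned_dft_free and the 16-byte alignment are assumptions, and plain malloc is used for the raw allocation so that the example is self-contained.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed alignment in bytes; must be a power of two.
   Choose the value recommended for your target processor. */
#define DFT_ALIGNMENT 16u

/* Hypothetical helper: returns n bytes aligned to DFT_ALIGNMENT, or NULL.
   Extra space is allocated so that the address can be shifted up to the
   next aligned boundary; the raw pointer is saved just before the
   returned address so that it can be freed later. */
void *aligned_dft_malloc(size_t n)
{
    const size_t overhead = (DFT_ALIGNMENT - 1) + sizeof(void *);
    void *raw;
    uintptr_t start, aligned;

    assert((DFT_ALIGNMENT & (DFT_ALIGNMENT - 1)) == 0);

    if (n > SIZE_MAX - overhead)
        return NULL;                       /* size would overflow */

    raw = malloc(n + overhead);
    if (raw == NULL)
        return NULL;

    /* Leave room for the saved pointer, then round up. */
    start = (uintptr_t)raw + sizeof(void *);
    aligned = (start + DFT_ALIGNMENT - 1) & ~(uintptr_t)(DFT_ALIGNMENT - 1);

    ((void **)aligned)[-1] = raw;          /* remember what to free */
    return (void *)aligned;
}

/* Releases memory obtained from aligned_dft_malloc. */
void aligned_dft_free(void *p)
{
    if (p != NULL)
        free(((void **)p)[-1]);
}
```

A buffer obtained this way can be passed as the in or out argument of the plan-creation wrappers; it must be released with aligned_dft_free rather than fftw_free.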

Parallel Mode

FFTW multi-threaded functions use a number-of-threads parameter: the function fftw_init_threads initializes threading and the function fftw_plan_with_nthreads sets the number of threads.
However, the wrappers for these functions and the fftw_cleanup_threads wrapper are empty and do nothing, because the Intel MKL DFTI implements a different parallelization mechanism. If you want to use Intel MKL DFTI routines in parallel mode or call the wrappers from a multi-threaded application, please refer to the Intel MKL documentation to learn how to manage the number of threads.

Calling Wrappers from Fortran

Wrappers are available for all the Fortran FFTW functions. The FFTW Fortran functions are themselves wrappers around the FFTW C functions, and the Fortran wrappers are likewise wrappers around the C wrappers. Their functionality and argument restrictions are therefore the same as those of the corresponding C wrappers.

DFFTW_EXECUTE(PLAN)

DFFTW_DESTROY_PLAN(PLAN)

SFFTW_EXECUTE(PLAN)

SFFTW_DESTROY_PLAN(PLAN)

DFFTW_PLAN_DFT_1D(PLAN, N, IN, OUT, SIGN, FLAGS)
DFFTW_PLAN_DFT_2D(PLAN, NX, NY, IN, OUT, SIGN, FLAGS)
DFFTW_PLAN_DFT_3D(PLAN, NX, NY, NZ, IN, OUT, SIGN, FLAGS)
DFFTW_PLAN_DFT(PLAN, RANK, N, IN, OUT, SIGN, FLAGS)

SFFTW_PLAN_DFT_1D(PLAN, N, IN, OUT, SIGN, FLAGS)
SFFTW_PLAN_DFT_2D(PLAN, NX, NY, IN, OUT, SIGN, FLAGS)
SFFTW_PLAN_DFT_3D(PLAN, NX, NY, NZ, IN, OUT, SIGN, FLAGS)
SFFTW_PLAN_DFT(PLAN, RANK, N, IN, OUT, SIGN, FLAGS)

DFFTW_PLAN_DFT_R2C(PLAN, RANK, N, IN, OUT, FLAGS)
DFFTW_PLAN_DFT_R2C_1D(PLAN, N, IN, OUT, FLAGS)
DFFTW_PLAN_DFT_R2C_2D(PLAN, NX, NY, IN, OUT, FLAGS)
DFFTW_PLAN_DFT_R2C_3D(PLAN, NX, NY, NZ, IN, OUT, FLAGS)

DFFTW_PLAN_DFT_C2R(PLAN, RANK, N, IN, OUT, FLAGS)
DFFTW_PLAN_DFT_C2R_1D(PLAN, N, IN, OUT, FLAGS)
DFFTW_PLAN_DFT_C2R_2D(PLAN, NX, NY, IN, OUT, FLAGS)
DFFTW_PLAN_DFT_C2R_3D(PLAN, NX, NY, NZ, IN, OUT, FLAGS)

SFFTW_PLAN_DFT_R2C(PLAN, RANK, N, IN, OUT, FLAGS)
SFFTW_PLAN_DFT_R2C_1D(PLAN, N, IN, OUT, FLAGS)
SFFTW_PLAN_DFT_R2C_2D(PLAN, NX, NY, IN, OUT, FLAGS)
SFFTW_PLAN_DFT_R2C_3D(PLAN, NX, NY, NZ, IN, OUT, FLAGS)

SFFTW_PLAN_DFT_C2R(PLAN, RANK, N, IN, OUT, FLAGS)
SFFTW_PLAN_DFT_C2R_1D(PLAN, N, IN, OUT, FLAGS)
SFFTW_PLAN_DFT_C2R_2D(PLAN, NX, NY, IN, OUT, FLAGS)
SFFTW_PLAN_DFT_C2R_3D(PLAN, NX, NY, NZ, IN, OUT, FLAGS)

DFFTW_PLAN_MANY_DFT(PLAN, RANK, N, HOWMANY, IN, INEMBED, ISTRIDE, IDIST, OUT, ONEMBED, OSTRIDE, ODIST, SIGN, FLAGS)
SFFTW_PLAN_MANY_DFT(PLAN, RANK, N, HOWMANY, IN, INEMBED, ISTRIDE, IDIST, OUT, ONEMBED, OSTRIDE, ODIST, SIGN, FLAGS)

DFFTW_PLAN_MANY_DFT_R2C(PLAN, RANK, N, HOWMANY, IN, INEMBED, ISTRIDE, IDIST, OUT, ONEMBED, OSTRIDE, ODIST, FLAGS)
SFFTW_PLAN_MANY_DFT_R2C(PLAN, RANK, N, HOWMANY, IN, INEMBED, ISTRIDE, IDIST, OUT, ONEMBED, OSTRIDE, ODIST, FLAGS)

DFFTW_PLAN_MANY_DFT_C2R(PLAN, RANK, N, HOWMANY, IN, INEMBED, ISTRIDE, IDIST, OUT, ONEMBED, OSTRIDE, ODIST, FLAGS)
SFFTW_PLAN_MANY_DFT_C2R(PLAN, RANK, N, HOWMANY, IN, INEMBED, ISTRIDE, IDIST, OUT, ONEMBED, OSTRIDE, ODIST, FLAGS)

DFFTW_PLAN_GURU_DFT(PLAN, RANK, DIMS_N, DIMS_IS, DIMS_OS, HOWMANY_RANK, HOWMANY_DIMS_N, HOWMANY_DIMS_IS, HOWMANY_DIMS_OS, IN, OUT, SIGN, FLAGS)
SFFTW_PLAN_GURU_DFT(PLAN, RANK, DIMS_N, DIMS_IS, DIMS_OS, HOWMANY_RANK, HOWMANY_DIMS_N, HOWMANY_DIMS_IS, HOWMANY_DIMS_OS, IN, OUT, SIGN, FLAGS)

DFFTW_EXECUTE_DFT(PLAN, IN, OUT)
DFFTW_EXECUTE_DFT_R2C(PLAN, IN, OUT)
DFFTW_EXECUTE_DFT_C2R(PLAN, IN, OUT)

SFFTW_EXECUTE_DFT(PLAN, IN, OUT)
SFFTW_EXECUTE_DFT_R2C(PLAN, IN, OUT)
SFFTW_EXECUTE_DFT_C2R(PLAN, IN, OUT)

Installation

The wrappers are delivered as source code, which you must compile to build the wrapper library. The FFTW library can then be replaced by the wrapper library together with the Intel MKL libraries. The source code for the wrappers, the makefiles, and the wrapper list files are located in the \interfaces\fftw3xc and \interfaces\fftw3xf subdirectories of the Intel MKL directory for C and Fortran wrappers, respectively.

Creating a Wrapper Library

Two header files are used to compile the C wrapper library: fftw3_mkl.h and  fftw3.h.
The fftw3_mkl.h file is located in the \interfaces\fftw3xc\wrappers subdirectory in the Intel MKL directory.

Three header files are used to compile the Fortran wrapper library: fftw3_mkl.h, fftw3_f77_mkl.h, and fftw3.h.
The fftw3_mkl.h and fftw3_f77_mkl.h files are located in the \interfaces\fftw3xf\wrappers subdirectory in the Intel MKL directory.

The file fftw3.h, which is used to compile the libraries for both interfaces and is located in the \include\fftw subdirectory of the Intel MKL directory, differs slightly from the original FFTW (www.fftw.org) header file fftw3.h: all lines containing calls to fftw3.lib are commented out.

Because the Fortran wrapper library is built by a C compiler, function names in the wrapper library and in the Fortran object module may differ. The file fftw3_f77_mkl.h in the \interfaces\fftw3xf\wrappers subdirectory of the Intel MKL directory defines the function names to match the names in the Fortran module. If a required name is missing from the file, you can edit the file to add it.
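
The kind of name mapping involved can be sketched as follows. This does not reproduce the contents of fftw3_f77_mkl.h: the macro F77_NAME, the switches F77_UPPERCASE and F77_NO_UNDERSCORE, and the function demo_execute are all hypothetical, chosen only to show how a C source file can export a symbol under whatever name a given Fortran compiler expects.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical naming switches; Fortran compilers differ in how they
   decorate external names (case, trailing underscore). */
#if defined(F77_UPPERCASE)
#define F77_NAME(lower, upper) upper
#elif defined(F77_NO_UNDERSCORE)
#define F77_NAME(lower, upper) lower
#else
#define F77_NAME(lower, upper) lower##_   /* lower case + underscore */
#endif

/* Counts calls; stands in for forwarding to a C wrapper. */
int executed_plans = 0;

/* Fortran passes arguments by reference, so the plan handle arrives
   as a pointer to the handle. With the default switch above, this
   function is exported as demo_execute_. */
void F77_NAME(demo_execute, DEMO_EXECUTE)(void **plan)
{
    assert(plan != NULL);
    if (*plan != NULL)
        ++executed_plans;
}
```

Adding a missing name then amounts to adding one more mapping of this kind for the routine in question.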

The makefiles accept the following parameters: platform (required), compiler, and function. These parameters are described in the comment header of each makefile.

Examples

The command
     make lib64
builds a wrapper library for IA-64 architecture based applications using the Intel® C++ Compiler or Intel® Fortran Compiler version 8.0 or higher (these compilers are chosen by default).
The command
     make lib64 compiler=gnu
builds a wrapper library for IA-64 architecture based applications using the GNU C compiler.

Running the makefile creates the wrapper library in the directory that contains the Intel MKL libraries for the target platform, for example, \lib\64 or \ia32\lib.

In the wrapper library names, the suffix corresponds to the compiler used, and the letter immediately before the underscore is "f" for Fortran and "c" for C.
For example,
     fftw3xf_intel.lib (Windows*)        libfftw3xf_intel.a (Linux* and Mac OS* X)
     fftw3xc_intel.lib (Windows)         libfftw3xc_intel.a (Linux and Mac OS X)
     fftw3xc_ms.lib (Windows)            libfftw3xc_gnu.a (Linux and Mac OS X)

Application Assembling

The adapted fftw3.h header file (see above) should be used when you build C applications.
The native fftw3.h header file should be used when you build Fortran applications.

Running Examples

There are some examples that demonstrate how to use the wrapper library. The source code for the examples, makefiles used to run them, and the example list files are located in the \examples\fftw3xc and \examples\fftw3xf subdirectories in the Intel MKL directory. To build Fortran examples, one additional file fftw3.f is needed. This file is distributed with permission from FFTW and is available in the \include\fftw subdirectory of the Intel MKL directory. The original file can also be found in FFTW 3.1 at http://www.fftw.org/download.html.

The example makefiles take the same parameters as the wrapper library makefiles and normally run the examples directly. However, if the appropriate wrapper library has not yet been created, the makefile first builds it in the same way as the wrapper library makefile does and then proceeds to the examples.

If the parameter function=<example_name> is defined, then only the specified example will run. Otherwise, all examples from the appropriate subdirectory will run. The subdirectory \_results will be created, and the results will be stored there in the example_name.res files.

Technical Support

Please see the Intel MKL support website at http://www.intel.com/support/performancetools/libraries/mkl/.  

 

Disclaimer and Legal Information

Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families. See http://www.intel.com/products/processor_number for details.