ReLAPACK has two configuration files: make.inc
, which is included by the
Makefile, and config.h
which is included in the source files.
The build environment (compiler and flags) and the test configuration (linker
flags for BLAS and LAPACK) are specified in make.inc
. The test matrix size
and error bounds are defined in test/config.h
.
The library librelapack.a
is compiled by invoking make
. The tests are
performed by either make test
or calling make
in the test folder.
For BLAS and LAPACK functions that return a complex number, there exist two
conflicting (FORTRAN compiler dependent) calling conventions: either the result
is returned as a struct
of two floating point numbers or an additional first
argument with a pointer to such a struct
is used. By default ReLAPACK uses
the former (which is what gfortran uses), but it can switch to the latter by
setting COMPLEX_FUNCTIONS_AS_ROUTINES
(or explicitly the BLAS and LAPACK
specific counterparts) to 1
in config.h
.
For MKL, COMPLEX_FUNCTIONS_AS_ROUTINES
must be set to 1
.
(Using the wrong convention will break ctrsyl
and ztrsyl
and the test cases
will segfault or return errors on the order of 1 or larger.)
The LDL decompositions require a general matrix-matrix product that updates only
a triangular matrix called xgemmt
. If the BLAS implementation linked against
provides such a routine, set the flag HAVE_XGEMMT
to 1
in config.h
;
otherwise, ReLAPACK uses its own recursive implementation of these kernels.
xgemmt
is provided by MKL.
ReLAPACK's routines are named RELAPACK_X
(e.g., RELAPACK_dgetrf
). If the
corresponding INCLUDE_X
flag in config.h
(e.g., INCLUDE_DGETRF
) is set to
1
, ReLAPACK additionally provides a wrapper under the LAPACK name (e.g.,
dgetrf_
). By default, wrappers for all routines are enabled.
The crossover size determines below which matrix sizes ReLAPACK's recursive
algorithms switch to LAPACK's unblocked routines to avoid tiny BLAS Level 3
routines. The crossover size is set in config.h
and can be chosen either
globally for the entire library, by operation, or individually by routine.
Two of ReLAPACK's routines make use of temporary buffers, which are allocated
and freed within ReLAPACK. Setting ALLOW_MALLOC
(or one of the routine
specific counterparts) to 0 in config.h
will disable these buffers. The
affected routines are:
-
xsytrf
: The LDL decomposition requires a buffer of size n^2 / 2. As in LAPACK, this size can be queried by settinglWork = -1
and the passed buffer will be used if it is large enough; only if it is not, a local buffer will be allocated.The advantage of this mechanism is that ReLAPACK will seamlessly work even with codes that statically provide too little memory instead of breaking them.
-
xsygst
: The reduction of a real symmetric-definite generalized eigenproblem to standard form can use an auxiliary buffer of size n^2 / 2 to avoid redundant computations. It thereby performs about 30% less FLOPs than LAPACK.
ReLAPACK is commonly linked to BLAS and LAPACK with standard FORTRAN interfaces.
Since these libraries usually have an underscore to their symbol names, ReLAPACK
has configuration switches in config.h
to adjust the corresponding routine
names.