TTV is a header-only C++ library for tensor-vector multiplication. It provides free C++ template functions that compute the mode-`q` tensor-times-vector product `c[i,...,j] = a[i,...,k,...,j] * b[k]` in parallel, where `q` is the index position of `k`. Simple examples of tensor-vector multiplications are the inner product `c = a[i] * b[i]` with `q=1` and the matrix-vector multiplication `c[i] = a[i,j] * b[j]` with `q=2`.
The order (number of dimensions) `p`, the dimensions `n[r]` and a non-hierarchical storage format `pi` of the tensors `a` and `c` can be chosen at runtime. The library is an extension of the boost/ublas tensor library, which contains the sequential version. Please note that this library might become part of boost/ublas in the future.
All function implementations of this library are based on the Loops-over-GEMM (LOG) approach and utilize high-performance `GEMV` or `DOT` routines of BLAS implementations such as OpenBLAS or Intel MKL without transposing the tensor. No auxiliary memory is needed. Implementation details and the runtime behavior of the tensor-vector multiplication functions are described in the accompanying research article.
The high-level template function of the high-performance tensor-times-vector functions in ttv.h has the following signature:
```cpp
template<class value_t>                    // arithmetic element type
tensor<value_t> operator*(
    tlib::tensor_view<value_t> const& A,   // view of a tensor of order p with contraction mode 1<=q<=p
    tlib::tensor     <value_t> const& B);  // tensor of order 1
```
The `operator*(...)` calls the C-like low-level function with the parameters `tlib::execution::blas`, `tlib::slicing::large` and `tlib::loop_fusion::all`.
A second high-level template function in ttv.h additionally accepts execution, slicing and fusion policies:
```cpp
template<
    class value_t,            // arithmetic element type
    class execution_policy,   // determines the type of execution
    class slicing_policy,     // determines the tensor subtensor type, i.e. the slicing method
    class fusion_policy       // determines the loop fusion policy
>
tensor<value_t> tensor_times_vector(
    std::size_t const q,      // contraction dimension with 1<=q<=p
    tensor<value_t> const& A, // contiguously stored input tensor A of order p with shape na and layout pia
    tensor<value_t> const& B, // contiguously stored input tensor B of order 1 with shape nb
    execution_policy ep,      // execution::seq, execution::par or execution::blas
    slicing_policy sp,        // slicing::small or slicing::large
    fusion_policy fp          // loop_fusion::none, loop_fusion::outer or loop_fusion::all
);
```
This function calls the low-level function template described below.
The C-like low-level template function of the high-performance tensor-times-vector functions in ttv.h has the following signature:
```cpp
template<
    class value_t,            // arithmetic element type
    class size_t,             // size type for the strides
    class execution_policy,   // determines the type of execution
    class slicing_policy,     // determines the tensor subtensor type, i.e. the slicing method
    class fusion_policy       // determines the loop fusion policy
>
void tensor_times_vector(
    execution_policy ep,      // execution::seq, execution::par or execution::blas
    slicing_policy sp,        // slicing::small or slicing::large
    fusion_policy fp,         // loop_fusion::none, loop_fusion::outer or loop_fusion::all
    size_t const q,           // contraction dimension with 1<=q<=p
    size_t const p,           // number of dimensions (order) of the input tensor with 1<p
    value_t const*const A,    // pointer to the contiguously stored input tensor A of order p
    size_t  const*const na,   // pointer to the shape tuple of A with na[r]>=1
    size_t  const*const wa,   // pointer to the stride tuple of A computed w.r.t. na and pia
    size_t  const*const pia,  // pointer to the layout (permutation) tuple of A
    value_t const*const B,    // pointer to the contiguously stored input tensor B of order 1
    size_t  const*const nb,   // pointer to the shape tuple of B with nb[0]>=1
    value_t       *const C,   // pointer to the contiguously stored output tensor C of order p-1
    size_t  const*const nc,   // pointer to the shape tuple of C with nc[r]>=1
    size_t  const*const wc,   // pointer to the stride tuple of C which should be computed w.r.t. nc and pic
    size_t  const*const pic   // pointer to the layout (permutation) tuple of C which should be computed w.r.t. pia
);
```
Please use the auxiliary functions in shape.h, strides.h and layout.h to compute the shape, stride and layout tuples of the input and output tensors.
The class template tensor in tensor.h has the following simplified constructors:
```cpp
template<class value_t>
class tensor
{
    // ...
    tensor(shape_t const& n, layout_t const& pi);
    tensor(shape_t const& n);
    // ...
};
```
where the stride tuples of the tensor are automatically computed and verified. You can use the low-level interface if you want to use your own data structure.
The class template tensor_view in tensor.h only encapsulates the tensor class template and stores the contraction mode. It can only be used with the overloaded operator `tensor operator*(tensor_view, tensor)`. A `tensor_view` is instantiated only via the following member function of `tensor`:

```cpp
tensor_view<value_t> tensor<value_t>::operator()(std::size_t contraction_mode);
```
```cpp
#include <tlib/ttv.h>

int main()
{
    auto A = tlib::tensor<float>( {4,3,2} ); // order-3 tensor with shape 4x3x2
    auto B = tlib::tensor<float>( {3,1} );   // vector of length 3
    auto C = A(2) * B;                       // contract mode q=2: c[i,j] = a[i,k,j] * b[k]
}
```
```shell
sudo apt install libopenblas-* libomp5*
git clone https://github.com/bassoy/ttv.git
cd ttv
# add -DUSE_OPENBLAS or -DUSE_INTELBLAS for fast execution
cd example
g++ -I../include/ -std=c++17 -Ofast interface1.cpp -o interface1 && ./interface1
g++ -I../include/ -std=c++17 -Ofast interface2.cpp -o interface2 && ./interface2
g++ -I../include/ -std=c++17 -Ofast interface3.cpp -o interface3 && ./interface3
```
You can also have a look at the test directory, which contains unit tests for almost every function in this repository.