
General

TTV is a header-only C++ library for tensor-vector multiplication. It provides free C++ template functions that compute the mode-q tensor-times-vector product c[i,...,j] = a[i,...,k,...,j] * b[k] in parallel, where q is the index position of k. Simple examples of tensor-vector multiplications are the inner product c = a[i] * b[i] with q=1 and the matrix-vector multiplication c[i] = a[i,j] * b[j] with q=2.
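
To make the index notation concrete, the following sketch (not part of the library; all names are illustrative) spells out the mode-2 product of an order-3 tensor with plain loops. The library replaces such loops with slicing and high-performance GEMV/DOT calls.

#include <cstddef>
#include <vector>

// Plain-loop reference for the mode-2 product c[i,j] = a[i,k,j] * b[k]
// of an order-3 tensor stored in the first-order (column-major) layout.
std::vector<float> ttv_mode2_reference(
    std::vector<float> const& a, std::vector<float> const& b,
    std::size_t n1, std::size_t n2, std::size_t n3)
{
  auto c = std::vector<float>(n1*n3, 0.0f);
  for (auto j = 0ul; j < n3; ++j)
    for (auto k = 0ul; k < n2; ++k)
      for (auto i = 0ul; i < n1; ++i)
        // a[i,k,j] is stored at i + k*n1 + j*n1*n2, c[i,j] at i + j*n1
        c[i + j*n1] += a[i + k*n1 + j*n1*n2] * b[k];
  return c;
}

int main()
{
  auto a = std::vector<float>(4*3*2, 1.0f);    // 4x3x2 input tensor
  auto b = std::vector<float>(3,     1.0f);    // input vector of length 3
  auto c = ttv_mode2_reference(a, b, 4, 3, 2); // 4x2 result, each entry 3
}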

The order (number of dimensions) p, the dimensions n[r] and a non-hierarchical storage format pi of the tensors a and c can be chosen at runtime. The library is an extension of the boost/ublas tensor library, which contains the sequential version. Please note that this library might become part of boost/ublas in the future.

All function implementations of this library are based on the Loops-over-GEMM (LOG) approach and utilize high-performance GEMV or DOT routines of BLAS implementations such as OpenBLAS or Intel MKL without transposing the tensor. No auxiliary memory is needed. Implementation details and the runtime behavior of the tensor-vector multiplication functions are described in the corresponding research paper.

Interfaces

The library provides three interfaces that differ in their level of abstraction; all of them are declared in ttv.h.

Interface (1)

The high-level template function in ttv.h has the following signature:

template<class value_t>  // arithmetic element type
tensor<value_t> operator*(
  tlib::tensor_view<value_t> const& A, // reference to a tensor of order p and the contraction mode 1<=q<=p
  tlib::tensor     <value_t> const& B) // tensor of order 1

The operator*(...) calls the C-like low-level function with the parameters tlib::execution::blas, tlib::slicing::large and tlib::loop_fusion::all.

Interface (2)

The template function tensor_times_vector in ttv.h with explicit policy parameters has the following signature:

template<
  class value_t,          // arithmetic element type
  class execution_policy, // determines the type of execution 
  class slicing_policy,   // determines the tensor subtensor type, i.e. the slicing method
  class fusion_policy     // determines the loop fusion policy 
>
tensor<value_t> tensor_times_vector( 
  std::size_t     const  q,   // contraction dimension with 1<=q<=p
  tensor<value_t> const& A,   // contiguously stored input tensor A of order p with shape na and layout pia
  tensor<value_t> const& B,   // contiguously stored input tensor B of order 1 with shape nb
  execution_policy      ep,   // execution::seq, execution::par or execution::blas
  slicing_policy        sp,   // slicing::small or slicing::large
  fusion_policy         fp    // loop_fusion::none, loop_fusion::outer or loop_fusion::all
)

This function calls the low-level function template that is described below.
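
For instance, the minimal example from below can be written with explicit policies as follows; this call is equivalent to A(2) * B. The policy tags are taken from the description above, and it is assumed here that tensor_times_vector is reachable via the tlib namespace (a usage sketch, not a definitive listing):

#include <tlib/ttv.h>

int main()
{
  auto A = tlib::tensor<float>( {4,3,2} ); // order-3 input tensor
  auto B = tlib::tensor<float>( {3,1} );   // input vector of length 3
  // contract mode q=2 with explicitly chosen policies
  auto C = tlib::tensor_times_vector(
    2ul, A, B,
    tlib::execution::blas, tlib::slicing::large, tlib::loop_fusion::all );
}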

Interface (3)

The C-like low-level template function in ttv.h has the following signature:

template<
  class value_t,          // arithmetic element type
  class size_t,           // size type for the strides 
  class execution_policy, // determines the type of execution 
  class slicing_policy,   // determines the tensor subtensor type, i.e. the slicing method
  class fusion_policy     // determines the loop fusion policy 
>
void tensor_times_vector( 
  execution_policy ep,     // execution::seq, execution::par or execution::blas
  slicing_policy   sp,     // slicing::small or slicing::large
  fusion_policy    fp,     // loop_fusion::none, loop_fusion::outer or loop_fusion::all
  size_t  const q,         // contraction dimension with 1<=q<=p
  size_t  const p,         // number of dimensions, i.e. order of the input tensor with 1<p
  value_t const*const A,   // pointer to the contiguously stored input tensor A of order p
  size_t  const*const na,  // pointer to the shape tuple of A with na[r]>=1
  size_t  const*const wa,  // pointer to the stride tuple of A computed w.r.t. na and pia
  size_t  const*const pia, // pointer to the layout (permutation) tuple of A
  value_t const*const B,   // pointer to the contiguously stored input tensor B of order 1
  size_t  const*const nb,  // pointer to the shape tuple of B with nb[0]>=1
  value_t      *const C,   // pointer to the contiguously stored output tensor C of order p-1
  size_t  const*const nc,  // pointer to the shape tuple of C with nc[r]>=1
  size_t  const*const wc,  // pointer to the stride tuple of C which should be computed w.r.t. nc and pic
  size_t  const*const pic  // pointer to the layout (permutation) tuple of C which should be computed w.r.t. pia
)

Please use the auxiliary functions in shape.h, strides.h and layout.h to compute the shape, stride and layout tuples of the input and output tensors.
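
For illustration, the following hypothetical sketch calls the low-level overload for q=2 on a 4x3x2 tensor in the first-order (column-major) layout. The tuples are computed by hand here instead of via the helpers in shape.h, strides.h and layout.h (whose exact names are not shown on this page), and the tlib namespace is assumed:

#include <tlib/ttv.h>
#include <cstddef>
#include <vector>

int main()
{
  std::size_t const q = 2, p = 3;

  // input tensor A: shape 4x3x2, first-order (column-major) layout 1,2,3
  auto na  = std::vector<std::size_t>{4, 3, 2};
  auto pia = std::vector<std::size_t>{1, 2, 3};
  auto wa  = std::vector<std::size_t>{1, 4, 12}; // strides w.r.t. na and pia
  auto A   = std::vector<float>(4*3*2, 1.0f);

  // input vector B of length na[q-1] = 3
  auto nb = std::vector<std::size_t>{3};
  auto B  = std::vector<float>(3, 1.0f);

  // output tensor C of order p-1 = 2: mode q is removed from shape and layout
  auto nc  = std::vector<std::size_t>{4, 2};
  auto pic = std::vector<std::size_t>{1, 2};
  auto wc  = std::vector<std::size_t>{1, 4};     // strides w.r.t. nc and pic
  auto C   = std::vector<float>(4*2, 0.0f);

  tlib::tensor_times_vector(
    tlib::execution::blas, tlib::slicing::large, tlib::loop_fusion::all,
    q, p,
    A.data(), na.data(), wa.data(), pia.data(),
    B.data(), nb.data(),
    C.data(), nc.data(), wc.data(), pic.data() );
}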

Auxiliary Data Structures

The class template tensor in tensor.h has the following simplified constructors:

template<class value_t>
class tensor
{
  // ...
  tensor(shape_t const& n, layout_t const& pi);
  tensor(shape_t const& n);
  // ...
};

where the stride tuples of the tensor are automatically computed and verified. You can use the low-level interface if you want to use your own data structure.
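
For example, both constructors can be used as follows (a sketch; it is assumed here that the layout tuple {1,2,3} denotes the first-order, i.e. column-major, storage format and {3,2,1} the last-order one):

#include <tlib/ttv.h>

int main()
{
  // shape only: the stride tuple is computed for the default layout
  auto A = tlib::tensor<float>( {4,3,2} );
  // shape with an explicit layout (permutation) tuple, here first-order
  auto B = tlib::tensor<float>( {4,3,2}, {1,2,3} );
}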

The class template tensor_view in tensor.h only encapsulates the tensor class template and stores the contraction mode. It can only be used with the overloaded operator tensor operator*(tensor_view,tensor). A tensor_view is instantiated only via the following member function of tensor:

tensor_view<value_t> tensor<value_t>::operator()(std::size_t contraction_mode);

Minimal Example

#include <tlib/ttv.h>
int main()
{
  auto A = tlib::tensor<float>( {4,3,2} ); // input tensor of order 3
  auto B = tlib::tensor<float>( {3,1} );   // input vector of length 3
  auto C = A(2) * B;                       // mode-2 product, yielding shape {4,2}
}

Usage

Installing OpenBLAS and OpenMP on Ubuntu 18.04

sudo apt install libopenblas-* libomp5*

Clone from Github

git clone https://github.com/bassoy/ttv.git
cd ttv

Compile and Execute Example Files

# define -DUSE_OPENBLAS or -DUSE_INTELBLAS for fast execution
cd example
g++ -I../include/ -std=c++17 -Ofast interface1.cpp -o interface1 && ./interface1 
g++ -I../include/ -std=c++17 -Ofast interface2.cpp -o interface2 && ./interface2
g++ -I../include/ -std=c++17 -Ofast interface3.cpp -o interface3 && ./interface3
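
To enable a BLAS backend, a compile line might look as follows (a sketch; the link flags -lopenblas and -fopenmp depend on your installation):

g++ -I../include/ -std=c++17 -Ofast -fopenmp -DUSE_OPENBLAS interface1.cpp -o interface1 -lopenblas && ./interface1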

You can also have a look at the test directory, which contains unit tests for almost every function in this repository.