Releases: ddemidov/vexcl
1.1.1
Sorting algorithms may take tuples of keys/values (in fact, any Boost.Fusion sequence will do). In this case the comparison functor has to be specified explicitly. Both host and device variants of the comparison functor should take 2n arguments, where n is the number of keys: the first n arguments correspond to the left set of keys, and the second n arguments correspond to the right set of keys. Here is an example that sorts values by a tuple of two keys:
vex::vector<int>    keys1(ctx, n);
vex::vector<float>  keys2(ctx, n);
vex::vector<double> vals (ctx, n);

struct {
    VEX_FUNCTION(device, bool(int, float, int, float),
        "return (prm1 == prm3) ? (prm2 < prm4) : (prm1 < prm3);"
        );
    bool operator()(int a1, float a2, int b1, float b2) const {
        return std::make_tuple(a1, a2) < std::make_tuple(b1, b2);
    }
} comp;

vex::sort_by_key(std::tie(keys1, keys2), vals, comp);
1.1.0
- vex::SpMat<> class uses the CUSPARSE library on the CUDA backend when the VEXCL_USE_CUSPARSE macro is defined. This results in a more efficient sparse matrix-vector product, but disables inlining of the SpMV operation.
- Provided an example of CUDA backend interoperation with Thrust.
- When the VEXCL_CHECK_SIZES macro is defined to 1 or 2, runtime checks for vector expression correctness are enabled (see #81, #82).
- Added sort() and sort_by_key() functions.
- Added inclusive_scan() and exclusive_scan() functions.
- Added reduce_by_key() function. Only works with single-device contexts.
- Added convert_<type>() and as_<type>() builtin functions for the OpenCL backend.
1.0.0
CUDA backend is added!
As of v1.0.0, VexCL provides two backends: OpenCL and CUDA. To choose one, the user has to define either the VEXCL_BACKEND_OPENCL or the VEXCL_BACKEND_CUDA macro. If neither is defined, the OpenCL backend is chosen by default. One also has to link to either libOpenCL.so (OpenCL.dll for Windows users) or libcuda.so (cuda.dll).
For the CUDA backend to work, the CUDA Toolkit has to be installed, and the NVIDIA CUDA compiler driver nvcc has to be in the executable PATH and usable at runtime.
Benchmarks show that the CUDA backend is a couple of percent more efficient than the OpenCL backend, except for matrix-vector multiplication on multiple devices (there are some issues with asynchronous memory transfers in the CUDA driver API). Note that the first run of a program will take longer than usual, because nvcc will be invoked several times to compile each of the compute kernels used in the program. Subsequent runs will use the offline kernel cache and will complete faster.
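In code, the backend selection described above is a compile-time configuration fragment; a minimal sketch (macro names as given in the text above):

```cpp
// Select the backend before including VexCL headers; if neither macro
// is defined, the OpenCL backend is used by default.
// Link with -lOpenCL (OpenCL backend) or -lcuda (CUDA backend).
#define VEXCL_BACKEND_CUDA   // or VEXCL_BACKEND_OPENCL
#include <vexcl/vexcl.hpp>
```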
Also:
- Added vex::Filter::General: a modifiable container for device filters.
- vex::Filter::Env supports the OCL_POSITION environment variable.
- Vector views (reduction, permutation) all work with vector expressions.
- Added vex::reshape() function for reshaping multidimensional expressions.
- Added vex::cast() function for changing the deduced type of an expression.
- Added vex::Filter::Extension and vex::Filter::GLSharing filters for the OpenCL backend (thanks, @johneih!)
- The VEXCL_SPLIT_MULTIEXPRESSIONS macro allows componentwise splitting of large multiexpressions.
- Various bug fixes.
0.8.5
0.8.4-r1
0.8.4
- Allow user-defined functions in symbolic expressions
- Introduced address-of and dereference operators in vector expressions. This makes the following possible:
/*
 * Assign 42 to either y or z, depending on the value of x. The trick with
 * address_of/dereference is unfortunately required because in C99 (which
 * OpenCL is based on) the result of the ternary operator is not an lvalue.
 */
vex::tie( *if_else( x < 0.5, &y, &z ) ) = 42;
- vex::reduce() accepts slices of vector expressions.
- vex::reduce() calls may be nested.
- vex::element_index() optionally accepts a length (number of elements). This allows reduction of stateless vector expressions. Could be useful e.g. for Monte-Carlo experiments.
- Added missing builtin functions.
- Introduced constants in vector expressions. Instances of std::integral_constant<T,v>, or constants from the vex::constants namespace (which are currently wrappers for boost::math::constants), will not be passed as kernel parameters, but will be written as literals into the kernel source. Users may introduce their own constants with the help of the VEX_CONSTANT macro.
0.8.3
- FFT transform may be used as if it were a first-class vector expression, and not just as an additive transform (#54). This does not mean that expressions involving FFTs will result in a single kernel launch.
- Allow purging of online kernel caches. This allows for a complete cleanup of OpenCL contexts. Should be useful for libraries.
- Offline kernel caching. Saves time on first-time compilation. See comments in 1aedcd2.
0.8.2
0.8.1
0.8.0
API changes:
- There are no more non-owning multivectors. The multivector class has only two template parameters now: type and number of components.
- vex::tie() now returns vex::expression_tuple instead of a non-owning multivector. This allows tying vectors of different types, or even writable expressions (e.g. slices), together.
- The order of vex::make_temp<> template parameters has changed (first the required Tag, then the optional Type). The type, when unspecified, is deduced automatically from the given expression.
- MPI support is dropped (moved to the 'mpi' branch).