This page summarizes the state of SYCL feature support in the current develop branch of AdaptiveCpp. Supported features are listed with a link to the pull request in which they were merged.

SYCL 1.2.1 features (this list is incomplete and only contains features that are known to be problematic):

Feature | Supported (PR link) | Caveats | Comments |
---|---|---|---|
Images | ❌ | --- | --- |
OpenCL interop | ❌ | --- | --- |
Hierarchical parallelism | ✔️ | HIP/CUDA: for performance reasons, code in work-group scope is not restricted to execution by a single thread | |

SYCL 2020 features:

Feature | Supported (PR link) | Caveats | Comments |
---|---|---|---|
Accessor simplifications | ✔️ (partial) (PR) | [6] | |
USM: Memory management functions | ✔️ (PR) | [1] | |
USM: Queue shortcuts | ✔️ (PR) | | |
USM: Prefetch | ✔️ (PR) | [2] | |
USM: `mem_advise` | ❌ | Implementation requires host tasks because the backends do not provide an asynchronous mem advise | |
USM: `memcpy` | ✔️ (PR) | | |
USM: `memset`/`fill` | ✔️ (PR) | | |
Host tasks | ❌ | | |
Optional lambda naming | ✔️ (PR) | | |
Subgroups | ✔️ (PR) | On CPU, the subgroup size is always 1 | |
In-order queues | ✔️ (PR) | | |
Explicit dependencies (`depends_on()`) | ✔️ (PR) | | |
Backend interop API | ✔️ (PR) | [3] | |
Reductions | ✔️ (PR) | [4] | |
Group algorithms | ✔️ (PR) | [5] | |
New device selector API | ✔️ (PR) | | |
Aspect API | ✔️ (PR) | | |
Deduction guides | ✔️ (PR) | | |
`atomic_ref` | ✔️ (PR) | | |
`marray` | ❌ | | |
New `SYCL/sycl.hpp` header | ✔️ (PR) | | |
C++17 by default | ✔️ (PR) | | |
Builtin changes: `ctz()`, `clz()` | ❌ | | |
Remove `*_class` types | ❌ | | |
`const` return type for read accessor `operator[]` | ❌ | | |
Remove buffer API for `unique_ptr` | ❌ | | |
Replace `program` class with `module` | ❌ | | |
Add `kernel_handler` | ❌ | | |
`explicit` `queue`, `context` constructors | ✔️ (PR) | | |
Only require C++ trivially copyable for shared data | ✔️ | Has always worked thanks to the CUDA/HIP toolchain | |
Update `group` class with new types/member functions | ❌ | | |
Remove `nd_item::barrier()` | ❌ | | |
Replace `mem_fence` with `atomic_fence` | ❌ | | |
Add `vec::operator[]`, unary `+`, `-`, `static constexpr get_size()`/`get_count()` | ✔️ (PR) | | |
`buffer`, local `accessor` are C++ ContiguousContainer | ❌ | | |
Replace `image` with `sampled_image`, `unsampled_image` | ❌ | | |
All accessors are placeholders | ✔️ (PR) | | |
Use single exception type derived from `std::exception` | ❌ | | |
Default asynchronous handler should terminate the program | ✔️ (PR) | | |
Kernel invocation APIs take const reference to kernels, kernels must be immutable | ❌ | | |
`queue` constructor accepting both `device` and `context` | ✔️ (PR) | | |
Simplified `parallel_for` API | ❌ | | |
Clarified names for device-specific info queries | ❌ | | |
Address space changes, generic address spaces | ❌ | Partially: generic address spaces have always been available because of CUDA/HIP | |
Updated `multi_ptr` interface | ❌ | | |
Remove OpenCL types (`cl_int` etc.) | ✔️ | hipSYCL stopped supporting them long ago | |

- [1] HIP/ROCm implements unified memory using slow, device-accessible host memory. This means that hipSYCL's call to `hipMallocManaged` cannot produce efficient shared allocations.
- [2] HIP/ROCm does not provide the required functionality, so hipSYCL cannot expose it. Prefetch calls are currently ignored.
- [3] The interop types that backends expose are limited. Native queues can only be obtained using an `interop_handle`, because a queue in hipSYCL does not correspond to any specific backend object.
- [4] Only scalar reductions are supported. Note that the reduction interface is expected to change slightly with the release of SYCL 2020 final.
- [5] Note that the interface of group algorithms is expected to change slightly with the release of SYCL 2020 final.
- [6] Constructing a read-only accessor using `accessor<const T>` from a non-const `buffer<T>` is not yet supported.