Release v0.9.0 · NVIDIA/MatX

Version v0.9.0 adds comprehensive support for more host CPU transforms such as BLAS and LAPACK, including multi-threaded versions.

Beyond the CPU support, there are many more minor improvements:

Added several new operators, including vector_norm, matrix_norm, frexp, diag, and more
Many compiler fixes to support a wider range of older and newer compilers
Performance improvements to avoid overhead of permutation operators when unnecessary
Much more!

A full changelist is below.

What's Changed

Update pybind11 to v2.12.0. Fixes issue #591. by @tmartin-gh in #604
Change print macro to matx namespaced function by @tmartin-gh in #607
Added frexp() operator by @cliffburdick in #609
Disable CUTLASS compile option by @cliffburdick in #610
Created dimensionless versions of ones() and zeros() by @cliffburdick in #611
Add smem-based polyphase channelizer kernel by @tbensonatl in #613
Eigen guide by @tylera-nvidia in #612
Multithreaded docs build Fix by @tylera-nvidia in #614
Fixed issues with static tensor unit tests compiling by @cliffburdick in #615
Implement csqrt by @tylera-nvidia in #619
Automatic Enumeration of NVTX Range IDs by @tylera-nvidia in #616
Fixing Clang errors to compile with clang-17 by @cliffburdick in #621
Update to CCCL 2.4.0 and fix CMake to not use system includes by @cliffburdick in #623
Remove options that nvc++ doesn't support by @cliffburdick in #624
Fixing some warnings on certain compilers by @cliffburdick in #625
More nvc++ warning fixes. Increase minimum supported CUDA to 11.5 by @cliffburdick in #627
More nvc++ fixes + code coverage generation by @cliffburdick in #628
fixed printing 0D tensors by @tylera-nvidia in #618
Remove conversion for double to half by @cliffburdick in #631
Add NVTX Tests for Code Coverage by @tylera-nvidia in #632
Feature/add complex cast operators by @tbensonatl in #633
Avoid array indices passthrough in matxOpTDKernel by @tbensonatl in #634
Add mixed precision support for channelize_poly by @tbensonatl in #640
Add test cases for stride kernels by @cliffburdick in #641
Basic synchronization support with sync() by @aayushg55 in #642
Converting old std:: types to cuda::std:: types by @cliffburdick in #629
Fix pybind iterator bug on newer g++ by @cliffburdick in #643
Initialize NVTX variable by @cliffburdick in #644
Fixed remaining nvc++ warnings by @cliffburdick in #645
Change cmake option/project order by @raplonu in #649
Change check on build type to avoid short circuiting by @cliffburdick in #647
Add complex cast operators for split inputs by @tbensonatl in #650
Added norm() operator by @cliffburdick in #620
Add zero-copy interface from MatX to NumPy by @cliffburdick in #653
Added host multithreading support for FFTW by @aayushg55 in #652
Fixed OpenMP compiler flags by @aayushg55 in #654
Fixed issue with operator types used as both lvalue/rvalue not assigning by @cliffburdick in #655
Smaller FFT test sizes for faster CI/CD by @aayushg55 in #656
Docs for matrix/vector norm by @cliffburdick in #657
Change matmul to use tensor_t temp until issue with impl is fixed by @cliffburdick in #658
Added plan caching for FFTW host plans by @aayushg55 in #659
Fixed fftw guards and temp allocation by @aayushg55 in #660
Fixed fftw guards to be fine-grained by @aayushg55 in #661
Enabled FFT conv for host by @aayushg55 in #662
NVPL BLAS Support by @aayushg55 in #665
Change supported CUDA to 11.8 by @cliffburdick in #670
enh: add macro to define cuda functions accessible at global scope by @mfzmullen in #668
Add workaround for pre-11.8 CTK smem init errors by @tbensonatl in #673
Fix to ConvCorr tests to skip host tests when host not enabled by @aayushg55 in #674
Expanded Host BLAS support by @aayushg55 in #675
Update README.md by @HugoPhibbs in #676
Improved the error messages when sizes are incompatible by @cliffburdick in #682
Added toeplitz operator by @cliffburdick in #683
Simplified cmake file so no definitions are required by default by @cliffburdick in #684
fix type for permuted ops in norm. by @luitjens in #696
Fix c++20 warning by @cliffburdick in #698
Update Cub Cache Creation to new Method by @tylera-nvidia in #694
Fixed base operator types by @cliffburdick in #703
Update slice.rst by @HugoPhibbs in #704
Fixed issues with host compiler with C++17 and C++20 modes by @cliffburdick in #706
NVPL LAPACK Solver Support on ARM by @aayushg55 in #701
Add detail:: namespace to CUB struct by @cliffburdick in #708
OpenBLAS LAPACK Solver Support for x86 by @aayushg55 in #709
Exclude examples/cmake_sample_project/build* from doxygen search by @tmartin-gh in #711
Fixed random pre/post run signature by @cliffburdick in #715
Rapids cmake 24 06 package by @cliffburdick in #716
Add support for UINT Generation by @tylera-nvidia in #695
Update svd docstring by @cliffburdick in #717
Solver SVD Optimizations and Improved cuSolver batching by @aayushg55 in #721
MATX_EN_CUTENSOR / MATX_ENABLE_CUTENSOR Unified Variable by @tylera-nvidia in #720
mtie should output the correct rank and size for the output operator. by @luitjens in #726
Update bug_report.md by @HugoPhibbs in #729
eliminate auto spills in permute by @luitjens in #731
Revert accidental commit to main by @cliffburdick in #734
Host Solver workspace query fix by @aayushg55 in #733
Add in-place transform support for inv() by @tbensonatl in #736
Allow access to Data() pointer from device by @tmartin-gh in #738
Use cublasmatinvBatched() for N <= 32 by @tbensonatl in #739
Added new pinv() operator and updated Reduced SVD by @aayushg55 in #740
optimize our iterator to avoid an unnecessary constructor call by @luitjens in #741
Updated Solver documentation by @aayushg55 in #742
Updated documentation for CPU support by @aayushg55 in #743
Slice optimizations to reduce spills by @cliffburdick in #732
Fixing shadow declaration by @cliffburdick in #745
Workaround for constexpr bug inside lambda in CUDA 11.8 by @cliffburdick in #671
Added diag operator taking 1D operator to generate 2D operator by @cliffburdick in #746
Add normcdf docs by @cliffburdick in #747
Refactor template arguments to reductions to force no permutes when unnecessary by @cliffburdick in #749
Adding workarounds for false positives on gcc14 by @cliffburdick in #751
Visibility fix for cache static deinit issue by @nvjonwong in #752
Don't allow in-place make_tensor to change ownership by @cliffburdick in #753
Fix for erroneous errors on gcc14.1 by @cliffburdick in #755
Create temp contiguous tensors if needed for sort by @tbensonatl in #757
Fix regression in slice by @cliffburdick in #758
Allow printing const pointers by @cliffburdick in #761
Switch CMake warnings by author warnings to allow user to disable them by @jjomier in #754
Major refactoring of the code to better handle tensor_t usage by @cliffburdick in #756
Fixing sort allocation by @cliffburdick in #764
Added print_shape for printing shape of operators by @cliffburdick in #763

New Contributors

@aayushg55 made their first contribution in #642
@raplonu made their first contribution in #649
@mfzmullen made their first contribution in #668
@jjomier made their first contribution in #754

Full Changelog: v0.8.0...v0.9.0
