v0.12.0 - Adds f16 dtype
Breaking changes
- [Breaking] Adding Tensor::try_realize, and Tensor::realize no longer returns Result by @coreylowman in #758 (migration sketch after this list)
- [Breaking] ReshapeTo::reshape_like and ReshapeTo::try_reshape_like now panic instead of returning option by @coreylowman in #766
- [Breaking] Adding dilation/groups to Conv2D. Adding dilation to Pool2D by @coreylowman in #767
- [Breaking] Use `gemm` for matmul. Removes support for matrixmultiply & MKL by @coreylowman in #776
- [Breaking] Moving storage GAT to trait level generic. Split DeviceStorage into multiple traits by @coreylowman in #782
- [Breaking] Adding dilation/groups to ConvTranspose2D by @coreylowman in #783
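
For reference when migrating, here is a minimal sketch of how the #758 and #766 changes look at a call site. It assumes the Cpu device and the usual `RealizeTo`/`ReshapeTo` usage; the shapes and generics below are illustrative, so double-check against the v0.12.0 docs before copying.

```rust
use dfdx::prelude::*;

fn main() {
    let dev: Cpu = Default::default();

    // Start from a tensor with compile-time (const) dimensions.
    let x: Tensor<Rank2<2, 3>, f32, _> = dev.zeros();

    // Erasing const dims into runtime (usize) dims always succeeds.
    let x_dyn: Tensor<(usize, usize), f32, _> = x.realize();

    // #758: `realize` now returns the tensor directly instead of a Result;
    // `try_realize` is the fallible variant for when the runtime shape
    // might not match the requested one.
    let x_static: Tensor<Rank2<2, 3>, f32, _> = x_dyn.realize();

    // #766: `reshape_like` now panics when the element counts differ,
    // instead of returning an Option.
    let flat: Tensor<(usize,), f32, _> = x_static.reshape_like(&(6usize,));
    assert_eq!(flat.shape().0, 6);
}
```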
What's Changed
- Adding f16 as Dtype by @coreylowman in #696
- Adding example by @sirandreww in #740
- Adds TryConcatAlong to support Concat along any axis by @coreylowman in #750 (usage sketch after this list)
- Changed CUDA_ARCH in compatibility.cuh by @jafioti in #752
- Allow `broadcast_like` to accept tensors OR shapes by @VasanthakumarV in #751
- Removing rerun build.rs for output destination by @coreylowman in #754
- Fixing compatibility for compute cap 70-75 by @coreylowman in #757
- Adds TriangleTensor and CmpKernel traits to Device bound by @coreylowman in #760
- Using Bernoulli distribution in dropout - makes dropout reproducible across dtypes by @coreylowman in #761
- Fixes bug with f16 mean where number of elements reduced was f16::INF by @coreylowman in #763
- Placeholder f16 gemm speedups by @coreylowman in #765
- MultiHeadAttention 3d impl now broadcasts to 4d instead of duplicating logic by @coreylowman in #768
- Moving `cudarc?/f16` behind `f16` feature by @coreylowman in #774
- impl Clone for Adam, SGD, RMSprop by @coreylowman in #775
- Properly setting read_dst for gemm in forward/backward pass by @coreylowman in #777
- Adds rayon dependency. Using `gemm::Parallelism::Rayon(rayon::current_num_threads())` by @coreylowman in #778
- Add LogSoftmax by @kurnevsky in #769
- Moving some tests off nightly. Adding docs to conv2d op by @coreylowman in #779
- Adding better error messages if nvidia-smi/nvcc are not found by @coreylowman in #784
- Using for loop with gridDim.x * blockDim.x as increment by @coreylowman in #787
- Removing __hmax and __hmin compat functions by @coreylowman in #788
- Uses grid striding in fill_with by @coreylowman in #790
- Exposed NumpyDType publicly by @jafioti in #791
- Fixing weight shape for grouped Conv2D by @coreylowman in #797
- Bump half/cudarc versions by @coreylowman in #805
- Using Groups in conv weight init by @coreylowman in #806
- Add scalar support to TensorCollection by @nkoppel in #799
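
The TryConcatAlong (#750) and `broadcast_like` (#751) entries above add shape utilities; here is a rough usage sketch. The method names and shape syntax follow the patterns used elsewhere in dfdx, but treat the exact signatures as assumptions rather than verified v0.12.0 API.

```rust
use dfdx::prelude::*;

fn main() {
    let dev: Cpu = Default::default();

    // #751: broadcast_like accepts either another tensor or a plain shape.
    let row: Tensor<Rank1<3>, f32, _> = dev.zeros();
    let target: Tensor<Rank2<2, 3>, f32, _> = dev.zeros();
    let from_tensor: Tensor<Rank2<2, 3>, f32, _> = row.clone().broadcast_like(&target);
    let from_shape: Tensor<Rank2<2, 3>, f32, _> = row.broadcast_like(&(Const::<2>, Const::<3>));
    assert_eq!(from_tensor.shape().concrete(), [2, 3]);
    assert_eq!(from_shape.shape().concrete(), [2, 3]);

    // #750: concat_along joins tensors along any axis; here the concatenated
    // axis uses runtime (usize) dims, so the output dim is their sum.
    let a: Tensor<(Const<2>, usize), f32, _> = dev.zeros_like(&(Const::<2>, 3usize));
    let b: Tensor<(Const<2>, usize), f32, _> = dev.zeros_like(&(Const::<2>, 4usize));
    let ab = (a, b).concat_along(Axis::<1>);
    assert_eq!(ab.shape().1, 7);
}
```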
New Contributors
- @sirandreww made their first contribution in #740
- @kurnevsky made their first contribution in #769
Full Changelog: v0.11.2...v0.12.0