v0.11.1 - cudnn, optimizations, and new ops/nn layers
What's Changed
- Fix bug in gather cuda kernel by @nkoppel in #588
- feat(device): introduce AutoDevice type by @kakoc in #579 (see the AutoDevice sketch after this list)
- Use Recursive Macros to Implement Shape Operation Traits by @nkoppel in #583
- Add ToDtype tensor operation by @nkoppel in #582 (see the to_dtype sketch after this list)
- Using 128 threads by default for cuda kernels by @coreylowman in #599
- Add Slice tensor operation by @nkoppel in #602 (see the slice sketch after this list)
- Optimizing conv kernels a bit by @coreylowman in #605
- feat: add upper/lower triangles (tril and triu) allocations by @Alexandcoats in #568
- Adds Tensor::roll by @coreylowman in #608
- Using multiple streams for matmul with cuda by @coreylowman in #610
- Fix no-std support by @Alexandcoats in #615
- Adds matrixmultiply/std to std feature by @kstavro in #618
- Implement concat for usize arrays; add concat to Device by @nkoppel in #621
- Allow conv2d and pool2d to use dynamic dimensions for width and height by @nkoppel in #620
- Switch to using nvcc --list-gpu-code for build.rs compute_cap by @quietlychris in #619
- Fix bug in reshape on cuda by @nkoppel in #622
- Don't always do try_min in pool_global.rs by @nkoppel in #623
- Revert "Switch to using nvcc --list-gpu-code for build.rs compute_cap… by @coreylowman in #624
- Adds restrided in favor of get_unstrided_index -> get_strided_index by @coreylowman in #628
- Combines multiple calls to get_strided_index into a single loop by @coreylowman in #629
- Reducing number of buffers sent to cuda for some operations by @coreylowman in #611
- Optimizing conv2d more by @coreylowman in #631
- Add ability to include smaller last batch by @nkoppel in #632
- Upscale2D and ConvTrans2d by @opfromthestart in #603
- impl Dtype for all Unit types except bool by @coreylowman in #635
- Allow convtrans2d to use dynamic dimensions by @nkoppel in #639
- JIT compiling kernel for to_dtype & reshape by @coreylowman in #634
- Optimize conv transpose kernels to do same thing as conv by @coreylowman in #641
- Reworking crate level documentation by @coreylowman in #644
- Adds synchronize to DeviceStorage by @coreylowman in #645
- adding usize dtype to cuda_kernel by @zojeda in #648
- Add PReLU and LeakyReLU by @opfromthestart in #586 (see the activation sketch after this list)
- Moving logsumexp normalization off of graph by @coreylowman in #652
- Adding CmpKernels to Device, more documentation by @coreylowman in #653
- Removing bounds checking from cpu conv kernel folding by @coreylowman in #650
- Allow upscale2d to use dynamic dimensions by @nkoppel in #654
- Adding integration test for resnet18 by @coreylowman in #655
- Removing some unnecessary blanket impls by @coreylowman in #656
- Fixes conv transpose stride bug, adds more docs to upscale2d by @coreylowman in #658
- Some QOL fixes by @opfromthestart in #659
- Optimizing softmax & log_softmax by @coreylowman in #660
- Reuse f(x) for unary operations when possible by @coreylowman in #661
- Allocating gradients in backward op by @coreylowman in #663
- Adds Tensor::recip (1 / x) by @coreylowman in #665 (see the recip sketch after this list)
- Reshape layer by @opfromthestart in #666
- Re-using tensor storage when possible by @coreylowman in #664
- Adds cudnn feature flag. Removes "test-cuda" feature flag. Using cuDNN for convolutions by @coreylowman in #651
- Always attempting allocation reuse during inference by @coreylowman in #673
- Clarify reshape behavior in docs by @coreylowman in #674
- Have SplitInto keep tapes of each head separate by @nkoppel in #676
- Using arch option in nvrtc by @coreylowman in #675
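Example usage
A few short sketches of the new APIs called out above. These are illustrative only: method names and signatures are inferred from the PR titles and crate conventions, and may differ slightly from the released API.
The AutoDevice alias from #579 resolves to the Cuda device when the cuda feature is enabled and to Cpu otherwise, so the same code runs on either backend:
```rust
use dfdx::prelude::*;

fn main() {
    // AutoDevice (#579) is Cuda when the `cuda` feature is on, Cpu otherwise.
    let dev = AutoDevice::default();
    let x: Tensor<Rank2<2, 3>, f32, _> = dev.zeros();
    assert_eq!(x.array(), [[0.0; 3]; 2]);
}
```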
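The ToDtype op from #582 casts a tensor's elements to another dtype. A minimal sketch, assuming the method form is to_dtype::<E>():
```rust
use dfdx::prelude::*;

fn main() {
    let dev = AutoDevice::default();
    let a = dev.tensor([1.0f32, 2.0, 3.0]);
    // Cast f32 -> f64 elementwise; method name assumed from the op name in #582.
    let b = a.to_dtype::<f64>();
    assert_eq!(b.array(), [1.0f64, 2.0, 3.0]);
}
```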
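The Slice op from #602 extracts a sub-tensor; since the bounds are runtime values, the result has usize dimensions. A sketch assuming the method takes a tuple of ranges:
```rust
use dfdx::prelude::*;

fn main() {
    let dev = AutoDevice::default();
    let a: Tensor<Rank2<3, 4>, f32, _> = dev.sample_normal();
    // Keep rows 0..2 and columns 1..3; the range-tuple syntax is an assumption.
    let b: Tensor<(usize, usize), f32, _> = a.slice((0..2, 1..3));
    assert_eq!(b.shape(), &(2, 2));
}
```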
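The PReLU and LeakyReLU layers from #586 slot into tuple models like the existing activations. A sketch assuming LeakyReLU is default-constructible with a built-in negative slope; the exact constructor may differ:
```rust
use dfdx::prelude::*;

// Hypothetical MLP using the new LeakyReLU activation from #586;
// its default negative slope is assumed here.
type Mlp = (Linear<4, 8>, LeakyReLU, Linear<8, 2>);

fn main() {
    let dev = AutoDevice::default();
    let mlp = dev.build_module::<Mlp, f32>();
    let x: Tensor<Rank1<4>, f32, _> = dev.sample_normal();
    let y = mlp.forward(x);
    dbg!(y.array());
}
```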
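Tensor::recip from #665 is an elementwise 1 / x:
```rust
use dfdx::prelude::*;

fn main() {
    let dev = AutoDevice::default();
    let t = dev.tensor([1.0f32, 2.0, 4.0]);
    // Elementwise reciprocal, per #665.
    let r = t.recip();
    assert_eq!(r.array(), [1.0, 0.5, 0.25]);
}
```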
New Contributors
- @kakoc made their first contribution in #579
- @quietlychris made their first contribution in #619
- @opfromthestart made their first contribution in #603
- @zojeda made their first contribution in #648
Full Changelog: v0.11.0...v0.11.1