v0.11.1 - cudnn, optimizations, and new ops/nn layers
What's Changed
- Fix bug in gather cuda kernel by @nkoppel in #588
- feat(device): introduce AutoDevice type by @kakoc in #579 (see the AutoDevice sketch after this list)
- Use Recursive Macros to Implement Shape Operation Traits by @nkoppel in #583
- Add ToDtype tensor operation by @nkoppel in #582 (see the to_dtype sketch after this list)
- Using 128 threads by default for cuda kernels by @coreylowman in #599
- Add Slice tensor operation by @nkoppel in #602 (see the slice sketch after this list)
- Optimizing conv kernels a bit by @coreylowman in #605
- feat: add upper/lower triangles (tril and triu) allocations by @Alexandcoats in #568
- Adds Tensor::roll by @coreylowman in #608
- Using multiple streams for matmul with cuda by @coreylowman in #610
- Fix no-std support by @Alexandcoats in #615
- Adds matrixmultiply/std to std feature by @kstavro in #618
- Implement concat for usize arrays; add concat to Device by @nkoppel in #621
- Allow conv2d and pool2d to use dynamic dimensions for width and height by @nkoppel in #620
- Switch to using nvcc --list-gpu-code for build.rs compute_cap by @quietlychris in #619
- Fix bug in reshape on cuda by @nkoppel in #622
- Don't always do try_min in pool_global.rs by @nkoppel in #623
- Revert "Switch to using nvcc --list-gpu-code for build.rs compute_cap… by @coreylowman in #624
- Adds restrided in favor of get_unstrided_index -> get_strided_index by @coreylowman in #628
- Combines multiple calls to get_strided_index into a single loop by @coreylowman in #629
- Reducing number of buffers sent to cuda for some operations by @coreylowman in #611
- Optimizing conv2d more by @coreylowman in #631
- Add ability to include smaller last batch by @nkoppel in #632
- Upscale2D and ConvTrans2d by @opfromthestart in #603
- impl Dtype for all Unit types except bool by @coreylowman in #635
- Allow convtrans2d to use dynamic dimensions by @nkoppel in #639
- JIT compiling kernel for to_dtype & reshape by @coreylowman in #634
- Optimize conv transpose kernels to do same thing as conv by @coreylowman in #641
- Reworking crate level documentation by @coreylowman in #644
- Adds synchronize to DeviceStorage by @coreylowman in #645
- adding usize dtype to cuda_kernel by @zojeda in #648
- Add PReLU and LeakyReLU by @opfromthestart in #586 (see the activation sketch after this list)
- Moving logsumexp normalization off of graph by @coreylowman in #652
- Adding CmpKernels to Device, more documentation by @coreylowman in #653
- Removing bounds checking from cpu conv kernel folding by @coreylowman in #650
- Allow upscale2d to use dynamic dimensions by @nkoppel in #654
- Adding integration test for resnet18 by @coreylowman in #655
- Removing some unnecessary blanket impls by @coreylowman in #656
- Fixes conv transpose stride bug, adds more docs to upscale2d by @coreylowman in #658
- Some QOL fixes by @opfromthestart in #659
- Optimizing softmax & log_softmax by @coreylowman in #660
- Reuse f(x) for unary operations when possible by @coreylowman in #661
- Allocating gradients in backward op by @coreylowman in #663
- Adds Tensor::recip (1 / x) by @coreylowman in #665 (see the recip sketch after this list)
- Reshape layer by @opfromthestart in #666
- Re-using tensor storage when possible by @coreylowman in #664
- Adds cudnn feature flag. Removes "test-cuda" feature flag. Using cuDNN for convolutions by @coreylowman in #651
- Always attempting allocation reuse during inference by @coreylowman in #673
- Clarify reshape behavior in docs by @coreylowman in #674
- Have SplitInto keep tapes of each head separate by @nkoppel in #676
- Using arch option in nvrtc by @coreylowman in #675
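Example usage
A few short sketches of the new APIs called out above. These are illustrative only: method names and signatures are inferred from the PR titles and crate conventions, and may differ slightly from the released API.
The AutoDevice alias from #579 resolves to the Cuda device when the cuda feature is enabled and to Cpu otherwise, so the same code runs on either backend:
```rust
use dfdx::prelude::*;

fn main() {
    // AutoDevice (#579) is Cuda when the `cuda` feature is on, Cpu otherwise.
    let dev = AutoDevice::default();
    let x: Tensor<Rank2<2, 3>, f32, _> = dev.zeros();
    assert_eq!(x.array(), [[0.0; 3]; 2]);
}
```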
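The ToDtype op from #582 casts a tensor's elements to another dtype. A minimal sketch, assuming the method form is to_dtype::<E>():
```rust
use dfdx::prelude::*;

fn main() {
    let dev = AutoDevice::default();
    let a = dev.tensor([1.0f32, 2.0, 3.0]);
    // Cast f32 -> f64 elementwise; method name assumed from the op name in #582.
    let b = a.to_dtype::<f64>();
    assert_eq!(b.array(), [1.0f64, 2.0, 3.0]);
}
```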
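The Slice op from #602 extracts a sub-tensor; since the bounds are runtime values, the result has usize dimensions. A sketch assuming the method takes a tuple of ranges:
```rust
use dfdx::prelude::*;

fn main() {
    let dev = AutoDevice::default();
    let a: Tensor<Rank2<3, 4>, f32, _> = dev.sample_normal();
    // Keep rows 0..2 and columns 1..3; the range-tuple syntax is an assumption.
    let b: Tensor<(usize, usize), f32, _> = a.slice((0..2, 1..3));
    assert_eq!(b.shape(), &(2, 2));
}
```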
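The PReLU and LeakyReLU layers from #586 slot into tuple models like the existing activations. A sketch assuming LeakyReLU is default-constructible with a built-in negative slope; the exact constructor may differ:
```rust
use dfdx::prelude::*;

// Hypothetical MLP using the new LeakyReLU activation from #586;
// its default negative slope is assumed here.
type Mlp = (Linear<4, 8>, LeakyReLU, Linear<8, 2>);

fn main() {
    let dev = AutoDevice::default();
    let mlp = dev.build_module::<Mlp, f32>();
    let x: Tensor<Rank1<4>, f32, _> = dev.sample_normal();
    let y = mlp.forward(x);
    dbg!(y.array());
}
```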
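Tensor::recip from #665 is an elementwise 1 / x:
```rust
use dfdx::prelude::*;

fn main() {
    let dev = AutoDevice::default();
    let t = dev.tensor([1.0f32, 2.0, 4.0]);
    // Elementwise reciprocal, per #665.
    let r = t.recip();
    assert_eq!(r.array(), [1.0, 0.5, 0.25]);
}
```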
New Contributors
- @kakoc made their first contribution in #579
- @quietlychris made their first contribution in #619
- @opfromthestart made their first contribution in #603
- @zojeda made their first contribution in #648
Full Changelog: v0.11.0...v0.11.1