A high-throughput and memory-efficient inference and serving engine for LLMs
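As a hedged illustration of what "inference and serving" means here, a minimal offline-generation sketch with vLLM (the import path and generate API are the library's documented ones; the model id and prompts are illustrative):

```python
# A minimal offline-inference sketch with vLLM; the model id is
# illustrative and any Hugging Face-compatible model would work.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")          # load the model onto the GPU
params = SamplingParams(temperature=0.8, max_tokens=64)

# Batch generation: vLLM schedules the prompts together for throughput.
outputs = llm.generate(["CUDA kernels are", "GPUs accelerate"], params)
for out in outputs:
    print(out.outputs[0].text)
```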
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general-purpose computing on graphics processing units (GPUs). With CUDA, developers can dramatically accelerate computing applications by harnessing the power of GPUs. Created by Nvidia; first released June 23, 2007.
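A minimal sketch of the programming model, written with Numba's CUDA bindings (an assumption, since this listing is Python-heavy; the native CUDA API is C/C++). Each GPU thread computes one element of the output vector:

```python
# Vector addition on the GPU via Numba's CUDA support.
import numpy as np
from numba import cuda

@cuda.jit
def vector_add(a, b, out):
    i = cuda.grid(1)              # absolute index of this thread
    if i < out.size:              # guard against out-of-range threads
        out[i] = a[i] + b[i]

n = 1_000_000
a = np.random.rand(n).astype(np.float32)
b = np.random.rand(n).astype(np.float32)
out = np.zeros_like(a)

threads_per_block = 256
blocks = (n + threads_per_block - 1) // threads_per_block
# Numba copies the NumPy arrays to the device, launches the kernel
# grid, and copies the result back.
vector_add[blocks, threads_per_block](a, b, out)
```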
A retargetable MLIR-based machine learning compiler and runtime toolkit.
Open Voice OS Server Status Page
Pythonic particle-based (super-droplet) warm-rain/aqueous-chemistry cloud microphysics package with box, parcel & 1D/2D prescribed-flow examples in Python, Julia and Matlab
PyTorch native quantization and sparsity for training and inference
PyTorch domain library for recommendation systems
A fast, scalable, high-performance Gradient Boosting on Decision Trees library for ranking, classification, regression and other machine learning tasks in Python, R, Java and C++. Supports computation on CPU and GPU (a short usage sketch follows this list).
Run serverless GPU workloads with fast cold starts on bare-metal servers, anywhere in the world
Monte Carlo eXtreme (MCX) - GPU-accelerated photon transport simulator
cuGraph - RAPIDS Graph Analytics Library
cuML - RAPIDS Machine Learning Library
Bright Wire is an open-source machine learning library for .NET with GPU support (via CUDA)
A simple yet sufficiently fast (attenuated) Radon transform and backprojection implementation using KernelAbstractions.jl. Runs on CPU, CUDA, ...
CUDA Core Compute Libraries
The FFmpeg build script provides an easy way to build a static FFmpeg on macOS and Linux with non-free codecs included.
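The CatBoost sketch promised above, trained on synthetic data (CatBoostClassifier and task_type="GPU" are the library's documented API; the toy dataset is illustrative, and GPU training assumes a CUDA device is present):

```python
# Train a small CatBoost classifier on the GPU; drop task_type="GPU"
# to fall back to CPU training.
import numpy as np
from catboost import CatBoostClassifier

X = np.random.rand(1000, 10)
y = (X[:, 0] > 0.5).astype(int)   # a toy binary target

model = CatBoostClassifier(iterations=200, task_type="GPU", verbose=False)
model.fit(X, y)
print(model.predict(X[:5]))
```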