📖A curated list of Awesome LLM Inference Papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.
Updated Nov 23, 2024
📒A small curated list of Awesome Diffusion Inference Papers with code.