Repositories list
12 repositories
ScaleLLM (Public): A high-performance inference system for large language models, designed for production environments.
whl (Public)
flashinfer (Public)
vcpkg (Public)
LLMBench (Public)
discussions (Public)
chatbot-ui (Public)
flash-attention (Public)
tokenizers (Public)
xformers (Public)
FasterTransformer (Public)
ByteTransformer (Public): Optimized BERT transformer inference on NVIDIA GPU. https://arxiv.org/abs/2210.03052