Several optimization methods for half-precision general matrix multiplication (HGEMM) on Tensor Cores, using the WMMA API and MMA PTX instructions.
Updated Sep 8, 2024 - CUDA
Performance of the C++ interfaces of FlashAttention and FlashAttention v2 in large language model (LLM) inference scenarios.
Multiple GEMM operators built with CUTLASS to support LLM inference.
Use Tensor Cores to compute back-to-back HGEMM (half-precision general matrix multiplication) with MMA PTX instructions.
Code for DTC-SpMM (ASPLOS'24)
Lab assignments from CS4302 Parallel and Distributed Programming (Fall 2022), with my solutions.