v0.1.9
feat
- support AWQ
- mv attention mask when using FMHA
- support sparse & RoBERTa embedding, support calculating similarity
refactor
- use asyncio.Future to avoid resource exclusivity
- move asyncio lock to asyncmodel
fix
- temporary fix for filelock version
- moe model size
- add headers for image downloading
- update wheel version
- cutlass interface
docs
- update pipeline usage