v0.1.8

xinfeishi released this 25 Mar 13:32

796698a

feat

support GPTQ-quantized Qwen2 models
update multi_task_prompt creation
speculative decoding supports tensor parallelism (TP)
support RoBERTa

refactor

refactor multimodal model processing

fix

fix KV cache int8 bug: add dequantization when cache blocks are reused
fix stop-word handling in streaming output
fix LoRA