v0.1.8
feat
- support Qwen2 GPTQ
- update multi_task_prompt creation
- speculative decoding supports tensor parallelism (tp)
- support RoBERTa
refactor
- refactor multimodal model processing
fix
- fix int8 KV cache bug: add dequantization for the block-reuse scenario
- fix stop word handling in streaming output
- fix LoRA