# CAMEL

## Introduction

CAMEL (Context-Aware Modifier for Efficient Language model) is a speculative decoding method inspired by EAGLE. It compresses the preceding input hidden states according to a window size and then uses the compressed context to draft speculative tokens.

*Figure: CAMEL architecture.*
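The README does not spell out the modifier's internals, so the following is only a minimal sketch of the window-compression idea, under the assumption that each non-overlapping window of `w` consecutive hidden states is projected down to a single state before drafting. `WindowCompressor` is a hypothetical name, not CAMEL's actual module.

```python
# Illustrative sketch only -- not the real CamelModel implementation.
# Assumes each window of `window_size` consecutive hidden states is
# compressed into one state by a learned linear projection.
import torch
import torch.nn as nn

class WindowCompressor(nn.Module):
    def __init__(self, hidden_size: int, window_size: int):
        super().__init__()
        self.window_size = window_size
        # Maps a concatenated window of hidden states to a single state.
        self.proj = nn.Linear(hidden_size * window_size, hidden_size)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden_size)
        b, t, h = hidden_states.shape
        w = self.window_size
        # Left-pad so the sequence length is a multiple of the window size.
        pad = (-t) % w
        if pad:
            hidden_states = torch.cat(
                [hidden_states.new_zeros(b, pad, h), hidden_states], dim=1
            )
        # Group consecutive states into windows, compress each to one state.
        windows = hidden_states.reshape(b, -1, w * h)  # (batch, t/w, w*h)
        return self.proj(windows)                      # (batch, t/w, h)

compressor = WindowCompressor(hidden_size=1024, window_size=4)
states = torch.randn(1, 16, 1024)
print(compressor(states).shape)  # torch.Size([1, 4, 1024])
```

With a window size of 1 (as in the `h1024-w1` modifier used below) this reduces to a plain per-token projection; larger windows trade context granularity for a shorter drafting input.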

## Installation

```bash
pip install modifier
```

## Quick Start

CAMEL currently only supports meta-llama/Llama-2-7b-chat-hf.

```python
import torch
from camel import CamelModel

prompt = "What is artificial intelligence?"
model = CamelModel.from_pretrained(
    base_model_path="meta-llama/Llama-2-7b-chat-hf",  # frozen base model
    modifier_path="0xWe11es/camel-llama2-h1024-w1",   # CAMEL modifier weights
    torch_dtype=torch.float16,
    device_map="auto"
)
tokenizer = model.get_tokenizer()
# Tokenize as a tensor batch of one; without return_tensors="pt" the
# tokenizer returns a plain Python list.
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
output_ids = model.generate(input_ids)
# Decode the first (and only) sequence in the batch.
output = tokenizer.decode(output_ids[0])
print(output)
```
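The README does not document what `model.generate` does internally, but speculative decoding in general alternates a cheap draft step with a base-model verification step. Below is a conceptual sketch of the greedy accept/reject loop; `draft` and `verify` are hypothetical helpers, not CAMEL's API.

```python
# Conceptual sketch of greedy speculative decoding -- not CamelModel's
# actual implementation. `draft` proposes k cheap tokens with the modifier;
# `verify` runs one base-model forward pass and returns the base model's
# greedy choice at each proposed position.
def speculative_generate(draft, verify, input_ids, max_new_tokens=64, k=4):
    tokens = list(input_ids)
    while len(tokens) < len(input_ids) + max_new_tokens:
        proposal = draft(tokens, k)              # k drafted tokens
        base_choices = verify(tokens, proposal)  # base model's tokens there
        for drafted, checked in zip(proposal, base_choices):
            tokens.append(checked)      # the base model's token is always kept
            if drafted != checked:      # first mismatch: discard the rest
                break
    return tokens
```

Because every accepted token is one the base model would have produced anyway, the output matches vanilla greedy decoding; the speedup comes from verifying several drafted tokens in a single base-model pass.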

CAMEL has the following modifiers based on Llama 2 (h stands for hidden size, w stands for window size); the two referenced in this README are:

| Modifier | Hidden size | Window size |
| --- | --- | --- |
| 0xWe11es/camel-llama2-h1024-w1 | 1024 | 1 |
| 0xWe11es/camel-llama2-h1024-w4 | 1024 | 4 |

## Performance

We test the modifier 0xWe11es/camel-llama2-h1024-w4 on several datasets and obtain the following results relative to the vanilla model (Hugging Face version).

| Dataset | Model | Temperature | Speed (tokens/s) | Speedup |
| --- | --- | --- | --- | --- |
| MT-Bench | Llama 2 7B | 0.0 | 71.85 | 1.92x |
| MT-Bench | Llama 2 7B | 1.0 | 57.54 | 1.62x |
| GSM8K | Llama 2 7B | 0.0 | 73.51 | 2.20x |
| GSM8K | Llama 2 7B | 1.0 | 57.15 | 1.77x |
| Alpaca | Llama 2 7B | 0.0 | 68.92 | 1.88x |
| Alpaca | Llama 2 7B | 1.0 | 55.38 | 1.56x |
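As a sanity check on the reported numbers, the speedup column is simply CAMEL throughput divided by vanilla throughput: for example, 71.85 tokens/s at a 1.92x speedup on MT-Bench implies a vanilla baseline of roughly 71.85 / 1.92 ≈ 37 tokens/s.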

## Reference