CAMEL: Context-Aware Modifier for Efficient Language model

CAMEL

Introduction

CAMEL (Context-Aware Modifier for Efficient Language model) is a speculative decoding method inspired by EAGLE. It compresses the preceding input hidden states according to a window size and then makes speculations from the compressed states.

(Figure: CAMEL architecture)
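To illustrate the window-based compression idea, here is a minimal sketch in which hidden-state vectors are mean-pooled over fixed windows of size w. The pooling operator and shapes are assumptions for illustration, not CAMEL's exact design:

```python
def compress_hidden_states(hidden_states, window_size):
    """Mean-pool a sequence of hidden-state vectors over fixed windows.

    hidden_states: list of equal-length float vectors (one per token).
    window_size:   number of consecutive token states merged into one
                   summary vector (the "w" in the modifier names).
    """
    compressed = []
    for start in range(0, len(hidden_states), window_size):
        window = hidden_states[start:start + window_size]
        dim = len(window[0])
        # Average each coordinate across the window.
        pooled = [sum(vec[i] for vec in window) / len(window) for i in range(dim)]
        compressed.append(pooled)
    return compressed

# Four token states, window size 2 -> two summary vectors.
states = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
print(compress_hidden_states(states, 2))  # [[2.0, 3.0], [6.0, 7.0]]
```

With window size 1 (as in camel-llama2-h1024-w1), the sequence passes through uncompressed; larger windows trade context granularity for a shorter draft input.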

Installation

pip install modifier

Quick Start

CAMEL currently only supports meta-llama/Llama-2-7b-chat-hf.

import torch
from camel import CamelModel

prompt = "What is artificial intelligence?"
model = CamelModel.from_pretrained(
    base_model_path="meta-llama/Llama-2-7b-chat-hf",
    modifier_path="0xWe11es/camel-llama2-h1024-w1",
    torch_dtype=torch.float16,
    device_map="auto"
)
tokenizer = model.get_tokenizer()
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
output_ids = model.generate(input_ids)
output = tokenizer.decode(output_ids)
print(output)
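For context, speculative decoding methods like CAMEL follow a draft-then-verify loop: a cheap draft model proposes several tokens, and the target model accepts the longest matching prefix. The schematic below is a generic greedy-verification sketch with hypothetical draft_model/target_model callables, not CAMEL's actual internals or API:

```python
def speculative_decode(draft_model, target_model, prompt_ids, n_draft=4, max_tokens=32):
    """Generic draft-then-verify loop (schematic, not CAMEL's exact API).

    draft_model(ids, k) -> list of k proposed next-token ids.
    target_model(ids)   -> the single token the target model would emit next.
    """
    ids = list(prompt_ids)
    while len(ids) < len(prompt_ids) + max_tokens:
        proposals = draft_model(ids, n_draft)   # cheap draft tokens
        accepted = 0
        for tok in proposals:
            if target_model(ids) == tok:        # greedy verification
                ids.append(tok)
                accepted += 1
            else:
                break
        if accepted < len(proposals):
            ids.append(target_model(ids))       # target model's correction
    return ids[:len(prompt_ids) + max_tokens]
```

When the draft agrees with the target, several tokens are committed per target-model call, which is where the speedup in the table below comes from.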

CAMEL provides the following modifiers based on Llama 2 (h denotes the hidden size, w the window size):
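The h/w naming convention can be read mechanically from a modifier repo name; a small helper (hypothetical, not part of the camel package) illustrates it using the names that appear in this README:

```python
import re

def parse_modifier_name(name):
    """Extract hidden size (h) and window size (w) from a modifier repo name,
    e.g. "0xWe11es/camel-llama2-h1024-w1" -> h=1024, w=1."""
    m = re.search(r"h(\d+)-w(\d+)", name)
    if m is None:
        raise ValueError(f"not a recognized modifier name: {name!r}")
    return {"hidden_size": int(m.group(1)), "window_size": int(m.group(2))}

print(parse_modifier_name("0xWe11es/camel-llama2-h1024-w1"))
# {'hidden_size': 1024, 'window_size': 1}
```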

Performance

We test the modifier 0xWe11es/camel-llama2-h1024-w4 on several datasets and obtain the following results compared to the vanilla model (HF version).

| Dataset  | Model      | Temperature | Speed (tokens/s) | Speedup |
|----------|------------|-------------|------------------|---------|
| MT-Bench | Llama 2 7B | 0.0         | 71.85            | 1.92x   |
| MT-Bench | Llama 2 7B | 1.0         | 57.54            | 1.62x   |
| GSM8K    | Llama 2 7B | 0.0         | 73.51            | 2.20x   |
| GSM8K    | Llama 2 7B | 1.0         | 57.15            | 1.77x   |
| Alpaca   | Llama 2 7B | 0.0         | 68.92            | 1.88x   |
| Alpaca   | Llama 2 7B | 1.0         | 55.38            | 1.56x   |
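The implied vanilla-model throughput can be recovered from the table by dividing the measured speed by the speedup factor; for example, for MT-Bench at temperature 0.0:

```python
def baseline_speed(speed_tokens_per_s, speedup):
    """Implied vanilla-model throughput: measured speed divided by speedup."""
    return speed_tokens_per_s / speedup

# MT-Bench, T=0.0: 71.85 tokens/s at 1.92x speedup.
print(round(baseline_speed(71.85, 1.92), 2))  # 37.42
```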

Reference
