GitHub - 01-ai/Yi-1.5: Yi-1.5 is an upgraded version of Yi, delivering stronger performance in coding, math, reasoning, and instruction-following capability.

🤗 HuggingFace • 🤖 ModelScope • 🟣 wisemodel
👾 Discord • 🐤 Twitter • 💬 WeChat
📝 Paper • 💪 Tech Blog • 🙌 FAQ • 📗 Learning Hub

Intro
News
Quick Start
Web Demo
Deployment
Fine-tuning
API
License

Intro

Yi-1.5 is an upgraded version of Yi. It is continuously pre-trained on Yi with a high-quality corpus of 500B tokens and fine-tuned on 3M diverse fine-tuning samples.

Compared with Yi, Yi-1.5 delivers stronger performance in coding, math, reasoning, and instruction-following capability, while still maintaining excellent capabilities in language understanding, commonsense reasoning, and reading comprehension.

Yi-1.5 comes in 3 model sizes: 34B, 9B, and 6B. For model details and benchmarks, see Model Card.

News

2024-05-13: The Yi-1.5 series models are open-sourced, further improving coding, math, reasoning, and instruction-following abilities.

Requirements

Make sure Python 3.10 or a later version is installed.
Set up the environment and install the required packages.
```
pip install -r requirements.txt
```
Download the Yi-1.5 model from Hugging Face, ModelScope, or WiseModel.

Quick Start

This tutorial runs Yi-1.5-34B-Chat locally on an A800 (80G).

💡 Tip: If you want to get started with the Yi model and explore different methods for inference, check out the Yi Cookbook.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = '<your-model-path>'

tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)

# Since transformers 4.35.0, the GPT-Q/AWQ model can be loaded using AutoModelForCausalLM.
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="auto",
    torch_dtype='auto'
).eval()

# Prompt content: "hi"
messages = [
    {"role": "user", "content": "hi"}
]

input_ids = tokenizer.apply_chat_template(conversation=messages, tokenize=True, return_tensors='pt')
output_ids = model.generate(input_ids.to('cuda'), eos_token_id=tokenizer.eos_token_id)
response = tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True)

# Model response: "Hello! How can I assist you today?"
print(response)

Ollama

You can run Yi-1.5 models on Ollama locally.

After installing Ollama, you can start the Ollama service. Note that keep this service running while you use Ollama.
```
ollama serve
```
Run Yi-1.5 models. For more Yi models supported by Ollama, see Yi tags.
```
ollama run yi:v1.5
```

Chat with Yi-1.5 via OpenAI-compatible API. For more details on how to use Yi-1.5 via OpenAI API and REST API on Ollama, see Ollama docs.

from openai import OpenAI
client = OpenAI(
    base_url='http://localhost:11434/v1/',
    api_key='ollama',  # required but ignored
)
chat_completion = client.chat.completions.create(
    messages=[
        {
            'role': 'user',
            'content': 'What is your name',
        }
    ],
    model='yi:1.5',
)

Deployment

Prerequisites: Before deploying Yi-1.5 models, make sure you meet the software and hardware requirements.

vLLM

Prerequisites: Download the latest version of vLLM.

Start the server with a chat model.

python -m vllm.entrypoints.openai.api_server  --model 01-ai/Yi-1.5-9B-Chat  --served-model-name Yi-1.5-9B-Chat

Use the chat API.

HTTP

curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Yi-1.5-9B-Chat",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Who won the world series in 2020?"}
        ]
    }'

Python client

from openai import OpenAI
# Set OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

chat_response = client.chat.completions.create(
    model="Yi-1.5-9B-Chat",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me a joke."},
    ]
)
print("Chat response:", chat_response)

Web Demo

You can activate Yi-1.5-34B-Chat through the huggingface chat ui then experience it.

Or you can build it locally by yourself, as follows:

python demo/web_demo.py -c <your-model-path>

Fine-tuning

You can use LLaMA-Factory, Swift, XTuner, and Firefly for fine-tuning. These frameworks all support fine-tuning the Yi series models.

API

Yi APIs are OpenAI-compatible and provided at Yi Platform. Sign up to get free tokens, and you can also pay-as-you-go at a competitive price. Additionally, Yi APIs are also deployed on Replicate and OpenRouter.

License

The code and weights of the Yi-1.5 series models are distributed under the Apache 2.0 license.

If you create derivative works based on this model, please include the following attribution in your derivative works:

This work is a derivative of [The Yi-1.5 Series Model You Base On] by 01.AI, used under the Apache 2.0 License.

[ Back to top ⬆️ ]

Name		Name	Last commit message	Last commit date
Latest commit History 47 Commits
demo		demo
LICENSE		LICENSE
NOTICE		NOTICE
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Intro

News

Requirements

Quick Start

Ollama

Deployment

vLLM

Web Demo

Fine-tuning

API

License

About

Releases

Packages

Contributors 6

License

01-ai/Yi-1.5

Folders and files

Latest commit

History

Repository files navigation

Intro

News

Requirements

Quick Start

Ollama

Deployment

vLLM

Web Demo

Fine-tuning

API

License

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 6

Packages