diff --git a/README.md b/README.md index 93bff936..ae87f00e 100644 --- a/README.md +++ b/README.md @@ -54,9 +54,9 @@ ### 已支持模型 - [TransNormerLLM](https://github.com/OpenNLPLab/TransnormerLLM.git) - - [ ] TransNormerLLM-7B-Chat FastApi 部署调用 @[ml67](https://github.com/mlw67) ddl=3月底 - - [ ] TransNormerLLM-7B-Chat langchain 接入 @[ml67](https://github.com/mlw67) ddl=3月底 - - [ ] TransNormerLLM-7B-Chat WebDemo 部署 @[ml67](https://github.com/mlw67) ddl=3月底 + - [X] TransNormerLLM-7B-Chat FastApi 部署调用 @[ml67](https://github.com/mlw67) ddl=3月底 + - [X] TransNormerLLM-7B-Chat langchain 接入 @[ml67](https://github.com/mlw67) ddl=3月底 + - [X] TransNormerLLM-7B-Chat WebDemo 部署 @[ml67](https://github.com/mlw67) ddl=3月底 - [ ] TransNormerLLM-7B-Chat Lora 微调 @[ml67](https://github.com/mlw67) ddl=3月底ß - [谷歌-Gemma](https://huggingface.co/google/gemma-7b-it) diff --git "a/TransNormer/01-TransNormer-7B FastApi \351\203\250\347\275\262\350\260\203\347\224\250.md" "b/TransNormer/01-TransNormer-7B FastApi \351\203\250\347\275\262\350\260\203\347\224\250.md" new file mode 100644 index 00000000..91bf4f34 --- /dev/null +++ "b/TransNormer/01-TransNormer-7B FastApi \351\203\250\347\275\262\350\260\203\347\224\250.md" @@ -0,0 +1,245 @@ +# TransNormerLLM-7B FastApi 部署调用 + +## 1. TransNormer 介绍 + +TransNormerLLM 是一个基于线性注意力的 LLM,在准确性和效率方面均优于传统的基于 softmax 注意力的模型。它是在包含多达1.4 万亿个令牌的高质量语料库上进行训练的,TransNormerLLM 从之前的线性注意力架构 TransNormer 演变而来,进行了高级修改,包括 LRPE 位置嵌入、闪电注意力加速、新的门控和标准化机制(将在下文进行简要的介绍)。TransNormerLLM 在多项广受认可的中文、英文以及多语言通用和特定领域基准测试中取得了与其规模相当的竞争性表现。TransNormer包括具有385M、1B和7B参数的基本版本。所有版本都完全开放给学术研究。 + +**TransNormerLLM的架构改进** + +下文将简单介绍 TransNormerLLM 的各个模块以及研究者提出的一些改进措施。 + +**改进一:位置编码** + +TransNormer 中的较低层使用了 DiagAttention 来避免 dilution 问题。但是,这会导致 token 之间缺乏全局互动能力。为了解决这个问题,研究者为 TransNormerLLM 使用了带指数衰减的 LRPE(线性化相对位置编码),从而可在较低层保留完整的注意力。研究者把这种方法称为 LRPE-d。 + +**改进二:门控机制** + +门控可以增强模型的性能并使训练过程平滑。研究者为 TransNormerLLM 使用了来自论文《Transformer quality in linear time》的 Flash 方法并在 token 混合中使用了门控式线性注意力(GLA)的结构。为了进一步提升模型速度,其还提出了 Simple GLU(SGLU),其去除了原始 GLU 结构的激活函数,因为门本身就能引入非线性。 + +**改进三:张量归一化** + +研究者使用了 TransNormer 中引入的 NormAttention。在 TransNormerLLM 中,他们使用一种新的简单归一化函数 SimpleRMSNorm(简写为 SRMSNorm)替换了 RMSNorm。 + +其整体结构如下: +![模型的整体架构](images/TransNormer-structure.png) +图 1:新提出模型的整体架构 + +好啦,在了解TransNormer的基础上,我们开始调用吧,这里我们开始部署TransNormerLLM-7B的模型吧。 + +## 2. 
环境准备 + +### 2.1 进入配置环境 + +在 Autodl 平台中租赁一个 3090/4090 等 24G 显存的显卡机器,如下图所示镜像选择 PyTorch-->2.0.0-->3.8(ubuntu20.04)-->11.8(11.3 版本以上的都可以)。 +接下来打开刚刚租用服务器的 JupyterLab,并且打开其中的终端开始环境配置、模型下载和运行演示。 + +![开启机器配置选择](images/Machine-Config.png) + +打开启动页的终端(Terminal)界面: + +![Python终端](images/python-terminal.png) + +左击红色框的部分,进入Python的终端控制台,如下图所示: + +![Python终端](images/python-terminal2.png) + +### 2.2 pip 换源加速下载并安装依赖包 + +接下来安装运行TransNormerLLM-7B所需要的相关依赖库,这里我们有两种安装方式,不过在安装依赖库之前我们首先更新pip版本(防止版本过低),并切换pip的安装源(到国内源,这样可以安装更快,防止下载链接超时) + +在红框部分逐行输入如下「2.2」中命令: +```shell +# 升级pip +python -m pip install --upgrade pip +# 更换 pypi 源加速库的安装 +pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple +``` + +**方式一:** +依然在红框部分逐行输入如下「2.2」中命令: + +```shell +pip install fastapi==0.104.1 +pip install uvicorn==0.24.0.post1 +pip install requests==2.25.1 +pip install modelscope==1.11.0 +pip install transformers==4.37.0 +pip install streamlit==1.24.0 +pip install sentencepiece==0.1.99 +pip install accelerate==0.24.1 +pip install transformers_stream_generator==0.0.4 +pip install triton==2.0.0 +pip install einops +``` + +**方式二:** +将如下内容: +```shell +fastapi==0.104.1 +uvicorn==0.24.0.post1 +requests==2.25.1 +modelscope==1.11.0 +transformers==4.37.0 +streamlit==1.24.0 +sentencepiece==0.1.99 +accelerate==0.24.1 +transformers_stream_generator==0.0.4 +triton==2.0.0 +einops +``` +用 vim 写入一个 requirements.txt 文件,然后运行命令:pip install -r requirements.txt + + +## 3. 模型下载 + +使用 modelscope 中的 snapshot_download 函数下载模型,第一个参数为模型名称,参数 cache_dir 为模型的下载路径。 + +模型的介绍地址(魔塔社区): +https://www.modelscope.cn/models/OpenNLPLab/TransNormerLLM-7B/summary + +在 /root/autodl-tmp 路径下新建 model_download.py 文件并在其中输入以下内容,粘贴代码后请及时保存文件,如下图所示。并运行 `python /root/autodl-tmp/model_download.py` 执行下载,模型大小为 12GB,下载模型大概需要 6 分钟。 + +其在终端界面的流程如下: +```cmd +cd /root/autodl-tmp +vim model_download.py +``` +然后保存退出(:wq) + +model_download.py 文件中的内容: +```python +import torch +from modelscope import snapshot_download, AutoModel, AutoTokenizer +import os +model_dir = snapshot_download('OpenNLPLab/TransNormerLLM-7B', cache_dir='/root/autodl-tmp', revision='master') +``` + +> 大概解释一下,这里是在做什么:首先,我们加载了基础的torch环境,和 加载了modelscope函数库的 snapshot_download, AutoModel, AutoTokenizer 三个函数类,用 snapshot_download 函数下载模型。(虽然模型下载的方式有多种,但snapshot_download具有一定的优势) + + +## 代码准备 + +在 /root/autodl-tmp 路径下新建 api.py 文件并在其中输入以下内容,粘贴代码后请及时保存文件。下面的代码有很详细的注释,大家如有不理解的地方,欢迎提出 issue。 + +```python +from fastapi import FastAPI, Request +from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig +import uvicorn +import json +import datetime +import torch + +# 设置设备参数 +DEVICE = "cuda" # 使用CUDA +DEVICE_ID = "0" # CUDA设备ID,如果未设置则为空 +CUDA_DEVICE = f"{DEVICE}:{DEVICE_ID}" if DEVICE_ID else DEVICE # 组合CUDA设备信息 + +# 清理GPU内存函数 +def torch_gc(): + if torch.cuda.is_available(): # 检查是否可用CUDA + with torch.cuda.device(CUDA_DEVICE): # 指定CUDA设备 + torch.cuda.empty_cache() # 清空CUDA缓存 + torch.cuda.ipc_collect() # 收集CUDA内存碎片 + +# 创建FastAPI应用 +app = FastAPI() + +# 处理POST请求的端点 +@app.post("/") +async def create_item(request: Request): + global model, tokenizer # 声明全局变量以便在函数内部使用模型和分词器 + json_post_raw = await request.json() # 获取POST请求的JSON数据 + json_post = json.dumps(json_post_raw) # 将JSON数据转换为字符串 + json_post_list = json.loads(json_post) # 将字符串转换为Python对象 + prompt = json_post_list.get('prompt') # 获取请求中的提示 + + messages = [ + {"role": "system", "content": "You are a helpful assistant."}, + {"role": "user", "content": prompt} + ] + + # 调用模型进行对话生成 + input_ids = 
tokenizer.apply_chat_template(messages,tokenize=False,add_generation_prompt=True) + model_inputs = tokenizer([input_ids], return_tensors="pt").to('cuda') + generated_ids = model.generate(model_inputs.input_ids,max_new_tokens=512) + generated_ids = [ + output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids) + ] + response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0] + now = datetime.datetime.now() # 获取当前时间 + time = now.strftime("%Y-%m-%d %H:%M:%S") # 格式化时间为字符串 + # 构建响应JSON + answer = { + "response": response, + "status": 200, + "time": time + } + # 构建日志信息 + log = "[" + time + "] " + '", prompt:"' + prompt + '", response:"' + repr(response) + '"' + print(log) # 打印日志 + torch_gc() # 执行GPU内存清理 + return answer # 返回响应 + +# 主函数入口 +if __name__ == '__main__': + # 加载预训练的分词器和模型 + model_name_or_path = '/root/autodl-tmp/OpenNLPLab/TransNormerLLM-7B' + tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, trust_remote_code=True, use_fast=False) + model = AutoModelForCausalLM.from_pretrained(model_name_or_path, device_map="auto", trust_remote_code=True, torch_dtype=torch.bfloat16) + + # 启动FastAPI应用 + # 用6006端口可以将autodl的端口映射到本地,从而在本地使用api + uvicorn.run(app, host='0.0.0.0', port=6006, workers=1) # 在指定端口和主机上启动应用 +``` + +## Api 部署 + +在终端输入以下命令启动api服务: + +```shell +cd /root/autodl-tmp #如果在 /root/autodl-tmp 路径下则这句不需要 +python api.py +``` + +加载完毕后出现如下信息说明成功。 + +![Alt text](images/server-ok.png) + +默认部署在 6006 端口,通过 POST 方法进行调用,可以使用 curl 调用,如下所示: + +```shell +curl -X POST "http://127.0.0.1:6006" \ + -H 'Content-Type: application/json' \ + -d '{"prompt": "你好"}' +``` + +![model response](images/response.png) + +也可以使用 python 中的 requests 库进行调用,如下所示: + +这里我们可以启用Jupyter notebook进行交互 + +![start Jupyter notebook](images/start-jupyter.png) + + +```python +import requests +import json + +def get_completion(prompt): + headers = {'Content-Type': 'application/json'} + data = {"prompt": prompt} + response = requests.post(url='http://127.0.0.1:6006', headers=headers, data=json.dumps(data)) + return response.json()['response'] + +if __name__ == '__main__': + print(get_completion('你好')) +``` + +得到的返回值如下所示: + +```json +{"response":"你好!有什么我可以帮助你的吗?","status":200,"time":"2024-02-05 18:08:19"} +``` + +![Alt text](images/Jupyter-response.png) diff --git "a/TransNormer/02-TransNormer-7B \346\216\245\345\205\245langchain\346\220\255\345\273\272\347\237\245\350\257\206\345\272\223\345\212\251\346\211\213.md" "b/TransNormer/02-TransNormer-7B \346\216\245\345\205\245langchain\346\220\255\345\273\272\347\237\245\350\257\206\345\272\223\345\212\251\346\211\213.md" new file mode 100644 index 00000000..0fe68f38 --- /dev/null +++ "b/TransNormer/02-TransNormer-7B \346\216\245\345\205\245langchain\346\220\255\345\273\272\347\237\245\350\257\206\345\272\223\345\212\251\346\211\213.md" @@ -0,0 +1,101 @@ +# TransNormerLLM-7B 接入 LangChain 搭建知识库助手 + +## 环境准备 +在 autodl 平台中租赁一个 3090/4090 等 24G 显存的显卡机器,如下图所示镜像选择 PyTorch-->2.0.0-->3.8(ubuntu20.04)-->11.8 + +![机器配置选择](images/Machine-Config.png) +接下来打开刚刚租用服务器的 JupyterLab,并且打开其中的终端开始环境配置、模型下载和运行 demo。 + +pip 换源加速下载并安装依赖包 + +```shell +# 升级pip +python -m pip install --upgrade pip +# 更换 pypi 源加速库的安装 +pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple + +pip install modelscope==1.11.0 +pip install "transformers>=4.37.0" accelerate tiktoken einops scipy transformers_stream_generator==0.0.4 peft deepspeed +pip install -U huggingface_hub +pip install triton==2.0.0 +pip install einops +pip install langchain +``` + +## 
模型下载 + +使用 modelscope 中的 snapshot_download 函数下载模型,第一个参数为模型名称,参数 cache_dir 为模型的下载路径。 + +在 /root/autodl-tmp 路径下新建 model_download.py 文件并在其中输入以下内容,粘贴代码后记得保存文件,如下图所示。并运行 `python /root/autodl-tmp/model_download.py` 执行下载,模型大小为 14 GB,下载模型大概需要 2 分钟。 + +```python + +import torch +from modelscope import snapshot_download, AutoModel, AutoTokenizer +import os +model_dir = snapshot_download('OpenNLPLab/TransNormerLLM-7B', cache_dir='/root/autodl-tmp', revision='master') +``` + + +## 代码准备 + +为便捷构建 LLM 应用,我们需要基于本地部署的 TransNormerLLM-7B,自定义一个 LLM 类,将 TransNormerLLM-7B 接入到 LangChain 框架中。完成自定义 LLM 类之后,可以以完全一致的方式调用 LangChain 的接口,而无需考虑底层模型调用的不一致。 + +基于本地部署的 TransNormerLLM-7B 自定义 LLM 类并不复杂,我们只需从 langchain.llms.base.LLM 类继承一个子类,并重写构造函数与 _call 函数即可: + +```python +from langchain.llms.base import LLM +from typing import Any, List, Optional +from langchain.callbacks.manager import CallbackManagerForLLMRun +from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig, LlamaTokenizerFast +import torch + +class TransNormer_LLM(LLM): + # 基于本地 TransNormer 自定义 LLM 类 + tokenizer: AutoTokenizer = None + model: AutoModelForCausalLM = None + + def __init__(self, mode_name_or_path :str): + + super().__init__() + print("正在从本地加载模型...") + self.tokenizer = AutoTokenizer.from_pretrained(mode_name_or_path, trust_remote_code=True, use_fast=False) + self.model = AutoModelForCausalLM.from_pretrained(mode_name_or_path, torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto") + self.model.generation_config = GenerationConfig.from_pretrained(mode_name_or_path) + print("完成本地模型的加载") + + def _call(self, prompt : str, stop: Optional[List[str]] = None, + run_manager: Optional[CallbackManagerForLLMRun] = None, + **kwargs: Any) -> str: + + messages = [{"role": "user", "content": prompt }] + input_ids = self.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True) + model_inputs = self.tokenizer([input_ids], return_tensors="pt").to('cuda') + generated_ids = self.model.generate(model_inputs.input_ids, max_new_tokens=512) + generated_ids = [ + output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids) + ] + response = self.tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0] + + return response + + @property + def _llm_type(self) -> str: + return "TransNormer_LLM" +``` + +在上述类定义中,我们分别重写了构造函数和 _call 函数:对于构造函数,我们在对象实例化的一开始加载本地部署的 TransNormer 模型,从而避免每一次调用都需要重新加载模型带来的时间开销;_call 函数是 LLM 类的核心函数,LangChain 会调用该函数来调用 LLM。在该函数中,我们先用 tokenizer 按 chat template 构造模型输入,再调用已实例化模型的 generate 方法生成回复,从而实现对模型的调用并返回生成结果。 + +在整体项目中,我们将上述代码封装为 LLM.py,后续将直接从该文件中引入自定义的 LLM 类。 + + +## 调用 + +然后就可以像使用 LangChain 中任何其他大模型一样使用了。 + +```python +from LLM import TransNormer_LLM # 注意:此代码需要和 LLM.py 在同一路径下;如果直接写在 Jupyter 中,则不需要此导入 +llm = TransNormer_LLM(mode_name_or_path = "/root/autodl-tmp/OpenNLPLab/TransNormerLLM-7B") +llm("你是谁") +``` + +![模型返回回答效果](images/question_to_the_TransNormer.png) diff --git a/TransNormer/03-TransNormer-7B WebDemo.md b/TransNormer/03-TransNormer-7B WebDemo.md new file mode 100644 index 00000000..5d93225f --- /dev/null +++ b/TransNormer/03-TransNormer-7B WebDemo.md @@ -0,0 +1,122 @@ +# TransNormerLLM-7B WebDemo 部署 + +## 环境准备 +在autodl平台中租一个3090/4090等24G显存的显卡机器,如下图所示镜像选择PyTorch-->2.0.0-->3.8(ubuntu20.04)-->11.8(11.3版本以上的都可以)。 +接下来打开刚刚租用服务器的JupyterLab,并且打开其中的终端开始环境配置、模型下载和运行演示。 +![机器配置选择](images/Machine-Config.png) + +pip换源和安装依赖包 +``` +# 升级pip +python -m pip install --upgrade pip +# 更换 pypi 源加速库的安装 +pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple + +pip install modelscope==1.11.0 
+pip install "transformers>=4.37.0" +pip install streamlit==1.24.0 +pip install sentencepiece==0.1.99 +pip install accelerate==0.24.1 +pip install transformers_stream_generator==0.0.4 +pip install triton==2.0.0 +pip install einops +``` +## 模型下载 +使用 modelscope 中的 snapshot_download 函数下载模型,第一个参数为模型名称,参数 cache_dir 为模型的下载路径。 + +模型的介绍地址(魔塔社区): +https://www.modelscope.cn/models/OpenNLPLab/TransNormerLLM-7B/summary + +在 /root/autodl-tmp 路径下新建 model_download.py 文件并在其中输入以下内容,粘贴代码后请及时保存文件,如下图所示。并运行 `python /root/autodl-tmp/model_download.py` 执行下载,模型大小为 12GB,下载模型大概需要 6 分钟。 + +其在终端界面的流程如下: +```cmd +cd /root/autodl-tmp +vim model_download.py +``` +然后保存退出(:wq) + +model_download.py 文件中的内容: +```python +import torch +from modelscope import snapshot_download, AutoModel, AutoTokenizer +import os +model_dir = snapshot_download('OpenNLPLab/TransNormerLLM-7B', cache_dir='/root/autodl-tmp', revision='master') +``` + +## 代码准备 + +在`/root/autodl-tmp`路径下新建 `chatBot.py` 文件并在其中输入以下内容,粘贴代码后记得保存文件。下面的代码有很详细的注释,大家如有不理解的地方,欢迎提出issue。 + +```python +# 导入所需的库 +from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig +import torch +import streamlit as st + +# 在侧边栏中创建一个标题和一个链接 +with st.sidebar: + st.markdown("## TransNormer LLM") + "[开源大模型食用指南 self-llm](https://github.com/datawhalechina/self-llm.git)" + # 创建一个滑块,用于选择最大长度,范围在0到1024之间,默认值为512 + max_length = st.slider("max_length", 0, 1024, 512, step=1) + +# 创建一个标题和一个副标题 +st.title("💬 TransNormer Chatbot") +st.caption("🚀 A streamlit chatbot powered by Self-LLM") + +# 定义模型路径 +mode_name_or_path = '/root/autodl-tmp/OpenNLPLab/TransNormerLLM-7B' + +# 定义一个函数,用于获取模型和tokenizer +@st.cache_resource +def get_model(): + # 从预训练的模型中获取tokenizer + tokenizer = AutoTokenizer.from_pretrained(mode_name_or_path, trust_remote_code=True, use_fast=False) + # 从预训练的模型中获取模型,并设置模型参数 + model = AutoModelForCausalLM.from_pretrained(mode_name_or_path, torch_dtype=torch.bfloat16, trust_remote_code=True, + device_map="auto") + + return tokenizer, model + +# 加载TransNormer-4B-Chat的model和tokenizer +tokenizer, model = get_model() + +# 如果session_state中没有"messages",则创建一个包含默认消息的列表 +if "messages" not in st.session_state: + st.session_state["messages"] = [{"role": "assistant", "content": "有什么可以帮您的?"}] + +# 遍历session_state中的所有消息,并显示在聊天界面上 +for msg in st.session_state.messages: + st.chat_message(msg["role"]).write(msg["content"]) + +# 如果用户在聊天输入框中输入了内容,则执行以下操作 +if prompt := st.chat_input(): + # 将用户的输入添加到session_state中的messages列表中 + st.session_state.messages.append({"role": "user", "content": prompt}) + # 在聊天界面上显示用户的输入 + st.chat_message("user").write(prompt) + + # 构建输入 + input_ids = tokenizer.apply_chat_template(st.session_state.messages,tokenize=False,add_generation_prompt=True) + model_inputs = tokenizer([input_ids], return_tensors="pt").to('cuda') + generated_ids = model.generate(model_inputs.input_ids, max_new_tokens=512) + generated_ids = [ + output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids) + ] + response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0] + # 将模型的输出添加到session_state中的messages列表中 + st.session_state.messages.append({"role": "assistant", "content": response}) + # 在聊天界面上显示模型的输出 + st.chat_message("assistant").write(response) + # print(st.session_state) +``` + + +## 运行 demo + +在终端中运行以下命令,启动streamlit服务,并按照 `autodl` 的指示将端口映射到本地,然后在浏览器中打开链接 http://localhost:6006/ ,即可看到聊天界面。 + +```bash +streamlit run /root/autodl-tmp/chatBot.py --server.address 127.0.0.1 --server.port 6006 +``` diff --git "a/TransNormer/04-TrasnNormer-7B Lora 
\345\276\256\350\260\203.md" "b/TransNormer/04-TrasnNormer-7B Lora \345\276\256\350\260\203.md" new file mode 100644 index 00000000..017071c4 --- /dev/null +++ "b/TransNormer/04-TrasnNormer-7B Lora \345\276\256\350\260\203.md" @@ -0,0 +1,210 @@ +# TransNormerLLM-1B Lora 微调 & 全量微调 + +本节我们简要介绍如何基于 transformers、peft 等框架,对 TransNormerLLM-1B「备注:同样适用于 TransNormerLLM-385M/1B/7B」模型进行 Lora 微调 & 全量微调。Lora 是一种高效微调方法,深入了解其原理可参见博客:[知乎|深入浅出Lora](https://zhuanlan.zhihu.com/p/650197598)。 + +这个教程会在同目录下给大家提供一个 [notebook](./TransNormerLLM-7B-Lora.ipynb) 文件,来让大家更好地学习。 + +## 环境配置 + +在完成基本环境配置和本地模型部署的情况下,你还需要安装一些第三方库,可以使用以下命令: + +```bash +python -m pip install --upgrade pip +# 更换 pypi 源加速库的安装 +pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple + +pip install modelscope==1.9.5 +pip install "transformers>=4.37.0" +pip install streamlit==1.24.0 +pip install sentencepiece==0.1.99 +pip install accelerate==0.24.1 +pip install transformers_stream_generator==0.0.4 +pip install datasets==2.18.0 +pip install peft==0.10.0 +pip install deepspeed +pip install triton==2.0.0 +pip install einops + +MAX_JOBS=8 pip install flash-attn --no-build-isolation +``` + +> 注意:flash-attn 安装会比较慢,大概需要十几分钟。 + +在本节教程里,我们将微调数据集 `huanhuan.json` 放置在根目录 [/dataset](../dataset/huanhuan.json),该样本数据取自 [huanhuan.json](https://github.com/datawhalechina/self-llm/blob/master/dataset/huanhuan.json)。 + +## 指令集构建 + +LLM 的微调一般指指令微调过程。所谓指令微调,是说我们使用的微调数据形如: + +```json +{ + "instruction":"回答以下用户问题,仅输出答案。", + "input":"1+1等于几?", + "output":"2" +} +``` + +其中,`instruction` 是用户指令,告知模型其需要完成的任务;`input` 是用户输入,是完成用户指令所必须的输入内容;`output` 是模型应该给出的输出。 + +即我们的核心训练目标是让模型具有理解并遵循用户指令的能力。因此,在指令集构建时,我们应针对目标任务,有针对性地构建任务指令集。例如,在本节我们使用由项目合作者开源的 [Chat-甄嬛](https://github.com/KMnO4-zx/huanhuan-chat) 项目作为示例,我们的目标是构建一个能够模拟甄嬛对话风格的个性化 LLM,因此我们构造的指令形如: + + +```json +{ + "instruction": "你是谁?", + "input":"", + "output":"家父是大理寺少卿甄远道。" +} +``` + +当然,使用训练数据 `alpaca_data.json` 也是可以的。该样本数据取自 [alpaca_data.json](https://raw.githubusercontent.com/tatsu-lab/stanford_alpaca/main/alpaca_data.json),数据由 52,002 个条目组成,已重新格式化。其主要目的是演示如何对我们的模型进行 SFT,并不保证其有效性。 +我们所构造的全部指令数据集在根目录下。 + +## 数据格式化 + +`Lora` 训练的数据是需要经过格式化、编码之后再输入给模型进行训练的,如果是熟悉 `PyTorch` 模型训练流程的同学会知道,我们一般需要将输入文本编码为 input_ids,将输出文本编码为 `labels`,编码之后的结果都是多维的向量。我们首先定义一个预处理函数,这个函数用于对每一个样本,编码其输入、输出文本并返回一个编码后的字典: + +```python +def process_func(example): + MAX_LENGTH = 384 # Llama分词器会将一个中文字切分为多个token,因此需要放开一些最大长度,保证数据的完整性 + input_ids, attention_mask, labels = [], [], [] + instruction = tokenizer(f"<|im_start|>system\n现在你要扮演皇帝身边的女人--甄嬛<|im_end|>\n<|im_start|>user\n{example['instruction'] + example['input']}<|im_end|>\n<|im_start|>assistant\n", add_special_tokens=False) # add_special_tokens 不在开头加 special_tokens + response = tokenizer(f"{example['output']}", add_special_tokens=False) + input_ids = instruction["input_ids"] + response["input_ids"] + [tokenizer.pad_token_id] + attention_mask = instruction["attention_mask"] + response["attention_mask"] + [1] # 因为eos token咱们也是要关注的所以 补充为1 + labels = [-100] * len(instruction["input_ids"]) + response["input_ids"] + [tokenizer.pad_token_id] + if len(input_ids) > MAX_LENGTH: # 做一个截断 + input_ids = input_ids[:MAX_LENGTH] + attention_mask = attention_mask[:MAX_LENGTH] + labels = labels[:MAX_LENGTH] + return { + "input_ids": input_ids, + "attention_mask": attention_mask, + "labels": labels + } +``` + +`TransNormerLLM-7B` 采用的`Prompt Template`格式如下: + +```text +<|im_start|>system +You are a helpful assistant.<|im_end|> +<|im_start|>user +你是谁?<|im_end|> +<|im_start|>assistant +我是一个有用的助手。<|im_end|> +``` + +## 
加载tokenizer和半精度模型 + +模型以半精度形式加载,如果你的显卡比较新的话,可以用`torch.bfloat16`形式加载。对于自定义的模型一定要指定`trust_remote_code`参数为`True`。 + +```python +tokenizer = AutoTokenizer.from_pretrained('./OpenNLPLab/TransNormerLLM-7B/', use_fast=False, trust_remote_code=True) + +model = AutoModelForCausalLM.from_pretrained('./OpenNLPLab/TransNormerLLM-7B/', device_map="auto", torch_dtype=torch.bfloat16, trust_remote_code=True) +``` + +## 定义LoraConfig + +`LoraConfig`这个类中可以设置很多参数,但主要的参数没多少,简单讲一讲,感兴趣的同学可以直接看源码。 + +- `task_type`:模型类型 +- `target_modules`:需要训练的模型层的名字,主要就是`attention`部分的层,不同的模型对应的层的名字不同,可以传入数组,也可以字符串,也可以正则表达式。 +- `r`:`lora`的秩,具体可以看`Lora`原理 +- `lora_alpha`:`Lora alpha`,具体作用参见 `Lora` 原理 + +`Lora`的缩放是啥嘞?当然不是`r`(秩),这个缩放就是`lora_alpha/r`, 在这个`LoraConfig`中缩放就是4倍。 + +```python +config = LoraConfig( + task_type=TaskType.CAUSAL_LM, + target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"], + inference_mode=False, # 训练模式 + r=8, # Lora 秩 + lora_alpha=32, # Lora alpha,具体作用参见 Lora 原理 + lora_dropout=0.1 # Dropout 比例 +) +``` + +## 自定义 TrainingArguments 参数 + +`TrainingArguments`这个类的源码也介绍了每个参数的具体作用,当然大家可以来自行探索,这里就简单说几个常用的。 + +- `output_dir`:模型的输出路径 +- `per_device_train_batch_size`:顾名思义 `batch_size` +- `gradient_accumulation_steps`: 梯度累加,如果你的显存比较小,那可以把 `batch_size` 设置小一点,梯度累加增大一些。 +- `logging_steps`:多少步,输出一次`log` +- `num_train_epochs`:顾名思义 `epoch` +- `gradient_checkpointing`:梯度检查点,这个一旦开启,模型就必须执行`model.enable_input_require_grads()`,这个原理大家可以自行探索,这里就不细说了。 + +```python +args = TrainingArguments( + output_dir="./output/TransNormerLLM-7B-Lora", + per_device_train_batch_size=4, + gradient_accumulation_steps=4, + logging_steps=10, + num_train_epochs=3, + save_steps=100, + learning_rate=1e-4, + save_on_each_node=True, + gradient_checkpointing=True +) +``` + +## 使用 Trainer 训练 + +```python +trainer = Trainer( + model=model, + args=args, + train_dataset=tokenized_id, + data_collator=DataCollatorForSeq2Seq(tokenizer=tokenizer, padding=True), +) +trainer.train() +``` + +## 加载 lora 权重推理 + +训练好了之后可以使用如下方式加载`lora`权重进行推理: + +```python +from transformers import AutoModelForCausalLM, AutoTokenizer +import torch +from peft import PeftModel + +mode_path = './OpenNLPLab/TransNormerLLM-7B/' +lora_path = 'lora_path' # 这里改为你训练得到的 lora 权重(checkpoint)路径 + +# 加载tokenizer +tokenizer = AutoTokenizer.from_pretrained(mode_path, trust_remote_code=True) + +# 加载模型 +model = AutoModelForCausalLM.from_pretrained(mode_path, device_map="auto", torch_dtype=torch.bfloat16, trust_remote_code=True) + +# 加载lora权重 +model = PeftModel.from_pretrained(model, model_id=lora_path) + +prompt = "你是谁?" 
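+# 按模型的 chat template 构造对话消息,随后编码为模型输入并调用 generate 进行推理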
+messages = [ + {"role": "system", "content": "现在你要扮演皇帝身边的女人--甄嬛"}, + {"role": "user", "content": prompt} +] + +text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True) + +model_inputs = tokenizer([text], return_tensors="pt").to('cuda') + +generated_ids = model.generate( + model_inputs.input_ids, + max_new_tokens=512 +) +generated_ids = [ + output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids) +] + +response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0] + +print(response) +``` + diff --git a/TransNormer/TransNormenr-7B Lora.ipynb b/TransNormer/TransNormenr-7B Lora.ipynb new file mode 100644 index 00000000..4b1c3f2c --- /dev/null +++ b/TransNormer/TransNormenr-7B Lora.ipynb @@ -0,0 +1,607 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "de53995b-32ed-4722-8cac-ba104c8efacb", + "metadata": {}, + "source": [ + "# 导入环境" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "52fac949-4150-4091-b0c3-2968ab5e385c", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from datasets import Dataset\n", + "import pandas as pd\n", + "from transformers import AutoTokenizer, AutoModelForCausalLM, DataCollatorForSeq2Seq, TrainingArguments, Trainer, GenerationConfig" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "e098d9eb", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# 将JSON文件转换为CSV文件\n", + "df = pd.read_json('./huanhuan.json')\n", + "ds = Dataset.from_pandas(df)" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "8ac92d42-efae-49b1-a00e-ccaa75b98938", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "{'instruction': ['小姐,别的秀女都在求中选,唯有咱们小姐想被撂牌子,菩萨一定记得真真儿的——',\n", + " '这个温太医啊,也是古怪,谁不知太医不得皇命不能为皇族以外的人请脉诊病,他倒好,十天半月便往咱们府里跑。',\n", + " '嬛妹妹,刚刚我去府上请脉,听甄伯母说你来这里进香了。'],\n", + " 'input': ['', '', ''],\n", + " 'output': ['嘘——都说许愿说破是不灵的。', '你们俩话太多了,我该和温太医要一剂药,好好治治你们。', '出来走走,也是散心。']}" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "ds[:3]" + ] + }, + { + "cell_type": "markdown", + "id": "51d05e5d-d14e-4f03-92be-9a9677d41918", + "metadata": {}, + "source": [ + "# 处理数据集" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "74ee5a67-2e55-4974-b90e-cbf492de500a", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n" + ] + }, + { + "data": { + "text/plain": [ + "Qwen2Tokenizer(name_or_path='./qwen/Qwen1.5-7B-Chat/', vocab_size=151643, model_max_length=32768, is_fast=False, padding_side='right', truncation_side='right', special_tokens={'eos_token': '<|endoftext|>', 'pad_token': '<|endoftext|>', 'additional_special_tokens': ['<|im_start|>', '<|im_end|>']}, clean_up_tokenization_spaces=False), added_tokens_decoder={\n", + "\t151643: AddedToken(\"<|endoftext|>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),\n", + "\t151644: AddedToken(\"<|im_start|>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),\n", + "\t151645: AddedToken(\"<|im_end|>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),\n", + "}" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "tokenizer = 
AutoTokenizer.from_pretrained('./OpenNLPLab/TransNormerLLM-7B/', use_fast=False, trust_remote_code=True)\n", + "tokenizer" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "2503a5fa-9621-4495-9035-8e7ef6525691", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "def process_func(example):\n", + " MAX_LENGTH = 384 # Llama分词器会将一个中文字切分为多个token,因此需要放开一些最大长度,保证数据的完整性\n", + " input_ids, attention_mask, labels = [], [], []\n", + " instruction = tokenizer(f\"<|im_start|>system\\n现在你要扮演皇帝身边的女人--甄嬛<|im_end|>\\n<|im_start|>user\\n{example['instruction'] + example['input']}<|im_end|>\\n<|im_start|>assistant\\n\", add_special_tokens=False) # add_special_tokens 不在开头加 special_tokens\n", + " response = tokenizer(f\"{example['output']}\", add_special_tokens=False)\n", + " input_ids = instruction[\"input_ids\"] + response[\"input_ids\"] + [tokenizer.pad_token_id]\n", + " attention_mask = instruction[\"attention_mask\"] + response[\"attention_mask\"] + [1] # 因为eos token咱们也是要关注的所以 补充为1\n", + " labels = [-100] * len(instruction[\"input_ids\"]) + response[\"input_ids\"] + [tokenizer.pad_token_id] \n", + " if len(input_ids) > MAX_LENGTH: # 做一个截断\n", + " input_ids = input_ids[:MAX_LENGTH]\n", + " attention_mask = attention_mask[:MAX_LENGTH]\n", + " labels = labels[:MAX_LENGTH]\n", + " return {\n", + " \"input_ids\": input_ids,\n", + " \"attention_mask\": attention_mask,\n", + " \"labels\": labels\n", + " }" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "84f870d6-73a9-4b0f-8abf-687b32224ad8", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "Map: 0%| | 0/3729 [00:00system\\n现在你要扮演皇帝身边的女人--甄嬛<|im_end|>\\n<|im_start|>user\\n小姐,别的秀女都在求中选,唯有咱们小姐想被撂牌子,菩萨一定记得真真儿的——<|im_end|>\\n<|im_start|>assistant\\n嘘——都说许愿说破是不灵的。<|endoftext|>'" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "tokenizer.decode(tokenized_id[0]['input_ids'])" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "97f16f66-324a-454f-8cc3-ef23b100ecff", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "'你们俩话太多了,我该和温太医要一剂药,好好治治你们。<|endoftext|>'" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "tokenizer.decode(list(filter(lambda x: x != -100, tokenized_id[1][\"labels\"])))" + ] + }, + { + "cell_type": "markdown", + "id": "424823a8-ed0d-4309-83c8-3f6b1cdf274c", + "metadata": {}, + "source": [ + "# 创建模型" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "170764e5-d899-4ef4-8c53-36f6dec0d198", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "9f55b92663ec44c0b3dd6b46ce6c397b", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "Loading checkpoint shards: 0%| | 0/4 [00:00, auto_mapping=None, base_model_name_or_path=None, revision=None, task_type=, inference_mode=False, r=8, target_modules={'gate_proj', 'q_proj', 'k_proj', 'o_proj', 'down_proj', 'v_proj', 'up_proj'}, lora_alpha=32, lora_dropout=0.1, fan_in_fan_out=False, bias='none', use_rslora=False, modules_to_save=None, init_lora_weights=True, layers_to_transform=None, layers_pattern=None, rank_pattern={}, alpha_pattern={}, megatron_config=None, megatron_core='megatron.core', loftq_config={})" + ] + }, + "execution_count": 
12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from peft import LoraConfig, TaskType, get_peft_model\n", + "\n", + "config = LoraConfig(\n", + " task_type=TaskType.CAUSAL_LM, \n", + " target_modules=[\"q_proj\", \"k_proj\", \"v_proj\", \"o_proj\", \"gate_proj\", \"up_proj\", \"down_proj\"],\n", + " inference_mode=False, # 训练模式\n", + " r=8, # Lora 秩\n", + " lora_alpha=32, # Lora alaph,具体作用参见 Lora 原理\n", + " lora_dropout=0.1# Dropout 比例\n", + ")\n", + "config" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "2c2489c5-eaab-4e1f-b06a-c3f914b4bf8e", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "LoraConfig(peft_type=, auto_mapping=None, base_model_name_or_path='./qwen/Qwen1.5-7B-Chat/', revision=None, task_type=, inference_mode=False, r=8, target_modules={'gate_proj', 'q_proj', 'k_proj', 'o_proj', 'down_proj', 'v_proj', 'up_proj'}, lora_alpha=32, lora_dropout=0.1, fan_in_fan_out=False, bias='none', use_rslora=False, modules_to_save=None, init_lora_weights=True, layers_to_transform=None, layers_pattern=None, rank_pattern={}, alpha_pattern={}, megatron_config=None, megatron_core='megatron.core', loftq_config={})" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "model = get_peft_model(model, config)\n", + "config" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "ebf5482b-fab9-4eb3-ad88-c116def4be12", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "trainable params: 19,988,480 || all params: 7,741,313,024 || trainable%: 0.2582052933143348\n" + ] + } + ], + "source": [ + "model.print_trainable_parameters()" + ] + }, + { + "cell_type": "markdown", + "id": "ca055683-837f-4865-9c57-9164ba60c00f", + "metadata": {}, + "source": [ + "# 配置训练参数" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "7e76bbff-15fd-4995-a61d-8364dc5e9ea0", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "args = TrainingArguments(\n", + " output_dir=\"./output/TransNormerLLM-7B-Lora\",\n", + " per_device_train_batch_size=4,\n", + " gradient_accumulation_steps=4,\n", + " logging_steps=10,\n", + " num_train_epochs=3,\n", + " save_steps=100,\n", + " learning_rate=1e-4,\n", + " save_on_each_node=True,\n", + " gradient_checkpointing=True\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "f142cb9c-ad99-48e6-ba86-6df198f9ed96", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.\n" + ] + } + ], + "source": [ + "trainer = Trainer(\n", + " model=model,\n", + " args=args,\n", + " train_dataset=tokenized_id,\n", + " data_collator=DataCollatorForSeq2Seq(tokenizer=tokenizer, padding=True),\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "aec9bc36-b297-45af-99e1-d4c4d82be081", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n" + ] + }, + { + "data": { + "text/html": [ + "\n", + "
\n", + " \n", + " \n", + " [134/699 03:51 < 16:28, 0.57 it/s, Epoch 0.57/3]\n", + "
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
StepTraining Loss
104.153500
203.031600
303.079400
402.896600
503.032000
602.925300
702.937100
802.948500
902.994200
1002.895800
1102.841600
1202.911100
1302.935700

" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Checkpoint destination directory ./output/Qwen1.5/checkpoint-100 already exists and is non-empty.Saving will proceed but saved results may be invalid.\n", + "/root/miniconda3/lib/python3.8/site-packages/peft/utils/save_and_load.py:148: UserWarning: Could not find a config file in ./qwen/Qwen1.5-7B-Chat/ - will assume that the vocabulary was not modified.\n", + " warnings.warn(\n" + ] + } + ], + "source": [ + "trainer.train()" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3.10.9 ('koopman_rl')", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.9" + }, + "vscode": { + "interpreter": { + "hash": "4e3e83c64c02740d29893152d09e5006b6c7a30ceaa232b939b9d06de7d1432d" + } + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/TransNormer/images/Jupyter-response.png b/TransNormer/images/Jupyter-response.png new file mode 100644 index 00000000..f77bc34c Binary files /dev/null and b/TransNormer/images/Jupyter-response.png differ diff --git a/TransNormer/images/Machine-Config.png b/TransNormer/images/Machine-Config.png new file mode 100644 index 00000000..db65d8e2 Binary files /dev/null and b/TransNormer/images/Machine-Config.png differ diff --git a/TransNormer/images/TransNormer-structure.png b/TransNormer/images/TransNormer-structure.png new file mode 100644 index 00000000..b6be4d27 Binary files /dev/null and b/TransNormer/images/TransNormer-structure.png differ diff --git a/TransNormer/images/python-terminal.png b/TransNormer/images/python-terminal.png new file mode 100644 index 00000000..8db4c7b4 Binary files /dev/null and b/TransNormer/images/python-terminal.png differ diff --git a/TransNormer/images/python-terminal2.png b/TransNormer/images/python-terminal2.png new file mode 100644 index 00000000..c1add133 Binary files /dev/null and b/TransNormer/images/python-terminal2.png differ diff --git a/TransNormer/images/question_to_the_TransNormer.png b/TransNormer/images/question_to_the_TransNormer.png new file mode 100644 index 00000000..dfb86669 Binary files /dev/null and b/TransNormer/images/question_to_the_TransNormer.png differ diff --git a/TransNormer/images/response.png b/TransNormer/images/response.png new file mode 100644 index 00000000..10103c72 Binary files /dev/null and b/TransNormer/images/response.png differ diff --git a/TransNormer/images/server-ok.png b/TransNormer/images/server-ok.png new file mode 100644 index 00000000..72c437e6 Binary files /dev/null and b/TransNormer/images/server-ok.png differ diff --git a/TransNormer/images/start-jupyter.png b/TransNormer/images/start-jupyter.png new file mode 100644 index 00000000..10120c48 Binary files /dev/null and b/TransNormer/images/start-jupyter.png differ