Revamp llama.cpp docs #1214

Merged · 11 commits · May 29, 2024 · Changes from 9 commits
README.md: 93 changes (80 additions, 13 deletions)
@@ -20,15 +20,79 @@ load_balancing_strategy: random

A chat interface using open source models, e.g. OpenAssistant or Llama. It is a SvelteKit app and it powers the [HuggingChat app on hf.co/chat](https://huggingface.co/chat).

0. [No Setup Deploy](#no-setup-deploy)
1. [Setup](#setup)
2. [Launch](#launch)
3. [Web Search](#web-search)
4. [Text Embedding Models](#text-embedding-models)
5. [Extra parameters](#extra-parameters)
6. [Common issues](#common-issues)
7. [Deploying to a HF Space](#deploying-to-a-hf-space)
8. [Building](#building)
0. [Quickstart](#quickstart)
1. [No Setup Deploy](#no-setup-deploy)
2. [Setup](#setup)
3. [Launch](#launch)
4. [Web Search](#web-search)
5. [Text Embedding Models](#text-embedding-models)
6. [Extra parameters](#extra-parameters)
7. [Common issues](#common-issues)
8. [Deploying to a HF Space](#deploying-to-a-hf-space)
9. [Building](#building)

## Quickstart
**Collaborator:** Maybe add a link to the llama.cpp server docs somewhere (it's quite nice).

**Collaborator (author):** I have a link here to the HF docs:

> You can quickly start a locally running chat-ui & LLM text-generation server thanks to chat-ui's [llama.cpp server support](https://huggingface.co/docs/chat-ui/configuration/models/providers/llamacpp).

and the HF docs contain a link to the llama.cpp server readme:

> A local LLaMA.cpp HTTP Server will start on `http://localhost:8080` (to change the port or any other default options, please find [LLaMA.cpp HTTP Server readme](https://github.com/ggerganov/llama.cpp/tree/master/examples/server)).


You can quickly start a locally running chat-ui & LLM text-generation server thanks to chat-ui's [llama.cpp server support](https://huggingface.co/docs/chat-ui/configuration/models/providers/llamacpp).

**Step 1 (Start llama.cpp server):**

```bash
# install llama.cpp
brew install llama.cpp
# start llama.cpp server (using hf.co/microsoft/Phi-3-mini-4k-instruct-gguf as an example)
llama-server --hf-repo microsoft/Phi-3-mini-4k-instruct-gguf --hf-file Phi-3-mini-4k-instruct-q4.gguf -c 4096
```

A local LLaMA.cpp HTTP Server will start on `http://localhost:8080`. Read more [here](https://huggingface.co/docs/chat-ui/configuration/models/providers/llamacpp).
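
Before wiring up chat-ui, you can optionally confirm the server is reachable. A minimal check, assuming the default port and the `/health` endpoint exposed by recent llama.cpp server builds:

```bash
# sanity-check the local llama.cpp server (assumes the default port 8080)
curl http://localhost:8080/health
# a small JSON status payload is returned once the model has finished loading
```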

**Step 2 (tell chat-ui to use local llama.cpp server):**

Add the following to your `.env.local`:

```ini
MODELS=`[
{
"name": "Local microsoft/Phi-3-mini-4k-instruct-gguf",
"tokenizer": "microsoft/Phi-3-mini-4k-instruct-gguf",
"preprompt": "",
"chatPromptTemplate": "<s>{{preprompt}}{{#each messages}}{{#ifUser}}<|user|>\n{{content}}<|end|>\n<|assistant|>\n{{/ifUser}}{{#ifAssistant}}{{content}}<|end|>\n{{/ifAssistant}}{{/each}}",
"parameters": {
"stop": ["<|end|>", "<|endoftext|>", "<|assistant|>"],
"temperature": 0.7,
"max_new_tokens": 1024,
"truncate": 3071
},
"endpoints": [{
"type" : "llamacpp",
"baseURL": "http://localhost:8080"
}],
},
]`
```

Read more [here](https://huggingface.co/docs/chat-ui/configuration/models/providers/llamacpp).

**Step 3 (make sure you have MongoDB running locally):**

```bash
docker run -d -p 27017:27017 --name mongo-chatui mongo:latest
```

Read more [here](#database).
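
To double-check the database side, the sketch below assumes the Docker CLI and the default MongoDB port; chat-ui reads its connection string from `MONGODB_URL` in `.env.local`:

```bash
# confirm the container from the command above is running
docker ps --filter name=mongo-chatui
# point chat-ui at it (adjust if you changed the port mapping)
echo 'MONGODB_URL=mongodb://localhost:27017' >> .env.local
```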

**Step 4 (start chat-ui):**

```bash
git clone https://github.com/huggingface/chat-ui
cd chat-ui
npm install
npm run dev -- --open
```

Read more [here](#launch).
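
If you later want a production-style run rather than the dev server, a rough sketch (assuming chat-ui keeps the standard SvelteKit `build` script; see [Building](#building) for the supported workflow):

```bash
# production-style build: a sketch assuming the default SvelteKit build script;
# see the Building section of this README for the full, supported steps
npm run build
```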

<img class="hidden dark:block" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/chat-ui/llamacpp-dark.png" height="auto"/>

## No Setup Deploy

@@ -415,11 +479,14 @@ MODELS=`[{

chat-ui also supports the llama.cpp API server directly without the need for an adapter. You can do this using the `llamacpp` endpoint type.

If you want to run chat-ui with llama.cpp, you can do the following, using Zephyr as an example model:
If you want to run Chat UI with llama.cpp, you can do the following, using [microsoft/Phi-3-mini-4k-instruct-gguf](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf) as an example model:

1. Get [the weights](https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF/tree/main) from the hub
2. Run the server with the following command: `./server -m models/zephyr-7b-beta.Q4_K_M.gguf -c 2048 -np 3`
3. Add the following to your `.env.local`:
```bash
# install llama.cpp
brew install llama.cpp
# start llama.cpp server
llama-server --hf-repo microsoft/Phi-3-mini-4k-instruct-gguf --hf-file Phi-3-mini-4k-instruct-q4.gguf -c 4096
```

```env
MODELS=`[
# … (remaining lines collapsed in the diff)
```
docs/source/configuration/models/providers/llamacpp.md: 49 changes (29 additions, 20 deletions)
@@ -7,32 +7,41 @@

Chat UI supports the llama.cpp API server directly without the need for an adapter. You can do this using the `llamacpp` endpoint type.

If you want to run Chat UI with llama.cpp, you can do the following, using Zephyr as an example model:
If you want to run Chat UI with llama.cpp, you can do the following, using [microsoft/Phi-3-mini-4k-instruct-gguf](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf) as an example model:

1. Get [the weights](https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF/tree/main) from the hub
2. Run the server with the following command: `./server -m models/zephyr-7b-beta.Q4_K_M.gguf -c 2048 -np 3`
3. Add the following to your `.env.local`:
```bash
# install llama.cpp
brew install llama.cpp
# start llama.cpp server
llama-server --hf-repo microsoft/Phi-3-mini-4k-instruct-gguf --hf-file Phi-3-mini-4k-instruct-q4.gguf -c 4096
```

A local LLaMA.cpp HTTP Server will start on `http://localhost:8080` (to change the port or any other default options, see the [LLaMA.cpp HTTP Server readme](https://github.com/ggerganov/llama.cpp/tree/master/examples/server)).

Add the following to your `.env.local`:

```ini
MODELS=`[
{
"name": "Local Zephyr",
"chatPromptTemplate": "<|system|>\n{{preprompt}}</s>\n{{#each messages}}{{#ifUser}}<|user|>\n{{content}}</s>\n<|assistant|>\n{{/ifUser}}{{#ifAssistant}}{{content}}</s>\n{{/ifAssistant}}{{/each}}",
"name": "Local microsoft/Phi-3-mini-4k-instruct-gguf",
"tokenizer": "microsoft/Phi-3-mini-4k-instruct-gguf",
"preprompt": "",
"chatPromptTemplate": "<s>{{preprompt}}{{#each messages}}{{#ifUser}}<|user|>\n{{content}}<|end|>\n<|assistant|>\n{{/ifUser}}{{#ifAssistant}}{{content}}<|end|>\n{{/ifAssistant}}{{/each}}",
"parameters": {
"temperature": 0.1,
"top_p": 0.95,
"repetition_penalty": 1.2,
"top_k": 50,
"truncate": 1000,
"max_new_tokens": 2048,
"stop": ["</s>"]
"stop": ["<|end|>", "<|endoftext|>", "<|assistant|>"],
"temperature": 0.7,
"max_new_tokens": 1024,
"truncate": 3071
},
"endpoints": [
{
"url": "http://127.0.0.1:8080",
"type": "llamacpp"
}
]
}
"endpoints": [{
"type" : "llamacpp",
"baseURL": "http://localhost:8080"
}],
},
]`
```
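
Chat UI's `llamacpp` endpoint type talks directly to the llama.cpp server's HTTP API. To sanity-check that connection outside Chat UI, here is a minimal sketch, assuming the server's `/completion` endpoint and the Phi-3 prompt format produced by the `chatPromptTemplate` above:

```bash
# a hand-rolled request mirroring what the template above renders for a single user turn
# (assumes the llama.cpp server's /completion endpoint on the default port)
curl http://localhost:8080/completion \
  -H "Content-Type: application/json" \
  -d '{
        "prompt": "<s><|user|>\nWhat is the capital of France?<|end|>\n<|assistant|>\n",
        "n_predict": 128,
        "stop": ["<|end|>", "<|endoftext|>", "<|assistant|>"]
      }'
```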

<div class="flex justify-center">
<img class="block dark:hidden" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/chat-ui/llamacpp-light.png" height="auto"/>
<img class="hidden dark:block" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/chat-ui/llamacpp-dark.png" height="auto"/>
</div>
docs/source/index.md: 66 changes (66 additions, 0 deletions)
@@ -9,3 +9,69 @@ Open source chat interface with support for tools, web search, multimodal and ma
🐙 **Multimodal**: Accepts image file uploads on supported providers

👤 **OpenID**: Optionally setup OpenID for user authentication

## Quickstart Locally

You can quickly have a locally running chat-ui & LLM text-generation server thanks to chat-ui's [llama.cpp server support](https://huggingface.co/docs/chat-ui/configuration/models/providers/llamacpp).

**Step 1 (Start llama.cpp server):**

```bash
# install llama.cpp
brew install llama.cpp
# start llama.cpp server (using hf.co/microsoft/Phi-3-mini-4k-instruct-gguf as an example)
llama-server --hf-repo microsoft/Phi-3-mini-4k-instruct-gguf --hf-file Phi-3-mini-4k-instruct-q4.gguf -c 4096
```

A local LLaMA.cpp HTTP Server will start on `http://localhost:8080`. Read more [here](https://huggingface.co/docs/chat-ui/configuration/models/providers/llamacpp).
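
Before moving on, you can optionally confirm the server is up. A quick check, assuming the default port and the `/health` endpoint available in recent llama.cpp server builds:

```bash
# returns a small JSON status once the model has finished loading (assumed default port)
curl http://localhost:8080/health
```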

**Step 2 (tell chat-ui to use local llama.cpp server):**

Add the following to your `.env.local`:

```ini
MODELS=`[
{
"name": "Local microsoft/Phi-3-mini-4k-instruct-gguf",
"tokenizer": "microsoft/Phi-3-mini-4k-instruct-gguf",
"preprompt": "",
"chatPromptTemplate": "<s>{{preprompt}}{{#each messages}}{{#ifUser}}<|user|>\n{{content}}<|end|>\n<|assistant|>\n{{/ifUser}}{{#ifAssistant}}{{content}}<|end|>\n{{/ifAssistant}}{{/each}}",
"parameters": {
"stop": ["<|end|>", "<|endoftext|>", "<|assistant|>"],
"temperature": 0.7,
"max_new_tokens": 1024,
"truncate": 3071
},
"endpoints": [{
"type" : "llamacpp",
"baseURL": "http://localhost:8080"
}],
},
]`
```

Read more [here](https://huggingface.co/docs/chat-ui/configuration/models/providers/llamacpp).

**Step 3 (make sure you have MongoDB running locally):**

```bash
docker run -d -p 27017:27017 --name mongo-chatui mongo:latest
```

Read more [here](https://github.com/huggingface/chat-ui?tab=readme-ov-file#database).

**Step 4 (start chat-ui):**

```bash
git clone https://github.com/huggingface/chat-ui
cd chat-ui
npm install
npm run dev -- --open
```

Read more [here](https://github.com/huggingface/chat-ui?tab=readme-ov-file#launch).

<div class="flex justify-center">
<img class="block dark:hidden" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/chat-ui/llamacpp-light.png" height="auto"/>
<img class="hidden dark:block" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/chat-ui/llamacpp-dark.png" height="auto"/>
</div>