Revamp llama.cpp docs #1214

Merged · 11 commits · May 29, 2024
Changes from 5 commits
76 changes: 72 additions & 4 deletions README.md
@@ -30,6 +30,71 @@ A chat interface using open source models, eg OpenAssistant or Llama. It is a Sv
7. [Deploying to a HF Space](#deploying-to-a-hf-space)
8. [Building](#building)

## Quickstart Locally

You can quickly get chat-ui and an LLM text-generation server running locally thanks to chat-ui's [llama.cpp server support](https://huggingface.co/docs/chat-ui/configuration/models/providers/llamacpp).

**Step 1 (Start llama.cpp server):**

```bash
# install llama.cpp
brew install llama.cpp
# start llama.cpp server (using hf.co/microsoft/Phi-3-mini-4k-instruct-gguf as an example)
llama-server --hf-repo microsoft/Phi-3-mini-4k-instruct-gguf --hf-file Phi-3-mini-4k-instruct-q4.gguf -c 4096
```

A local llama.cpp HTTP server will start on `http://localhost:8080`.
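To check that the server is up before pointing chat-ui at it, you can query its health endpoint. This is a minimal sketch assuming the llama.cpp server's default `/health` route and port:

```bash
# sanity check: should return a small JSON status once the model has loaded
curl http://localhost:8080/health
```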

Read more [here](https://huggingface.co/docs/chat-ui/configuration/models/providers/llamacpp).

**Step 2 (tell chat-ui to use the local llama.cpp server):**

Add the following to your `.env.local`:

```ini
MODELS=`[
{
"name": "Local microsoft/Phi-3-mini-4k-instruct-gguf",
"tokenizer": "microsoft/Phi-3-mini-4k-instruct-gguf",
"preprompt": "",
"chatPromptTemplate": "<s>{{preprompt}}{{#each messages}}{{#ifUser}}<|user|>\n{{content}}<|end|>\n<|assistant|>\n{{/ifUser}}{{#ifAssistant}}{{content}}<|end|>\n{{/ifAssistant}}{{/each}}",
"parameters": {
"stop": ["<|end|>", "<|endoftext|>", "<|assistant|>"],
"temperature": 0.7,
"max_new_tokens": 1024,
"truncate": 3071
},
"endpoints": [{
"type" : "llamacpp",
"baseURL": "http://localhost:8080"
}],
},
]`
```

Read more [here](https://huggingface.co/docs/chat-ui/configuration/models/providers/llamacpp).

**Step 3 (make sure you have MongoDB running locally):**

```bash
docker run -d -p 27017:27017 --name mongo-chatui mongo:latest
```
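If MongoDB is not running on the default host and port, point chat-ui at it explicitly. A minimal sketch of the corresponding `.env.local` entry, assuming the default `MONGODB_URL` variable name:

```ini
# connection string for the MongoDB container started above
MONGODB_URL=mongodb://localhost:27017
```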

Read more [here](#database).

**Step 4 (start chat-ui):**

```bash
git clone https://github.com/huggingface/chat-ui
cd chat-ui
npm install
npm run dev -- --open
```
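For a production-style run instead of the dev server, a typical SvelteKit flow (assuming the standard `build` and `preview` scripts in chat-ui's `package.json`) would be:

```bash
# build the app and serve the production bundle locally
npm run build
npm run preview
```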

Read more [here](#launch).

<img class="hidden dark:block" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/chat-ui/llamacpp-dark.png" height="auto"/>

## No Setup Deploy

If you don't want to configure, set up, and launch your own Chat UI yourself, you can use this option as a fast deploy alternative.
@@ -415,11 +480,14 @@ MODELS=`[{

chat-ui also supports the llama.cpp API server directly without the need for an adapter. You can do this using the `llamacpp` endpoint type.

If you want to run chat-ui with llama.cpp, you can do the following, using Zephyr as an example model:
If you want to run Chat UI with llama.cpp, you can do the following, using [microsoft/Phi-3-mini-4k-instruct-gguf](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf) as an example model:

1. Get [the weights](https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF/tree/main) from the hub
2. Run the server with the following command: `./server -m models/zephyr-7b-beta.Q4_K_M.gguf -c 2048 -np 3`
3. Add the following to your `.env.local`:
```bash
# install llama.cpp
brew install llama.cpp
# start llama.cpp server
llama-server --hf-repo microsoft/Phi-3-mini-4k-instruct-gguf --hf-file Phi-3-mini-4k-instruct-q4.gguf -c 4096
```

```env
MODELS=`[
49 changes: 29 additions & 20 deletions docs/source/configuration/models/providers/llamacpp.md
@@ -7,32 +7,41 @@

Chat UI supports the llama.cpp API server directly without the need for an adapter. You can do this using the `llamacpp` endpoint type.

If you want to run Chat UI with llama.cpp, you can do the following, using Zephyr as an example model:
If you want to run Chat UI with llama.cpp, you can do the following, using [microsoft/Phi-3-mini-4k-instruct-gguf](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf) as an example model:

1. Get [the weights](https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF/tree/main) from the hub
2. Run the server with the following command: `./server -m models/zephyr-7b-beta.Q4_K_M.gguf -c 2048 -np 3`
3. Add the following to your `.env.local`:
```bash
# install llama.cpp
brew install llama.cpp
# start llama.cpp server
llama-server --hf-repo microsoft/Phi-3-mini-4k-instruct-gguf --hf-file Phi-3-mini-4k-instruct-q4.gguf -c 4096
```

A local llama.cpp HTTP server will start on `http://localhost:8080` (to change the port or other default options, see the [llama.cpp HTTP server README](https://github.com/ggerganov/llama.cpp/tree/master/examples/server)).
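For example, to serve on a different port, here is a sketch assuming the `--port` flag described in that README (remember to mirror the change in the `baseURL` below):

```bash
# run the llama.cpp server on port 8081 instead of the default 8080
llama-server --hf-repo microsoft/Phi-3-mini-4k-instruct-gguf --hf-file Phi-3-mini-4k-instruct-q4.gguf -c 4096 --port 8081
```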

Add the following to your `.env.local`:

```ini
MODELS=`[
{
"name": "Local Zephyr",
"chatPromptTemplate": "<|system|>\n{{preprompt}}</s>\n{{#each messages}}{{#ifUser}}<|user|>\n{{content}}</s>\n<|assistant|>\n{{/ifUser}}{{#ifAssistant}}{{content}}</s>\n{{/ifAssistant}}{{/each}}",
"name": "Local microsoft/Phi-3-mini-4k-instruct-gguf",
"tokenizer": "microsoft/Phi-3-mini-4k-instruct-gguf",
"preprompt": "",
"chatPromptTemplate": "<s>{{preprompt}}{{#each messages}}{{#ifUser}}<|user|>\n{{content}}<|end|>\n<|assistant|>\n{{/ifUser}}{{#ifAssistant}}{{content}}<|end|>\n{{/ifAssistant}}{{/each}}",
"parameters": {
"temperature": 0.1,
"top_p": 0.95,
"repetition_penalty": 1.2,
"top_k": 50,
"truncate": 1000,
"max_new_tokens": 2048,
"stop": ["</s>"]
"stop": ["<|end|>", "<|endoftext|>", "<|assistant|>"],
"temperature": 0.7,
"max_new_tokens": 1024,
"truncate": 3071
},
"endpoints": [
{
"url": "http://127.0.0.1:8080",
"type": "llamacpp"
}
]
}
"endpoints": [{
"type" : "llamacpp",
"baseURL": "http://localhost:8080"
}],
},
]`
```
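Before launching Chat UI, you can confirm the endpoint answers a raw completion request. A sketch assuming llama.cpp's native `/completion` route and its `n_predict` parameter:

```bash
# ask the local server for a short completion to confirm the model is being served
curl http://localhost:8080/completion \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello", "n_predict": 16}'
```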

<div class="flex justify-center">
<img class="block dark:hidden" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/chat-ui/llamacpp-light.png" height="auto"/>
<img class="hidden dark:block" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/chat-ui/llamacpp-dark.png" height="auto"/>
</div>
68 changes: 68 additions & 0 deletions docs/source/index.md
@@ -9,3 +9,71 @@ Open source chat interface with support for tools, web search, multimodal and ma
🐙 **Multimodal**: Accepts image file uploads on supported providers

👤 **OpenID**: Optionally setup OpenID for user authentication

## Quickstart Locally

You can quickly get chat-ui and an LLM text-generation server running locally thanks to chat-ui's [llama.cpp server support](https://huggingface.co/docs/chat-ui/configuration/models/providers/llamacpp).

**Step 1 (Start llama.cpp server):**

```bash
# install llama.cpp
brew install llama.cpp
# start llama.cpp server (using hf.co/microsoft/Phi-3-mini-4k-instruct-gguf as an example)
llama-server --hf-repo microsoft/Phi-3-mini-4k-instruct-gguf --hf-file Phi-3-mini-4k-instruct-q4.gguf -c 4096
```
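The same command works for any GGUF checkpoint on the Hugging Face Hub; swap in your own repo and file name (the placeholders below are hypothetical):

```bash
# hypothetical example: replace the placeholders with a real GGUF repo and file
llama-server --hf-repo <user>/<repo> --hf-file <model-file>.gguf -c 4096
```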

A local llama.cpp HTTP server will start on `http://localhost:8080`.

Read more [here](https://huggingface.co/docs/chat-ui/configuration/models/providers/llamacpp).

**Step 2 (tell chat-ui to use the local llama.cpp server):**

Add the following to your `.env.local`:

```ini
MODELS=`[
{
"name": "Local microsoft/Phi-3-mini-4k-instruct-gguf",
"tokenizer": "microsoft/Phi-3-mini-4k-instruct-gguf",
"preprompt": "",
"chatPromptTemplate": "<s>{{preprompt}}{{#each messages}}{{#ifUser}}<|user|>\n{{content}}<|end|>\n<|assistant|>\n{{/ifUser}}{{#ifAssistant}}{{content}}<|end|>\n{{/ifAssistant}}{{/each}}",
"parameters": {
"stop": ["<|end|>", "<|endoftext|>", "<|assistant|>"],
"temperature": 0.7,
"max_new_tokens": 1024,
"truncate": 3071
},
"endpoints": [{
"type" : "llamacpp",
"baseURL": "http://localhost:8080"
}],
},
]`
```
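Independently of chat-ui, you can also query the server directly. A sketch assuming the llama.cpp server's OpenAI-compatible `/v1/chat/completions` route is enabled:

```bash
# smoke test against the server's OpenAI-compatible chat route (assumed enabled by default)
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Say hi"}], "max_tokens": 16}'
```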

Read more [here](https://huggingface.co/docs/chat-ui/configuration/models/providers/llamacpp).

**Step 3 (make sure you have MongoDB running locally):**

```bash
docker run -d -p 27017:27017 --name mongo-chatui mongo:latest
```
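You can confirm the container is up with a standard Docker check:

```bash
# list the running MongoDB container started above
docker ps --filter name=mongo-chatui
```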

Read more [here](https://github.com/huggingface/chat-ui?tab=readme-ov-file#database).

**Step 4 (start chat-ui):**

```bash
git clone https://github.com/huggingface/chat-ui
cd chat-ui
npm install
npm run dev -- --open
```

Read more [here](https://github.com/huggingface/chat-ui?tab=readme-ov-file#launch).

<div class="flex justify-center">
<img class="block dark:hidden" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/chat-ui/llamacpp-light.png" height="auto"/>
<img class="hidden dark:block" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/chat-ui/llamacpp-dark.png" height="auto"/>
</div>