fix TGI docs with new example (#678)
philipkiely-baseten authored Sep 22, 2023
1 parent 59fda14 commit 40be6bd
Showing 1 changed file with 12 additions and 12 deletions.
24 changes: 12 additions & 12 deletions docs/examples/performance/tgi-server.mdx
@@ -20,43 +20,43 @@ This example will cover:
Get started by creating a new Truss:

```sh
-truss init --backend TGI opt125
+truss init --backend TGI falcon-7b
```

You're going to see a couple of prompts. Follow along with the instructions below:
-1. Type `facebook/opt-125M` when prompted for `model`.
+1. Type `tiiuae/falcon-7b` when prompted for `model`.
2. Press the `tab` key when prompted for `endpoint`. Select the `generate_stream` endpoint.
-3. Give your model a name like `OPT-125M`.
+3. Give your model a name like `Falcon 7B`.

Finally, navigate to the directory:

```sh
-cd opt125
+cd falcon-7b
```

### Step 2: Setting resources and other arguments

You'll notice that there's a `config.yaml` in the new directory. This is where we'll set the resources and other arguments for the model. Open the file in your favorite editor.

-OPT-125M will need a GPU so let's set the correct resources. Update the `resources` key with the following:
+Falcon 7B will need a GPU so let's set the correct resources. Update the `resources` key with the following:

```yaml config.yaml
resources:
-  accelerator: T4
+  accelerator: A10G
  cpu: "4"
  memory: 16Gi
  use_gpu: true
```
-Also notice the `build` key which contains the `model_server` we're using as well as other arguments. These arguments are passed to the underlying vLLM server which you can find [here](https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/openai/api_server.py).
+Also notice the `build` key which contains the `model_server` we're using as well as other arguments. These arguments are passed to the underlying TGI server.
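
For reference, here is the shape of the `build` key in the generated `config.yaml` after this change (it matches the full config shown at the end of this page):

```yaml config.yaml
build:
  arguments:
    endpoint: generate_stream
    model: tiiuae/falcon-7b
  model_server: TGI
```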

### Step 3: Deploy the model

<Note>
You'll need a [Baseten API key](https://app.baseten.co/settings/account/api_keys) for this step.
</Note>

-Let's deploy our OPT-125M vLLM model.
+Let's deploy our Falcon 7B TGI model.

```sh
truss push
@@ -65,7 +65,7 @@ truss push
You can invoke the model with:

```sh
truss predict -d '{"inputs": "What is a large language model?", "parameters": {"max_new_tokens": 128, "sample": true}} --published'
truss predict -d '{"inputs": "What is a large language model?", "parameters": {"max_new_tokens": 128, "sample": true}}' --published
```
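
You can also call the deployed model over HTTP. A minimal sketch with `curl`, assuming your model's ID is `MODEL_ID` and your API key is in `$BASETEN_API_KEY` (the exact endpoint URL format is an assumption and may differ for your deployment):

```sh
# Sketch only: MODEL_ID and the endpoint URL format are assumptions;
# substitute the values shown for your deployment in Baseten.
curl -X POST "https://model-MODEL_ID.api.baseten.co/production/predict" \
  -H "Authorization: Api-Key $BASETEN_API_KEY" \
  -d '{"inputs": "What is a large language model?", "parameters": {"max_new_tokens": 128, "sample": true}}'
```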

<RequestExample>
@@ -74,16 +74,16 @@ truss predict -d '{"inputs": "What is a large language model?", "parameters": {"
build:
  arguments:
    endpoint: generate_stream
-    model: facebook/opt-125M
+    model: tiiuae/falcon-7b
  model_server: TGI
environment_variables: {}
external_package_dirs: []
model_metadata: {}
-model_name: OPT-125M
+model_name: Falcon 7B
python_version: py39
requirements: []
resources:
-  accelerator: T4
+  accelerator: A10G
  cpu: "4"
  memory: 16Gi
  use_gpu: true
