fix TGI docs with new example (#678)
philipkiely-baseten authored Sep 22, 2023
1 parent 59fda14 commit 40be6bd
Showing 1 changed file with 12 additions and 12 deletions.
24 changes: 12 additions & 12 deletions docs/examples/performance/tgi-server.mdx
@@ -20,43 +20,43 @@ This example will cover:
Get started by creating a new Truss:

```sh
-truss init --backend TGI opt125
+truss init --backend TGI falcon-7b
```

You're going to see a couple of prompts. Follow along with the instructions below:
-1. Type `facebook/opt-125M` when prompted for `model`.
+1. Type `tiiuae/falcon-7b` when prompted for `model`.
2. Press the `tab` key when prompted for `endpoint`. Select the `generate_stream` endpoint.
-3. Give your model a name like `OPT-125M`.
+3. Give your model a name like `Falcon 7B`.

Finally, navigate to the directory:

```sh
-cd opt125
+cd falcon-7b
```

### Step 2: Setting resources and other arguments

You'll notice that there's a `config.yaml` in the new directory. This is where we'll set the resources and other arguments for the model. Open the file in your favorite editor.

-OPT-125M will need a GPU so let's set the correct resources. Update the `resources` key with the following:
+Falcon 7B will need a GPU so let's set the correct resources. Update the `resources` key with the following:

```yaml config.yaml
resources:
-  accelerator: T4
+  accelerator: A10G
  cpu: "4"
  memory: 16Gi
  use_gpu: true
```
-Also notice the `build` key which contains the `model_server` we're using as well as other arguments. These arguments are passed to the underlying vLLM server which you can find [here](https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/openai/api_server.py).
+Also notice the `build` key which contains the `model_server` we're using as well as other arguments. These arguments are passed to the underlying TGI server.
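
For reference, here is the shape of the `build` key in the generated `config.yaml` after this change (it matches the full config shown at the end of this page):

```yaml config.yaml
build:
  arguments:
    endpoint: generate_stream
    model: tiiuae/falcon-7b
  model_server: TGI
```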

### Step 3: Deploy the model

<Note>
You'll need a [Baseten API key](https://app.baseten.co/settings/account/api_keys) for this step.
</Note>

-Let's deploy our OPT-125M vLLM model.
+Let's deploy our Falcon 7B TGI model.

```sh
truss push
@@ -65,7 +65,7 @@ truss push
You can invoke the model with:

```sh
truss predict -d '{"inputs": "What is a large language model?", "parameters": {"max_new_tokens": 128, "sample": true}} --published'
truss predict -d '{"inputs": "What is a large language model?", "parameters": {"max_new_tokens": 128, "sample": true}}' --published
```
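
You can also call the deployed model over HTTP. A minimal sketch with `curl`, assuming your model's ID is `MODEL_ID` and your API key is in `$BASETEN_API_KEY` (the exact endpoint URL format is an assumption and may differ for your deployment):

```sh
# Sketch only: MODEL_ID and the endpoint URL format are assumptions;
# substitute the values shown for your deployment in Baseten.
curl -X POST "https://model-MODEL_ID.api.baseten.co/production/predict" \
  -H "Authorization: Api-Key $BASETEN_API_KEY" \
  -d '{"inputs": "What is a large language model?", "parameters": {"max_new_tokens": 128, "sample": true}}'
```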

<RequestExample>
@@ -74,16 +74,16 @@ truss predict -d '{"inputs": "What is a large language model?", "parameters": {"
build:
  arguments:
    endpoint: generate_stream
-    model: facebook/opt-125M
+    model: tiiuae/falcon-7b
  model_server: TGI
environment_variables: {}
external_package_dirs: []
model_metadata: {}
-model_name: OPT-125M
+model_name: Falcon 7B
python_version: py39
requirements: []
resources:
-  accelerator: T4
+  accelerator: A10G
  cpu: "4"
  memory: 16Gi
  use_gpu: true
