
2.3.17 Satellite: lm evaluation harness


Handle: lmeval
URL: -


This project provides a unified framework to test generative language models on a large number of different evaluation tasks.

Starting

# [Optional] pre-build the image
harbor build lmeval

Note

You can use either the lmeval or the lm_eval alias to invoke the service. These docs use lmeval.

# [Optional] see the help
harbor lmeval --help

# [Optional] see available tasks
harbor lmeval --tasks list
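
The task list is long; since it prints to stdout, you can narrow it down with ordinary shell tools:

# Find all MMLU-related tasks
harbor lmeval --tasks list | grep -i mmlu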

After the configuration is complete (see below), run evals with:

harbor lmeval --tasks gsm8k --limit 10
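
Flags are passed through to lm_eval itself, so the usual lm_eval CLI applies: --tasks accepts a comma-separated list, and --limit caps the number of samples per task (handy for smoke tests). For example:

# Run several tasks at once; omit --limit for a full run
harbor lmeval --tasks gsm8k,hellaswag --limit 10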

Configuration

Harbor's lmeval service is mainly intended to run against OpenAI-compatible APIs rather than to serve models on its own, although the latter is also possible.

In other words, the workflows are optimised for running lm_eval with --model local-completions, which is also the default.

# local-completions is the default
# See other supported "types":
# https://github.com/EleutherAI/lm-evaluation-harness/blob/main/README.md#model-apis-and-inference-servers
harbor lmeval type
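
Setting a different type presumably follows the same get/set pattern as the other commands on this page (the setter syntax here is an assumption, mirroring harbor lmeval model below):

# [Assumed syntax] switch to the chat-completions backend,
# e.g. for chat-tuned models served behind a chat API
harbor lmeval type local-chat-completions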

# Get/set the model and the API url
harbor lmeval model
harbor lmeval model meta-llama/Meta-Llama-3-8B-Instruct
harbor lmeval api
harbor lmeval api $(harbor url -i vllm)

Tip

harbor url works when the given service is running; -i retrieves the URL to be used from within the service containers.
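
To illustrate the difference (hostnames and ports below are placeholders; the actual values depend on your configuration):

# From the host - for curl, your browser, etc.
harbor url vllm       # e.g. http://localhost:<port>
# From the container network - what lmeval should be given
harbor url -i vllm    # e.g. http://<service host>:<port>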

The commands above are aliases for harbor lmeval args get/set model/base_url, respectively. You can manage the rest of the --model_args in a dictionary format as follows:

# See the help on working with dict args
harbor lmeval args -h

# See current args
harbor lmeval args ls
harbor lmeval args

# Get arg value
harbor lmeval args get model

# Set arg value
harbor lmeval args set model $(harbor vllm model)

Here is a list of sample args from the official docs:

# "model" is the model ID that is sent to the API
harbor lmeval args set model llama3.1:8b

# "base_url" is the URL of the API, from _within_
# the container, use "harbor url -i" to get it
harbor lmeval args set base_url $(harbor url -i ollama)

# "num_concurrent" is the number of concurrent
# requests to make to the completion API
harbor lmeval args set num_concurrent 4

# "tokenized_requests" is a boolean flag that
harbor lmeval args set tokenized_requests False

# "tokenizer" is the name of the tokenizer to use
harbor lmeval args set tokenizer gpt2
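
These dictionary args end up serialised into lm_eval's --model_args string. With the values above, the underlying invocation would look roughly like this (illustrative only, Harbor assembles and runs it for you; <api url> stands for your configured base_url):

lm_eval --model local-completions \
  --model_args model=llama3.1:8b,base_url=<api url>,num_concurrent=4,tokenized_requests=False,tokenizer=gpt2 \
  --tasks gsm8k --limit 10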

Harbor mounts the cache and results folders from your host into the lmeval container:

# Open cache folder in the file manager
harbor lmeval cache
# Open results folder in the file manager
harbor lmeval results

By default, the results and cache are specific to each distinct harbor lmeval model you set.
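
lm_eval writes its scores as JSON files into the results folder, so they can also be inspected from the shell. A minimal sketch, assuming jq is installed, with a placeholder path (check harbor lmeval results for the real location):

# Peek at the scores in a results file (path is illustrative)
jq '.results' <results folder>/<run>/results.json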

Ollama example

Note

When running with ollama, llamacpp, and other services that don't use a HuggingFace repo specifier (user/model) for model IDs, you'll need to manually point lmeval to the correct tokenizer. Typically, the official repo of the base model works as the tokenizer specifier.

# Start ollama
harbor up
# Pick the model to test
harbor ollama ls
# Configure lmeval
harbor lmeval model llama3.1:8b
harbor lmeval api $(harbor url -i ollama)/v1/completions
harbor lmeval args set tokenizer meta-llama/Meta-Llama-3-8B-Instruct

# Run the eval
harbor lmeval --tasks gsm8k --limit 10
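
Before a long run, it can be worth sanity-checking the endpoint lmeval will call. From the host (hence no -i), with the model configured above:

# Quick smoke test of the completions endpoint
curl -s $(harbor url ollama)/v1/completions \
  -H 'Content-Type: application/json' \
  -d '{"model": "llama3.1:8b", "prompt": "Hello", "max_tokens": 8}'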

llama.cpp example

# Start with llamacpp
harbor up llamacpp
# Pick the model to test
curl -s $(harbor url llamacpp)/v1/models | jq -r '.data[].id'

# Configure lmeval
harbor lmeval model <model id>
harbor lmeval api $(harbor url -i llamacpp)/v1/completions
harbor lmeval args set tokenizer <hf repo id>

# Run the eval
harbor lmeval --tasks gsm8k --limit 10