[Feature Branch][DeepSparse Evaluation API] Update lm-eval, perplexity, additional datasets #1580

Merged 36 commits on Feb 9, 2024
Changes from 23 commits

Commits (36)
6035536
initial implementation
dbogunowicz Jan 29, 2024
53cb9ec
initial commit
dbogunowicz Jan 30, 2024
6599f41
add some more tests for hardening
dbogunowicz Jan 30, 2024
4721c1f
Update src/deepsparse/evaluation/cli.py
dbogunowicz Jan 30, 2024
1247794
Update src/deepsparse/transformers/pipelines/text_generation/pipeline.py
dbogunowicz Jan 30, 2024
9e88f89
Apply suggestions from code review
dbogunowicz Jan 30, 2024
fdb21c6
quality
dbogunowicz Jan 30, 2024
be80132
Merge branch 'main' into feature/damian/ui_improvements
dbogunowicz Jan 30, 2024
5d40b8d
Merge remote-tracking branch 'origin/main' into feature/damian/fix_lm…
dbogunowicz Jan 31, 2024
3e5b7a8
fix the UI, implement loglikelihood function
dbogunowicz Feb 1, 2024
ff0944b
Merge branch 'main' into feature/damian/fix_lm_eval
dbogunowicz Feb 1, 2024
f38f0db
remove unneccessary file
dbogunowicz Feb 1, 2024
dd45493
Merge branch 'feature/damian/fix_lm_eval' of github.com:neuralmagic/d…
dbogunowicz Feb 1, 2024
cd10b92
Merge branch 'main' into feature/damian/ui_improvements
dbogunowicz Feb 1, 2024
b2aad17
initial commit
dbogunowicz Feb 2, 2024
35454a1
tests passing, refactor time!
dbogunowicz Feb 2, 2024
d3b84f8
cleanup
dbogunowicz Feb 2, 2024
e7d8c31
Update test_evaluator.py
dbogunowicz Feb 5, 2024
a148fc5
finished
dbogunowicz Feb 5, 2024
3b5977b
rebase
dbogunowicz Feb 5, 2024
a9e9847
quality
dbogunowicz Feb 5, 2024
787ee45
rebase
dbogunowicz Feb 5, 2024
b5a6d6d
manual testing
dbogunowicz Feb 5, 2024
d0698e7
Merge remote-tracking branch 'origin/main' into feature/damian/genera…
dbogunowicz Feb 5, 2024
e10f0c9
UI improvements
dbogunowicz Feb 5, 2024
48a5900
new UI adaptations
dbogunowicz Feb 6, 2024
44e3e6e
make test more lightweight
dbogunowicz Feb 6, 2024
abb6ab8
fix tests 2
dbogunowicz Feb 6, 2024
79fd7e0
Merge branch 'main' into feature/damian/generate_until
dbogunowicz Feb 7, 2024
e5aad65
good point Michael
dbogunowicz Feb 7, 2024
06302dc
Merge branch 'main' into feature/damian/generate_until
dbogunowicz Feb 8, 2024
d65cac6
Return to the name `lm-evaluation-harness` but add alias `lm-eval-har…
dbogunowicz Feb 8, 2024
e0b4f36
Merge branch 'main' into feature/damian/generate_until
dbogunowicz Feb 9, 2024
b82b49b
[DeepSparse Evaluation API] Perplexity (#1555)
dbogunowicz Feb 9, 2024
d4cdd98
Merge branch 'main' into feature/damian/generate_until
dbogunowicz Feb 9, 2024
7a3ad2f
move the registration of the perplexity eval function where it belongs
dbogunowicz Feb 9, 2024
38 changes: 14 additions & 24 deletions src/deepsparse/evaluation/cli.py
@@ -20,7 +20,8 @@
 Module for evaluating models on the various evaluation integrations
 
 OPTIONS:
-    --target TARGET     A path to a remote or local directory containing ONNX/torch model
+    MODEL_PATH
+        A path to an ONNX model, local directory containing ONNX model
         (including all the auxiliary files) or a SparseZoo stub
     -d DATASET, --dataset DATASET
         The dataset to evaluate on. The user may pass multiple datasets
@@ -30,9 +31,7 @@
         integration name that is registered in the evaluation registry
     -e ENGINE_TYPE, --engine_type ENGINE_TYPE
         Inference engine to use for the evaluation. The default
-        is the DeepSparse engine. If the evaluation should be run
-        without initializing a pipeline (e.g. for the evaluation
-        of a torch model), the engine type should be set to None
+        is the DeepSparse engine.
     -s SAVE_PATH, --save_path SAVE_PATH
         The path to save the evaluation results.
         By default the results will be saved in the
@@ -73,7 +72,7 @@
 
 from deepsparse.evaluation.evaluator import evaluate
 from deepsparse.evaluation.results import Result, save_result
-from deepsparse.evaluation.utils import args_to_dict, get_save_path
+from deepsparse.evaluation.utils import get_save_path, parse_kwarg_tuples
 from deepsparse.operators.engine_operator import (
     DEEPSPARSE_ENGINE,
     ORT_ENGINE,
@@ -89,12 +88,10 @@
         ignore_unknown_options=True,
     )
 )
-@click.option(
-    "--target",
+@click.argument(
+    "model_path",
     type=click.Path(dir_okay=True, file_okay=True),
     required=True,
-    help="A path to a remote or local directory containing ONNX/torch model "
-    "(including all the auxiliary files) or a SparseZoo stub",
 )
 @click.option(
     "-d",
@@ -118,9 +115,7 @@
     type=click.Choice([DEEPSPARSE_ENGINE, ORT_ENGINE, TORCHSCRIPT_ENGINE]),
     default=DEEPSPARSE_ENGINE,
     help="The engine to use for the evaluation. The default is the "
-    "DeepSparse engine. If the evaluation should be run without "
-    "initializing a pipeline (e.g. for the evaluation of a torch "
-    "model), the engine type should be set to None",
+    "DeepSparse engine. ",
 )
 @click.option(
     "-s",
@@ -167,7 +162,7 @@
 )
 @click.argument("integration_args", nargs=-1, type=click.UNPROCESSED)
 def main(
-    target,
+    model_path,
     dataset,
     integration,
     engine_type,
@@ -181,16 +176,11 @@ def main(
     # join datasets to a list if multiple datasets are passed
     datasets = list(dataset) if not isinstance(dataset, str) else dataset
     # format kwargs to a dict
-    integration_args = args_to_dict(integration_args)
+    integration_args = parse_kwarg_tuples(integration_args)
 
-    _LOGGER.info(f"Target to evaluate: {target}")
-    if engine_type:
-        _LOGGER.info(f"A pipeline with the engine type: {engine_type} will be created")
-    else:
-        _LOGGER.info(
-            "No engine type specified. The target "
-            "will be evaluated using the native framework"
-        )
+    _LOGGER.info(
+        f"Creating {engine_type} pipeline to evaluate from model path: {model_path}"
+    )
 
     _LOGGER.info(
         f"Datasets to evaluate on: {datasets}\n"
@@ -201,7 +191,7 @@ def main(
     )
 
     result: Result = evaluate(
-        target=target,
+        model=model_path,
         datasets=datasets,
         integration=integration,
         engine_type=engine_type,
@@ -211,7 +201,7 @@ def main(
         **integration_args,
     )
 
-    _LOGGER.info(f"Evaluation done. Results:\n{result}")
+    _LOGGER.info(f"Evaluation done. Results:\n{result.formatted}")
 
     save_path = get_save_path(
         save_path=save_path,
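Taken together, the cli.py changes replace the old --target option with a positional MODEL_PATH argument and forward any trailing key/value pairs to the chosen integration via parse_kwarg_tuples. The sketch below is a hypothetical smoke test of that surface using click's test runner; the SparseZoo stub, dataset, integration name, and the --limit kwarg are illustrative placeholders rather than values taken from this PR, and the option spellings are assumed from the docstring above.

from click.testing import CliRunner

from deepsparse.evaluation.cli import main

runner = CliRunner()
result = runner.invoke(
    main,
    [
        "zoo:some/quantized-llm-stub",             # placeholder MODEL_PATH (ONNX path, directory, or SparseZoo stub)
        "--dataset", "hellaswag",                  # may be repeated to evaluate on several datasets
        "--integration", "lm-evaluation-harness",  # integration registered in the evaluation registry
        "--engine_type", "deepsparse",             # default engine; ORT and TorchScript are the other choices
        "--limit", "2",                            # unknown args are passed through to the integration
    ],
)
print(result.output)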
35 changes: 22 additions & 13 deletions src/deepsparse/evaluation/evaluator.py
@@ -12,11 +12,13 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 import logging
-from typing import Any, List, Optional, Union
+from pathlib import Path
+from typing import List, Optional, Union
 
+from deepsparse import Pipeline
 from deepsparse.evaluation.registry import EvaluationRegistry
 from deepsparse.evaluation.results import Result
-from deepsparse.evaluation.utils import create_model_from_target
+from deepsparse.evaluation.utils import create_pipeline
 from deepsparse.operators.engine_operator import (
     DEEPSPARSE_ENGINE,
     ORT_ENGINE,
@@ -30,32 +32,39 @@
 
 
 def evaluate(
-    target: Any,
+    model: Union[Pipeline, Path, str],
     datasets: Union[str, List[str]],
     integration: Optional[str] = None,
     engine_type: Union[
-        DEEPSPARSE_ENGINE, ORT_ENGINE, TORCHSCRIPT_ENGINE, None
+        DEEPSPARSE_ENGINE, ORT_ENGINE, TORCHSCRIPT_ENGINE
     ] = DEEPSPARSE_ENGINE,
     batch_size: int = 1,
     splits: Union[List[str], str, None] = None,
    metrics: Union[List[str], str, None] = None,
     **kwargs,
 ) -> Result:
 
-    # if target is a string, turn it into an appropriate model/pipeline
-    # otherwise assume it is a model/pipeline
-    model = (
-        create_model_from_target(target, engine_type)
-        if isinstance(target, str)
-        else target
+    if isinstance(model, Pipeline):
+        _LOGGER.info(
+            "Passed a Pipeline object into evaluate function. This will "
+            "override the following arguments:"
+        )
+        batch_size = model.batch_size
+        _LOGGER.info(f"batch_size: {batch_size}")
+        engine_type = engine_type
+        _LOGGER.info(f"engine_type: {engine_type}")
+
+    # if target is a string, turn it into an appropriate pipeline
+    # otherwise assume it is a pipeline
+    pipeline = (
+        create_pipeline(model, engine_type) if isinstance(model, (Path, str)) else model
     )
 
-    eval_integration = EvaluationRegistry.resolve(model, datasets, integration)
+    eval_integration = EvaluationRegistry.resolve(pipeline, datasets, integration)
 
     return eval_integration(
-        model=model,
+        pipeline=pipeline,
         datasets=datasets,
         engine_type=engine_type,
         batch_size=batch_size,
         splits=splits,
         metrics=metrics,
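For reference, here is a minimal sketch of calling the updated evaluate() entrypoint directly, assuming this feature branch of deepsparse (plus lm-eval==0.4.0 for the lm-evaluation-harness integration) is installed. The model stub, dataset names, and the limit kwarg are placeholders, not values taken from this PR.

from deepsparse.evaluation.evaluator import evaluate
from deepsparse.operators.engine_operator import DEEPSPARSE_ENGINE

result = evaluate(
    model="zoo:some/quantized-llm-stub",  # a str/Path is turned into a Pipeline via create_pipeline()
    datasets=["hellaswag", "gsm8k"],      # a single dataset or a list of datasets
    integration="lm-evaluation-harness",  # resolved through the EvaluationRegistry
    engine_type=DEEPSPARSE_ENGINE,
    batch_size=1,
    limit=2,                              # extra kwargs are forwarded to the integration
)
print(result.formatted)                   # same formatted output the CLI now logs

An already-constructed Pipeline can also be passed as model; in that case the pipeline's own batch_size overrides the batch_size argument, per the new isinstance(model, Pipeline) branch above.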
3 changes: 1 addition & 2 deletions src/deepsparse/evaluation/integrations/__init__.py
@@ -24,8 +24,7 @@ def try_import_lm_evaluation_harness(raise_error=False):
         if raise_error:
             raise ImportError(
                 "Unable to import lm_eval. "
-                "To install run 'pip install "
-                "git+https://github.com/EleutherAI/lm-evaluation-harness@b018a7d51'"
+                "To install run 'pip install lm-eval==0.4.0'"

Inline review comment (Member):
    when or how will this error during normal use if raise_error=False by default? once the eval actually begins?

dbogunowicz (Contributor, Author) replied on Feb 7, 2024:
    I see, good point. Yes, I will change the default behavior of this function and set raise_error to True.

    This is the intended behavior when the actual eval is being run: at runtime, when the user intends to use lm-eval, the module will attempt a hot import of lm-eval, and if it fails to find the dependency installed, it will raise the error.

    However, when testing, I do not want to raise errors, but rather use the boolean output of this function to skip the tests that require lm-eval to be installed.

             )
         return False
 
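A sketch of the usage pattern agreed on in the review thread above: at runtime the guarded import is expected to raise when lm-eval is missing, while tests call try_import_lm_evaluation_harness(raise_error=False) and use its boolean return to skip lm-eval-dependent cases. The test name and body below are illustrative only.

import pytest

from deepsparse.evaluation.integrations import try_import_lm_evaluation_harness


@pytest.mark.skipif(
    not try_import_lm_evaluation_harness(raise_error=False),
    reason="lm-eval==0.4.0 is required for this test",
)
def test_lm_eval_integration_smoke():
    # body intentionally omitted; would exercise the lm-evaluation-harness integration
    ...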