
benchmark server pipeline #1600

Merged

horheynm merged 12 commits from benchmark-pipeline-server into main on Mar 6, 2024
Conversation

@horheynm (Member) commented Feb 12, 2024

Adds a server route that runs the benchmark pipeline.

Note: continuous batching cannot be used together with the timer middleware (see the illustrative config sketch after the server config below).

Configs

Server side

deepsparse.server --config_file config.yaml

config.yaml:
num_cores: 2
num_workers: 2
endpoints:
  - task: text_generation
    model: "hf:mgoin/TinyStories-1M-ds"
    # kwargs: {"continuous_batch_sizes": [2]}
    middlewares:
      - TimerMiddleware
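
For contrast with the note above, a config that enables continuous batching has to drop the timer middleware. This is only an illustrative sketch, assuming the commented-out kwargs line above is the way continuous batching is switched on:

num_cores: 2
num_workers: 2
endpoints:
  - task: text_generation
    model: "hf:mgoin/TinyStories-1M-ds"
    # continuous batching enabled, so TimerMiddleware is left out
    kwargs: {"continuous_batch_sizes": [2]}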

Client side

import requests

# Benchmark route exposed by the server for the text_generation-0 endpoint
url = "http://localhost:5543/v2/models/text_generation-0/benchmark"

# Run the benchmark on dummy data, generating 100 tokens per sequence
obj = {
    "data_type": "dummy",
    "gen_sequence_length": 100,
    "pipeline_kwargs": {},
    "input_schema_kwargs": {}
}

response = requests.post(url, json=obj)
print(response.json())

Outputs

Server

(.venv) george@gpuserver6:~/deepsparse$ deepsparse.server --config_file config.yaml
/home/george/deepsparse/.venv/lib/python3.10/site-packages/requests/__init__.py:102: RequestsDependencyWarning: urllib3 (1.26.18) or chardet (5.2.0)/charset_normalizer (2.0.12) doesn't match a supported version!
  warnings.warn("urllib3 ({}) or chardet ({})/charset_normalizer ({}) doesn't match a supported "
2024-02-12 21:45:33 deepsparse.server.server INFO     Using config: ServerConfig(num_cores=2, num_workers=2, integration=None, engine_thread_pinning='core', pytorch_num_threads=1, endpoints=[EndpointConfig(name='text_generation-0', route=None, task='text_generation', model='hf:mgoin/TinyStories-1M-ds', batch_size=1, logging_config=PipelineSystemLoggingConfig(enable=True, inference_details=SystemLoggingGroup(enable=False, target_loggers=[]), prediction_latency=SystemLoggingGroup(enable=True, target_loggers=[])), data_logging=None, bucketing=None, middlewares=['TimerMiddleware'], kwargs={})], loggers={}, system_logging=ServerSystemLoggingConfig(enable=True, 
...
'/docs/oauth2-redirect', '/redoc', '/', '/config', '/v2/health/live', '/v2/health/ready', '/v2', '/endpoints', '/endpoints', '/v2/models/text_generation-0/infer', '/v2/models/text_generation-0/benchmark', '/v2/models/text_generation-0', '/v2/models/text_generation-0/ready']
INFO:     Started server process [3930990]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:5543 (Press CTRL+C to quit)
2024-02-12 21:45:36 deepsparse.benchmark.helpers INFO     Thread pinning to cores enabled
INFO:     127.0.0.1:48914 - "POST /v2/models/text_generation-0/benchmark HTTP/1.1" 200 OK

Client

(.venv) george@gpuserver6:~/deepsparse$ python3 -m scratch.server
...
'PrepareGeneration': [0.0017719268798828125], 'GenerateNewTokenOperator': [7.486343383789062e-05, 7
'CompileGeneratedTokens': [1.5974044799804688e-05, 1.4781951904296875e-05, 1.358 ...
...

(.venv) george@gpuserver6:~/deepsparse$ 
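
The benchmark response appears to map each operator name to a list of per-invocation timings in seconds, as suggested by the truncated client output above. Below is a minimal sketch for summarizing those timings on the client side; the response layout and the summarize helper are assumptions based on that output, not part of the server API:

import statistics

import requests

url = "http://localhost:5543/v2/models/text_generation-0/benchmark"
obj = {
    "data_type": "dummy",
    "gen_sequence_length": 100,
    "pipeline_kwargs": {},
    "input_schema_kwargs": {}
}

# Hypothetical helper: reduce each operator's list of timings (assumed to be
# seconds, as in the output above) to a count, mean, and max in milliseconds.
def summarize(timings):
    for name, values in timings.items():
        if not isinstance(values, list) or not values:
            continue
        mean_ms = statistics.mean(values) * 1000
        max_ms = max(values) * 1000
        print(f"{name}: n={len(values)} mean={mean_ms:.3f} ms max={max_ms:.3f} ms")

response = requests.post(url, json=obj)
summarize(response.json())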

horheynm requested review from bfineran and dsikka and removed the request for bfineran, February 12, 2024 21:47
dbogunowicz previously approved these changes Feb 26, 2024

@dbogunowicz (Contributor) left a comment

Can we have some tests in?

@dsikka (Contributor) left a comment

Great job!

  • Let's test with an OpenAI example to make sure the integration works
  • Add testing, specifically around benchmark_pipeline and server integration
  • Refactor the benchmark_pipeline function to avoid repeated code

horheynm force-pushed the benchmark-pipeline-server branch 2 times, most recently from a1d0237 to 628d4f1, on March 4, 2024 15:31
@horheynm (Member, Author) commented Mar 4, 2024

Great job!

  • Let's test with an OpenAI example to make sure the integration works
  • Add testing, specifically around benchmark_pipeline and server integration
  • Refactor the benchmark_pipeline function to avoid repeated code

I addressed the tests and the refactor, but not the OpenAI testing; I talked to Ben, and we don't need it for now.

horheynm merged commit acf190c into main on Mar 6, 2024
13 checks passed
horheynm deleted the benchmark-pipeline-server branch on March 6, 2024 18:41