
benchmark server pipeline #1600

Merged

horheynm merged 12 commits from benchmark-pipeline-server into main on Mar 6, 2024
Conversation

@horheynm (Member) commented Feb 12, 2024

Adds a server route that runs the benchmark pipeline.

Note: continuous batching cannot be used together with the timer middleware (see the illustrative config sketch after the server config below).

Configs

Server side

deepsparse.server --config_file config.yaml

config.yaml:
num_cores: 2
num_workers: 2
endpoints:
  - task: text_generation
    model: "hf:mgoin/TinyStories-1M-ds"
    # kwargs: {"continuous_batch_sizes": [2]}
    middlewares:
      - TimerMiddleware
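
For contrast with the note above, a config that enables continuous batching has to drop the timer middleware. This is only an illustrative sketch, assuming the commented-out kwargs line above is the way continuous batching is switched on:

num_cores: 2
num_workers: 2
endpoints:
  - task: text_generation
    model: "hf:mgoin/TinyStories-1M-ds"
    # continuous batching enabled, so TimerMiddleware is left out
    kwargs: {"continuous_batch_sizes": [2]}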

Client side

import requests

# Benchmark route exposed by the server for the text_generation-0 endpoint
url = "http://localhost:5543/v2/models/text_generation-0/benchmark"

# Run the benchmark on dummy data, generating 100 tokens per sequence
obj = {
    "data_type": "dummy",
    "gen_sequence_length": 100,
    "pipeline_kwargs": {},
    "input_schema_kwargs": {}
}

response = requests.post(url, json=obj)
print(response.json())

Outputs

Server

(.venv) george@gpuserver6:~/deepsparse$ deepsparse.server --config_file config.yaml
/home/george/deepsparse/.venv/lib/python3.10/site-packages/requests/__init__.py:102: RequestsDependencyWarning: urllib3 (1.26.18) or chardet (5.2.0)/charset_normalizer (2.0.12) doesn't match a supported version!
  warnings.warn("urllib3 ({}) or chardet ({})/charset_normalizer ({}) doesn't match a supported "
2024-02-12 21:45:33 deepsparse.server.server INFO     Using config: ServerConfig(num_cores=2, num_workers=2, integration=None, engine_thread_pinning='core', pytorch_num_threads=1, endpoints=[EndpointConfig(name='text_generation-0', route=None, task='text_generation', model='hf:mgoin/TinyStories-1M-ds', batch_size=1, logging_config=PipelineSystemLoggingConfig(enable=True, inference_details=SystemLoggingGroup(enable=False, target_loggers=[]), prediction_latency=SystemLoggingGroup(enable=True, target_loggers=[])), data_logging=None, bucketing=None, middlewares=['TimerMiddleware'], kwargs={})], loggers={}, system_logging=ServerSystemLoggingConfig(enable=True, 
...
'/docs/oauth2-redirect', '/redoc', '/', '/config', '/v2/health/live', '/v2/health/ready', '/v2', '/endpoints', '/endpoints', '/v2/models/text_generation-0/infer', '/v2/models/text_generation-0/benchmark', '/v2/models/text_generation-0', '/v2/models/text_generation-0/ready']
INFO:     Started server process [3930990]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:5543 (Press CTRL+C to quit)
2024-02-12 21:45:36 deepsparse.benchmark.helpers INFO     Thread pinning to cores enabled
INFO:     127.0.0.1:48914 - "POST /v2/models/text_generation-0/benchmark HTTP/1.1" 200 OK

Client

(.venv) george@gpuserver6:~/deepsparse$ python3 -m scratch.server
...
'PrepareGeneration': [0.0017719268798828125], 'GenerateNewTokenOperator': [7.486343383789062e-05, 7
'CompileGeneratedTokens': [1.5974044799804688e-05, 1.4781951904296875e-05, 1.358 ...
...

(.venv) george@gpuserver6:~/deepsparse$ 
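
The benchmark response appears to map each operator name to a list of per-invocation timings in seconds, as suggested by the truncated client output above. Below is a minimal sketch for summarizing those timings on the client side; the response layout and the summarize helper are assumptions based on that output, not part of the server API:

import statistics

import requests

url = "http://localhost:5543/v2/models/text_generation-0/benchmark"
obj = {
    "data_type": "dummy",
    "gen_sequence_length": 100,
    "pipeline_kwargs": {},
    "input_schema_kwargs": {}
}

# Hypothetical helper: reduce each operator's list of timings (assumed to be
# seconds, as in the output above) to a count, mean, and max in milliseconds.
def summarize(timings):
    for name, values in timings.items():
        if not isinstance(values, list) or not values:
            continue
        mean_ms = statistics.mean(values) * 1000
        max_ms = max(values) * 1000
        print(f"{name}: n={len(values)} mean={mean_ms:.3f} ms max={max_ms:.3f} ms")

response = requests.post(url, json=obj)
summarize(response.json())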

horheynm requested review from bfineran and dsikka and removed the request for bfineran, February 12, 2024 21:47
dbogunowicz previously approved these changes Feb 26, 2024

@dbogunowicz (Contributor) left a comment

Can we have some tests in?

@dsikka (Contributor) left a comment

Great job!

  • Let's test with an OpenAI example to make sure the integration works
  • Add testing, specifically around benchmark_pipeline and server integration
  • Refactor the benchmark_pipeline function to avoid repeated code

horheynm force-pushed the benchmark-pipeline-server branch 2 times, most recently from a1d0237 to 628d4f1, on March 4, 2024 15:31
@horheynm (Member, Author) commented Mar 4, 2024

Great job!

  • Let's test with an OpenAI example to make sure the integration works
  • Add testing, specifically around benchmark_pipeline and server integration
  • Refactor the benchmark_pipeline function to avoid repeated code

I addressed the tests and the refactor, but not the OpenAI testing; I talked to Ben, and we don't need it for now.

horheynm merged commit acf190c into main on Mar 6, 2024
13 checks passed
horheynm deleted the benchmark-pipeline-server branch on March 6, 2024 18:41