DISB is a DNN Inference Serving Benchmark with diverse workloads and models. It was originally designed to simulate real-time scenarios, e.g. autonomous driving systems, where both low latency and high throughput are demanded.
DISB uses a client-server architecture, where clients send DNN inference requests to the server via RPC, and the server returns the inference results. Clients can submit inference requests periodically or randomly. An inference request may contain the model name (or id), the input data and other customized attributes (e.g., priority or deadline).
Note: Please use git lfs to clone this repo in order to download model files.
- DISB Toolkit
- DISB Workloads
- Build & Install
- Usage
- Samples
- Benchmark Results
- Paper
- The Team
- Contact Us
- License
DISB provides a C++ library (libdisb) to perform benchmarking. To integrate your own DNN inference system with DISB, you only need to implement DISB::Client to wrap your inference interface. See usage for details.
Currently, DISB provides 6 workloads with different DNN models and different numbers of clients.
There are five patterns for submitting inference requests in DISB clients:
- Uniform Distribution (U): The client sends inference requests periodically, with a fixed frequency (e.g., 20 reqs/s). This pattern is common in data-driven applications (e.g., obstacle detection with cameras).
- Poisson Distribution (P): The client sends inference requests following a Poisson arrival pattern with a given average arrival rate (e.g., 25 reqs/s). This pattern can simulate event-driven applications (e.g., speech recognition).
- Closed-loop (C): The client continuously sends inference requests, which simulates a contention load.
- Trace (T): The client sends inference requests according to a given trace file which contains a series of request time points. This pattern can reproduce real world workloads.
- Dependent (D): The client sends an inference request once all of its prior tasks have completed; prior tasks can belong to other clients. This pattern can simulate an inference graph (or inference DAG), where a model needs the output of another model as its input.
We combined these patterns into 6 typical workloads for benchmarking; see workloads for details.
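To make the difference between the Uniform (U) and Poisson (P) patterns concrete, here is a minimal sketch of how launch times could be generated for each. It is not taken from libdisb: the function names uniformLaunchOffsets and poissonLaunchOffsets, and the seed parameter, are hypothetical and only illustrate the idea.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <random>
#include <vector>

using Micros = std::chrono::microseconds;

// Uniform (U): each request is launched exactly one period after the previous one.
// Hypothetical helper, not part of libdisb.
std::vector<Micros> uniformLaunchOffsets(double reqsPerSec, std::size_t count)
{
    assert(reqsPerSec > 0.0);
    const double periodUs = 1e6 / reqsPerSec;
    std::vector<Micros> offsets;
    offsets.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        offsets.emplace_back(static_cast<long long>(periodUs * static_cast<double>(i)));
    }
    return offsets;
}

// Poisson (P): gaps between consecutive requests are drawn from an
// exponential distribution whose mean is 1 / reqsPerSec seconds.
// Hypothetical helper, not part of libdisb.
std::vector<Micros> poissonLaunchOffsets(double reqsPerSec, std::size_t count, unsigned seed)
{
    assert(reqsPerSec > 0.0);
    std::mt19937 rng(seed);
    std::exponential_distribution<double> gapSec(reqsPerSec);
    std::vector<Micros> offsets;
    offsets.reserve(count);
    double nowSec = 0.0;
    for (std::size_t i = 0; i < count; ++i) {
        offsets.emplace_back(static_cast<long long>(nowSec * 1e6));
        nowSec += gapSec(rng);
    }
    return offsets;
}
```

With 20 reqs/s, the uniform schedule places requests 50 ms apart, while the Poisson schedule only matches the requested rate on average, so it produces occasional bursts and gaps.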
[TBD] We're still working on providing more representative and general DNN inference serving workloads.
Install dependencies:

    sudo apt install build-essential cmake
    sudo apt install libjsoncpp-dev

Build and install DISB tools:

    # will build and install into disb/install
    make build
-
DISB::Client is an adaptor class between DISB and the serving backend. You can implement the following interfaces in its subclass. These interfaces will be called during the benchmark, and their execution time will be recorded by DISB.

    // init() will be called once when the benchmark begins.
    virtual void init();

    // The following interfaces will be called by DISB
    // within each inference request during the benchmark.
    // The average latency of each interface will be recorded.
    virtual void prepareInput();
    virtual void preprocess();
    virtual void copyInput();
    virtual void infer();
    virtual void copyOutput();
    virtual void postprocess();

    // If another task depends on this client,
    // the InferResult will be passed to the next task.
    virtual std::shared_ptr<InferResult> produceResult();
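As a rough illustration of how the per-request interfaces of DISB::Client fit together, the sketch below drives one request through the six phases and times each of them. ClientStandIn, EchoClient and runOneRequest are hypothetical names: ClientStandIn only mirrors the virtual methods listed above so that the snippet compiles without disb.h, and running the phases in the listed order is an assumption of this sketch.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>
#include <vector>

// Stand-in for DISB::Client (assumption: only the methods listed above are mirrored).
class ClientStandIn
{
public:
    virtual ~ClientStandIn() = default;
    virtual void init() {}
    virtual void prepareInput() {}
    virtual void preprocess() {}
    virtual void copyInput() {}
    virtual void infer() {}
    virtual void copyOutput() {}
    virtual void postprocess() {}
};

// Hypothetical client: a real one would call into the serving backend here.
class EchoClient : public ClientStandIn
{
public:
    std::vector<std::string> calls; // names of the phases, in call order

    void init() override { calls.push_back("init"); }
    void prepareInput() override { calls.push_back("prepareInput"); }
    void preprocess() override { calls.push_back("preprocess"); }
    void copyInput() override { calls.push_back("copyInput"); }
    void infer() override { calls.push_back("infer"); }
    void copyOutput() override { calls.push_back("copyOutput"); }
    void postprocess() override { calls.push_back("postprocess"); }
};

// Hypothetical driver: runs one request and returns the wall time of each phase in microseconds.
std::map<std::string, long long> runOneRequest(ClientStandIn &client)
{
    using Clock = std::chrono::steady_clock;
    std::map<std::string, long long> elapsedUs;

    auto timed = [&](const char *name, void (ClientStandIn::*phase)()) {
        const auto begin = Clock::now();
        (client.*phase)(); // virtual dispatch reaches the subclass override
        const auto end = Clock::now();
        elapsedUs[name] =
            std::chrono::duration_cast<std::chrono::microseconds>(end - begin).count();
    };

    timed("prepareInput", &ClientStandIn::prepareInput);
    timed("preprocess", &ClientStandIn::preprocess);
    timed("copyInput", &ClientStandIn::copyInput);
    timed("infer", &ClientStandIn::infer);
    timed("copyOutput", &ClientStandIn::copyOutput);
    timed("postprocess", &ClientStandIn::postprocess);
    return elapsedUs;
}
```

Splitting a request into these phases lets data movement (copyInput, copyOutput) be reported separately from the inference itself.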
-
DISB::Load instructs when DISB should launch the next inference request. There are 5 built-in loads simulating the load patterns mentioned in DISB Workloads. They can be enabled by setting certain attributes in the json configuration; see HelloDISB for an example.

If you want to use DISB::DependentLoad, your client class should inherit DISB::DependentClient and implement the virtual methods consumePrevResults() and produceDummyPrevResults().

consumePrevResults() will be called when one of the prior tasks has finished one inference and produced one result. You can use the previous results as the input of the DependentClient. You can also inherit DISB::InferResult to pass custom data.

produceDummyPrevResults() will be called when DISB is warming up and testing the standalone latency of each client. The results will be consumed by consumePrevResults(), which makes a dependent load independent so that its standalone latency can be measured.
-
DISB::BenchmarkSuite should be created and initialized before the benchmark is launched.

    void init(const std::string &configJsonStr,
              std::shared_ptr<Client> clientFactory(const Json::Value &config),
              std::shared_ptr<Load> loadFactory(const Json::Value &config) = builtinLoadFactory);
    void run(void loadCoordinator(const std::vector<LoadInfo> &loadInfos) = builtinLoadCoordinator);

When initializing BenchmarkSuite, a json formatted string should be passed as the config, and a factory method for your own subclass of DISB::Client should be provided. The Json::Value passed to the factory method is the "client" attribute of each task in configJsonStr.

If you need customized loads other than the built-in loads, you should implement the virtual method waitUntilNextLaunch() and provide your own load factory method. The Json::Value passed to the factory method is the "load" attribute of each task in configJsonStr.

If your loads need to coordinate with each other, you can pass loadCoordinator() to DISB::BenchmarkSuite::run(), which makes sure that loads will not conflict with each other. For example, builtinLoadCoordinator() prevents periodic loads with the same frequency and the highest priority from launching at the same time by giving them different launch delays.
-
DISB::Analyzer is used to measure the performance of each inference task. Each inference task can have multiple analyzers. DISB::BasicAnalyzer, which can measure latency and throughput, is implemented by DISB and is enabled for every task by default.

If you want customized analyzers other than DISB::BasicAnalyzer, for example, an analyzer that measures gpu usage and memory consumption, the following interfaces should be implemented.

    virtual void init();
    virtual void start(const std::chrono::system_clock::time_point &beginTime);
    virtual void stop(const std::chrono::system_clock::time_point &endTime);

    virtual std::shared_ptr<DISB::Record> produceRecord() = 0;
    virtual void consumeRecord(std::shared_ptr<DISB::Record> record);
    virtual Json::Value generateReport() = 0;

    // The following event callbacks will be invoked before
    // the corresponding method of DISB::Client is invoked.
    virtual void onPrepareInput(std::shared_ptr<DISB::Record> record);
    virtual void onPreprocess(std::shared_ptr<DISB::Record> record);
    virtual void onCopyInput(std::shared_ptr<DISB::Record> record);
    virtual void onInfer(std::shared_ptr<DISB::Record> record);
    virtual void onCopyOutput(std::shared_ptr<DISB::Record> record);
    virtual void onPostprocess(std::shared_ptr<DISB::Record> record);

produceRecord() will be called before each inference request, and must be implemented. You can return a subclass of DISB::Record, which can be customized to store other information. The attribute timePoints contains the begin and end time of each inference phase, including prepareInput, preprocess, etc. timePoints will be set by DISB while the benchmark is running. Other information, for example, gpu usage and memory consumption, can be stored in your subclass of DISB::Record.

Lifecycle of each DISB::Record:
- Created by DISB::Analyzer::produceRecord() before each inference request.
- Passed to each event callback of DISB::Analyzer, where you can store any information you need in the record.
- Consumed by DISB::Analyzer::consumeRecord() after the inference request is over.

After you have implemented DISB::Analyzer, you can add it to a DISB::Client by calling DISB::Client::addAnalyzer() in the factory method of the client. You may refer to the TensorRT sample or the Tensorflow Serving sample for more details. They both implement an AccuracyAnalyzer to measure inference accuracy.
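The record lifecycle described for DISB::Analyzer can be sketched with stand-in types. RecordStandIn and AnalyzerStandIn below are assumptions that mirror only produceRecord(), onInfer() and consumeRecord() so the snippet compiles without disb.h; MemoryRecord, PeakMemoryAnalyzer, traceOneRequest and the fake memory counter are hypothetical names used purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Stand-ins for DISB::Record and DISB::Analyzer (assumption: simplified to three methods).
struct RecordStandIn
{
    virtual ~RecordStandIn() = default;
};

class AnalyzerStandIn
{
public:
    virtual ~AnalyzerStandIn() = default;
    virtual std::shared_ptr<RecordStandIn> produceRecord() = 0;
    virtual void onInfer(std::shared_ptr<RecordStandIn>) {}
    virtual void consumeRecord(std::shared_ptr<RecordStandIn>) {}
};

// Hypothetical record subclass that carries one extra field.
struct MemoryRecord : RecordStandIn
{
    std::size_t bytesInUse = 0;
};

// Hypothetical analyzer: samples a memory counter right before infer()
// and keeps the peak value seen across all consumed records.
class PeakMemoryAnalyzer : public AnalyzerStandIn
{
public:
    explicit PeakMemoryAnalyzer(const std::size_t *counter) : counter_(counter)
    {
        assert(counter != nullptr);
    }

    std::shared_ptr<RecordStandIn> produceRecord() override
    {
        return std::make_shared<MemoryRecord>();
    }

    void onInfer(std::shared_ptr<RecordStandIn> record) override
    {
        auto mem = std::dynamic_pointer_cast<MemoryRecord>(record);
        if (mem) mem->bytesInUse = *counter_;
    }

    void consumeRecord(std::shared_ptr<RecordStandIn> record) override
    {
        auto mem = std::dynamic_pointer_cast<MemoryRecord>(record);
        if (mem && mem->bytesInUse > peakBytes_) peakBytes_ = mem->bytesInUse;
    }

    std::size_t peakBytes() const { return peakBytes_; }

private:
    const std::size_t *counter_;
    std::size_t peakBytes_ = 0;
};

// Walks one record through the three lifecycle steps of a single request.
void traceOneRequest(AnalyzerStandIn &analyzer)
{
    auto record = analyzer.produceRecord(); // 1. created before the request
    analyzer.onInfer(record);               // 2. passed to an event callback
    analyzer.consumeRecord(record);         // 3. consumed after the request
}
```

Keeping per-request data in the record and aggregating it only in consumeRecord() means the analyzer needs no bookkeeping of its own for requests that are still in flight.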
-
    #include <iostream>
    #include <memory>
    #include <string>

    #include "disb.h"

    class HelloClient: public DISB::Client
    {
        // your implementation
    };

    std::shared_ptr<DISB::Client> helloClientFactory(const Json::Value &config)
    {
        return std::make_shared<HelloClient>(config["name"].asString());
    }

    int main(int argc, char** argv)
    {
        if (argc != 2) {
            std::cout << "Usage: hellodisb config.json" << std::endl;
            return -1;
        }

        DISB::BenchmarkSuite benchmark;
        // readStringFromFile() is expected to return the content of the config file as a string.
        std::string jsonStr = readStringFromFile(argv[1]);
        benchmark.init(jsonStr, helloClientFactory);
        benchmark.run();
        Json::Value report = benchmark.generateReport();
        std::cout << report << std::endl;

        return 0;
    }
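The load coordination mentioned for DISB::BenchmarkSuite::run() can be pictured with a small sketch that staggers periodic loads sharing the same frequency. This is one possible scheme under stated assumptions, not necessarily what builtinLoadCoordinator() does: PeriodicLoadSketch is a hypothetical stand-in for the fields a coordinator might read from LoadInfo, staggerSameFrequencyLoads is a hypothetical function, and priorities are ignored for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical view of one periodic load as seen by a coordinator.
struct PeriodicLoadSketch
{
    double reqsPerSec;    // launch frequency of the load
    double launchDelayUs; // delay before the first launch, set by the coordinator
};

// Spreads loads with the same frequency evenly across one period,
// so that they do not all launch at the same instant.
void staggerSameFrequencyLoads(std::vector<PeriodicLoadSketch> &loads)
{
    // Group load indices by frequency.
    std::map<double, std::vector<std::size_t>> byFrequency;
    for (std::size_t i = 0; i < loads.size(); ++i) {
        assert(loads[i].reqsPerSec > 0.0);
        byFrequency[loads[i].reqsPerSec].push_back(i);
    }

    // Within each group, the k-th of n loads starts k/n of a period late.
    for (const auto &group : byFrequency) {
        const double periodUs = 1e6 / group.first;
        const std::size_t n = group.second.size();
        for (std::size_t k = 0; k < n; ++k) {
            loads[group.second[k]].launchDelayUs =
                periodUs * static_cast<double>(k) / static_cast<double>(n);
        }
    }
}
```

For two loads at 20 reqs/s, this gives launch delays of 0 and 25 ms, so their requests interleave instead of colliding at every period.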
-
HelloDISB: A simple sample that shows how DISB works; it needs no extra dependencies.
We have supported DISB on some mainstream DNN inference serving frameworks, including:
We tested these DNN inference serving frameworks under 6 DISB Workloads. Test results are shown in results.md.
[TBD] We're still working on supporting more DNN inference serving frameworks.
If you use DISB in your research, please cite our paper:
@inproceedings {osdi2022reef,
author = {Mingcong Han and Hanze Zhang and Rong Chen and Haibo Chen},
title = {Microsecond-scale Preemption for Concurrent {GPU-accelerated} {DNN} Inferences},
booktitle = {16th USENIX Symposium on Operating Systems Design and Implementation (OSDI 22)},
year = {2022},
isbn = {978-1-939133-28-1},
address = {Carlsbad, CA},
pages = {539--558},
url = {https://www.usenix.org/conference/osdi22/presentation/han},
publisher = {USENIX Association},
month = jul,
}
DISB is developed and maintained by members from IPADS@SJTU and Shanghai AI Laboratory. See Contributors.
If you have any questions about DISB, feel free to contact us.
Weihang Shen: shenwhang@sjtu.edu.cn
Mingcong Han: mingconghan@sjtu.edu.cn
Rong Chen: rongchen@sjtu.edu.cn
DISB is released under the Apache License 2.0.