Home
TensorDSE aims to provide a complete flow that allows a user to map a trained TensorFlow model onto a hardware platform comprised of multiple types of execution units, e.g. CPU, GPU, and TPU.
The idea was originally developed by Ines Ben Hmidda in her master's thesis at TUM:
Abstract: In recent years, machine learning methods have proven their efficiency and applicability in a broad range of fields and environments. However, the complexity and high computational density they require make their implementation challenging, particularly in constrained environments like embedded systems. Therefore, ongoing research has aimed at accelerating the execution of machine learning methods. Some solutions presented in research involve the use of design space exploration (DSE). In this thesis, a method to accelerate the inference of machine learning models by applying state-of-the-art DSE techniques is presented. By representing a machine learning model as an application, and the hardware it is deployed on as the architecture, optimally distributing its execution to accelerate it becomes a system-design-level synthesis problem. This problem is modeled according to the Y-chart methodology, which allows the comparison of design points quantitatively according to performance data. Evaluating all design points is challenging when dealing with such complex algorithms. Hence, the design space is explored with an efficient algorithm selected from the existing methods used in DSE. The problem is then represented as an optimization for system-level DSE. The results of the optimization reside in optimal distributions of machine learning workloads to the target hardware architecture. Using SAT-Decoding with an evolutionary algorithm, each proposed solution is guaranteed to be valid and highly optimized. An experimental evaluation was performed as a proof of concept of the theory. The evaluation of each design point during the DSE is carried out with a function computing an estimation of the overall execution time. This estimation is deduced from cost models created by performing benchmarks of operations encapsulated in machine learning models on the hardware. The design space is represented and explored using existing frameworks. The resulting solutions generated by the DSE present optimal mappings, which could accelerate the inference if implemented. Thus, it was shown that the application of design space exploration, specifically the Y-chart methodology, at a system-design level, to optimize the distribution of machine learning workloads across heterogeneous hardware systems leads to the acceleration of machine learning models' execution.
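
As a rough illustration of the evaluation function described in the abstract, the sketch below estimates a design point's overall execution time by summing per-operation benchmark costs. The cost table, names, and numbers are illustrative assumptions for this sketch, not TensorDSE's actual code or measured data.

```python
# Illustrative cost models: operation type -> execution unit -> time (ms).
# All names and numbers here are assumptions, not real benchmark results.
COST_MODELS = {
    "CONV_2D":         {"cpu": 4.1, "gpu": 1.2, "tpu": 0.3},
    "FULLY_CONNECTED": {"cpu": 0.9, "gpu": 0.5, "tpu": 0.2},
}

def estimate_execution_time(mapping):
    """Estimate a design point's overall inference time by summing the
    benchmarked cost of each (operation, execution unit) assignment."""
    return sum(COST_MODELS[op][unit] for op, unit in mapping)

# Two candidate design points for a toy two-operation model:
all_cpu = [("CONV_2D", "cpu"), ("FULLY_CONNECTED", "cpu")]
mixed   = [("CONV_2D", "tpu"), ("FULLY_CONNECTED", "cpu")]
print(estimate_execution_time(all_cpu))  # 5.0
print(estimate_execution_time(mixed))    # 1.2
```

Comparing such estimates across candidate mappings is what lets the DSE rank design points quantitatively, as the Y-chart methodology requires.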
To fully implement the work proposed in the original thesis, a few additional parts were required to complete the flow, namely:
- Inference operation times needed to be benchmarked more accurately
- Communication delays for USB-connected TPUs needed to be formulated into the DSE (see the sketch after this list)
- Models containing Coral TPU operations needed to be deployable, according to a generated mapping, so that specific inference operations execute on the target execution unit, whether CPU, GPU, or TPU
These components have since been developed, and the end-to-end flow is being finalized.
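
As a rough illustration of how the USB communication delays from the second bullet above could be folded into the cost estimate, the hedged sketch below charges a fixed per-transfer penalty whenever execution crosses the host/TPU boundary. The delay model and constant are assumptions for illustration, not the project's actual formulation.

```python
USB_DELAY_MS = 0.8  # assumed per-transfer host<->TPU cost (illustrative)

def estimate_with_comms(mapping, cost_models, usb_delay_ms=USB_DELAY_MS):
    """Sum per-operation costs, charging a USB delay whenever consecutive
    operations cross the boundary between the TPU and any other unit."""
    total, prev_unit = 0.0, None
    for op, unit in mapping:
        total += cost_models[op][unit]
        if prev_unit is not None and (unit == "tpu") != (prev_unit == "tpu"):
            total += usb_delay_ms  # tensor data crosses the USB link
        prev_unit = unit
    return total

# With COST_MODELS from the sketch above, the mixed mapping now also pays
# for the TPU->CPU transfer: 0.3 + 0.9 + 0.8 = 2.0 ms.
mixed = [("CONV_2D", "tpu"), ("FULLY_CONNECTED", "cpu")]
print(estimate_with_comms(mixed, COST_MODELS))  # ≈ 2.0
```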
Coral TPU USB Communication Benchmarking ([Daniel](https://github.com/duclos-cavalcanti))
Found here.
Inference Model Operation Benchmarking ([Daniel](https://github.com/duclos-cavalcanti))
Found here.
Mapping Custom Operations of TF Lite models to run on the Coral Edge TPU USB Hardware Accelerator (Ala)
Found here.
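
As background for this deployment step, a compiled TF Lite model can be run through the Edge TPU delegate roughly as follows. This is a minimal sketch, not the project's actual deployment code; the model path is a placeholder, and only operations the Edge TPU compiler mapped to the TPU run there, with the rest falling back to the CPU.

```python
import numpy as np
import tflite_runtime.interpreter as tflite

# Minimal sketch (Linux): load a TF Lite model compiled for the Edge TPU
# and run one inference. "model_edgetpu.tflite" is a placeholder path.
interpreter = tflite.Interpreter(
    model_path="model_edgetpu.tflite",
    experimental_delegates=[tflite.load_delegate("libedgetpu.so.1")],
)
interpreter.allocate_tensors()

# Feed a dummy input of the expected shape/dtype and read the output.
inp = interpreter.get_input_details()[0]
interpreter.set_tensor(inp["index"], np.zeros(inp["shape"], dtype=inp["dtype"]))
interpreter.invoke()
out = interpreter.get_tensor(interpreter.get_output_details()[0]["index"])
```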
OpenDSE Mapping (Alex)
Found here.