GitHub - goshaQ/inference-optimizer: A simple tool that applies structure-level optimizations (e.g. Quantization) to a TensorFlow model

inference-optimizer is a simple tool for optimizing TensorFlow computation graphs. It relies on TensorFlow's built-in implementations of optimization algorithms, in particular the Graph Transform Tool. In addition, it provides a convenient way to evaluate the optimized model and compare its performance metrics with those of the unoptimized one using a benchmark. Currently, only ConvNet models that perform classification are supported, so there is no guarantee that the tool will work with any other type of network.

Getting started

These instructions will get you a copy of the project up and running on your local machine.

Installing

git clone https://github.com/goshaQ/inference-optimizer

cd inference-optimizer/

Running

bash scripts/optimize_and_benchmark.sh

Optimization process

Here we provide a brief description of the transformations that take place during the optimization process, in the order they are applied. See the Graph Transform Tool documentation for more information on each one.

All nodes that are not used during inference (i.e. training-only) are removed.
Nodes that remain in the graph but serve no purpose during inference are removed.
Any sub-graph within the model that always evaluates to a constant expression is replaced with that constant.
Redundant multiplications introduced after batch normalization are eliminated.
All floating point constants are converted into eight-bit equivalents.
All calculation nodes are replaced with their eight-bit equivalents (if available).
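
The six steps above correspond closely to transforms that ship with the Graph Transform Tool. The sketch below shows one way to apply them from Python. The mapping from steps to transform names is an assumption based on the descriptions above, not taken from this repository's source, and `optimize_graph` is a hypothetical helper. `TransformGraph` is the Python entry point available in TensorFlow 1.x.

```python
# Assumed mapping from the six steps above to Graph Transform Tool transforms.
TRANSFORMS = [
    "strip_unused_nodes",                           # drop nodes not needed for inference
    "remove_nodes(op=Identity, op=CheckNumerics)",  # drop pass-through and debug nodes
    "fold_constants(ignore_errors=true)",           # precompute constant sub-graphs
    "fold_batch_norms",                             # fold batch-norm multiplications into weights
    "quantize_weights",                             # store float constants as eight-bit values
    "quantize_nodes",                               # use eight-bit ops where available
]


def optimize_graph(graph_def, input_names, output_names, transforms=TRANSFORMS):
    """Apply the transforms in order and return the optimized GraphDef."""
    # Imported lazily so the transform list can be inspected without TensorFlow.
    from tensorflow.tools.graph_transforms import TransformGraph

    return TransformGraph(graph_def, input_names, output_names, list(transforms))
```

A call might look like `optimize_graph(graph_def, ["input"], ["predictions"])`, where the node names are placeholders that depend on the model being optimized.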

Results

We tested our tool on three ConvNet models pre-trained on the ImageNet dataset. For inference we used 10 batches of 128 images each, selected from the ImageNet evaluation dataset. The statistics shown below were gathered by running the benchmark once on the following system:

OS Platform: Linux Ubuntu 16.04
TensorFlow version: 1.9.0
TensorFlow installed from: Source (with SIMD instructions support)
Python version: 3.6.6
CPU model & RAM: Intel Core i5-6200U & 8GB
GPU model: None
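
As a rough illustration of how the metrics in the tables below can be collected, here is a minimal sketch. It is not the benchmark code from this repository: `benchmark` and `predict` are hypothetical names, and `predict` is assumed to return one predicted label per image.

```python
import time


def benchmark(predict, batches):
    """Run `predict` over (images, labels) batches and collect metrics."""
    correct = 0
    total = 0
    elapsed = 0.0
    for images, labels in batches:
        # Time only the model call, not the accuracy bookkeeping.
        start = time.perf_counter()
        predictions = predict(images)
        elapsed += time.perf_counter() - start

        correct += sum(int(p == y) for p, y in zip(predictions, labels))
        total += len(labels)

    return {
        "inference_time_s": elapsed,
        "accuracy": correct / total if total else 0.0,
        "images_per_second": total / elapsed if elapsed > 0 else float("inf"),
    }
```

Running the same function on the original and the optimized graph with identical batches gives the "Before" and "After" columns.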

Inception V3

Metric	Before	After	Comment
Inference Time	128.7 s	425.8 s	3.3x slower
Accuracy	0.890	0.849	0.041 lower
Images per Second	7.5	2.3	3.3x lower throughput
Graph Size	95 MB	25 MB	3.8x smaller
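
The Comment column is plain arithmetic on the Before and After values. The short example below reproduces it for Inception V3; the helper name `ratio` is ours and is only for illustration.

```python
def ratio(larger, smaller):
    """Return how many times `larger` exceeds `smaller`, rounded to one decimal."""
    return round(larger / smaller, 1)


# Values copied from the Inception V3 table.
slowdown = ratio(425.8, 128.7)         # inference time, after vs. before
throughput_drop = ratio(7.5, 2.3)      # images per second, before vs. after
size_reduction = ratio(95, 25)         # graph size, before vs. after
accuracy_drop = round(0.890 - 0.849, 3)

print(slowdown, throughput_drop, size_reduction, accuracy_drop)
# prints: 3.3 3.3 3.8 0.041
```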

ResNet V2 152

Metric	Before	After	Comment
Inference Time	460.5 s	1532.0 s	3.3x slower
Accuracy	0.877	0.636	0.241 lower
Images per Second	2.1	0.6	3.5x lower throughput
Graph Size	242 MB	64 MB	3.8x smaller

PNASNet-5 Large 331

Metric	Before	After	Comment
Inference Time	623.0 s	1686.4 s	2.7x slower
Accuracy	0.908	0.901	0.007 lower
Images per Second	1.5	0.6	2.5x lower throughput
Graph Size	346 MB	90 MB	3.8x smaller
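
All three models shrink by roughly 3.8x, which is consistent with storing 32-bit float weights as eight-bit values: four bytes become one, so the upper bound is 4x. The gap to 4x plausibly comes from data that is not quantized, such as graph structure and small constants. The sketch below illustrates the general idea with linear min/max quantization in pure Python. It is a simplified illustration with hypothetical function names, not necessarily the exact scheme the Graph Transform Tool uses.

```python
def quantize(weights):
    """Map floats linearly onto 0..255 and return (codes, lowest, highest)."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255 or 1.0  # fall back to 1.0 when all weights are equal
    codes = bytes(min(255, round((w - lo) / scale)) for w in weights)
    return codes, lo, hi


def dequantize(codes, lo, hi):
    """Recover approximate floats from the eight-bit codes."""
    scale = (hi - lo) / 255
    return [lo + c * scale for c in codes]


weights = [-1.0, -0.5, 0.0, 0.5, 1.0]
codes, lo, hi = quantize(weights)
restored = dequantize(codes, lo, hi)

# One byte per weight instead of four for float32.
compression = (len(weights) * 4) / len(codes)
# The rounding error is at most half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

The rounding error introduced here is what shows up as the accuracy drop in the tables: each weight moves by up to half a quantization step, and the effect on the final prediction depends on the network.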

Note that inference became significantly slower after optimization for all three models. There are already several related issues on TensorFlow's GitHub page (e.g. #2807, #13939), which point to shortcomings of the Graph Transform Tool.
