
[QNN EP] Add provider option to offload graph I/O quantization/dequantization to the CPU EP #22436

Merged: 11 commits into main on Oct 16, 2024

Conversation

@adrianlizarraga (Contributor) commented Oct 15, 2024

Description

Adds the QNN provider option `offload_graph_io_quantization`, which offloads graph input quantization and graph output dequantization to the CPU EP. The option is disabled by default to preserve current behavior.

Motivation and Context

Offloading the handling of I/O quantization to the CPU EP significantly improves inference latency for many models.
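As a sketch, the option described above would be passed through the QNN EP's provider options. The option name comes from this PR; the backend path, model filename, and the `"1"`/`"0"` string values are assumptions based on common ONNX Runtime provider-option conventions, not taken from this PR.

```python
# Sketch: enabling offload_graph_io_quantization for the QNN EP.
# Option name is from this PR; other values below are illustrative assumptions.
qnn_options = {
    "backend_path": "QnnHtp.dll",          # hypothetical HTP backend library path
    "offload_graph_io_quantization": "1",  # "0" (default) keeps current behavior
}
providers = [
    ("QNNExecutionProvider", qnn_options),
    "CPUExecutionProvider",  # fallback EP; would also run the offloaded Q/DQ ops
]

# With an onnxruntime build that includes QNN support, the session would be
# created roughly like this:
#   import onnxruntime as ort
#   sess = ort.InferenceSession("model.qdq.onnx", providers=providers)
print(providers[0][1]["offload_graph_io_quantization"])
```

Because provider options are plain string key/value pairs, the same setting can be supplied from any ONNX Runtime language binding that accepts per-EP options.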

@adrianlizarraga adrianlizarraga added the ep:QNN (issues related to QNN execution provider) label Oct 15, 2024
@adrianlizarraga adrianlizarraga marked this pull request as ready for review October 15, 2024 16:30
@jywu-msft jywu-msft merged commit 84d48b6 into main Oct 16, 2024
91 checks passed
@jywu-msft jywu-msft deleted the adrianl/qnn-offload-io-quant-dequant-to-cpu branch October 16, 2024 22:00
guschmue pushed a commit that referenced this pull request Oct 18, 2024
…tization to the CPU EP (#22436)

apsonawane pushed a commit that referenced this pull request Oct 21, 2024
…tization to the CPU EP (#22436)

@sophies927 sophies927 added the cherry-picked Cherry-picked for a cherrypicks branch label Oct 22, 2024
ishwar-raut1 pushed a commit to ishwar-raut1/onnxruntime that referenced this pull request Nov 19, 2024
…tization to the CPU EP (microsoft#22436)

Labels
cherry-picked (Cherry-picked for a cherrypicks branch), ep:QNN (issues related to QNN execution provider), release:1.20.0
4 participants