
[QNN EP] Add provider option to offload graph I/O quantization/dequantization to the CPU EP #22436

Merged: 11 commits into main on Oct 16, 2024

Conversation

@adrianlizarraga (Contributor) commented Oct 15, 2024

Description

Adds the QNN provider option `offload_graph_io_quantization`, which offloads graph input quantization and graph output dequantization to the CPU EP. The option is disabled by default to preserve current behavior.

Motivation and Context

Offloading the handling of I/O quantization to the CPU EP significantly improves inference latency for many models.
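As a sketch, the option described above would be passed through the QNN EP's provider options. The option name comes from this PR; the backend path, model filename, and the `"1"`/`"0"` string values are assumptions based on common ONNX Runtime provider-option conventions, not taken from this PR.

```python
# Sketch: enabling offload_graph_io_quantization for the QNN EP.
# Option name is from this PR; other values below are illustrative assumptions.
qnn_options = {
    "backend_path": "QnnHtp.dll",          # hypothetical HTP backend library path
    "offload_graph_io_quantization": "1",  # "0" (default) keeps current behavior
}
providers = [
    ("QNNExecutionProvider", qnn_options),
    "CPUExecutionProvider",  # fallback EP; would also run the offloaded Q/DQ ops
]

# With an onnxruntime build that includes QNN support, the session would be
# created roughly like this:
#   import onnxruntime as ort
#   sess = ort.InferenceSession("model.qdq.onnx", providers=providers)
print(providers[0][1]["offload_graph_io_quantization"])
```

Because provider options are plain string key/value pairs, the same setting can be supplied from any ONNX Runtime language binding that accepts per-EP options.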

@adrianlizarraga adrianlizarraga added the ep:QNN (issues related to QNN execution provider) label Oct 15, 2024
@adrianlizarraga adrianlizarraga marked this pull request as ready for review October 15, 2024 16:30
@jywu-msft jywu-msft merged commit 84d48b6 into main Oct 16, 2024
91 checks passed
@jywu-msft jywu-msft deleted the adrianl/qnn-offload-io-quant-dequant-to-cpu branch October 16, 2024 22:00
guschmue pushed a commit that referenced this pull request Oct 18, 2024
…tization to the CPU EP (#22436)

apsonawane pushed a commit that referenced this pull request Oct 21, 2024
…tization to the CPU EP (#22436)

@sophies927 sophies927 added the cherry-picked Cherry-picked for a cherrypicks branch label Oct 22, 2024
ishwar-raut1 pushed a commit to ishwar-raut1/onnxruntime that referenced this pull request Nov 19, 2024
…tization to the CPU EP (microsoft#22436)

Labels
cherry-picked (Cherry-picked for a cherrypicks branch), ep:QNN (issues related to QNN execution provider), release:1.20.0
4 participants