
[TorchFX] INT8 Weights Compression Support #2891

Merged
92 commits merged into openvinotoolkit:develop on Sep 26, 2024

Conversation

anzr299
Contributor

@anzr299 anzr299 commented Aug 19, 2024

Changes

  1. Added the weights compression implementation, based on the template weights compression.
  2. Modified the graph builder for Torch FX to handle an edge case where an embedding node's weight was not placed on the correct port.
  3. Updated the Torch weights compression tests to include the FX embedding metatype, so that some Torch test functions can be reused in the FX tests.

Reason for changes

To support nncf.compress_weights() for Torch Fx models.
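For orientation, INT8 weights compression replaces each FP32 weight tensor with an int8 tensor plus per-channel scales, cutting weight storage roughly 4x. A minimal NumPy sketch of the arithmetic (illustrative only, not NNCF's implementation; the real entry point is nncf.compress_weights()):

```python
import numpy as np

def compress_weight_int8(w: np.ndarray):
    """Symmetric per-output-channel INT8 quantization: w ~= q * scale."""
    # One scale per output channel (row of w), mapping max |w| to 127.
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    q = np.clip(np.round(w / scale), -128, 127).astype(np.int8)
    return q, scale

def decompress(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 16)).astype(np.float32)
q, scale = compress_weight_int8(w)
w_hat = decompress(q, scale)
print(q.dtype, q.nbytes, w.nbytes)  # int8 storage is a quarter of float32
```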

Tests

Added tests at tests/torch/fx/test_compress_weights.py. Reused the models and some tests from the Torch implementation, and added extra checks such as verifying that the compressed model is smaller than the original model.

Performance:

tinyllama-1.1b-step-50k-105b inference speeds:

  • Torch FX, compressed: 0.963 s
  • Torch FX, compiled with the OV backend: 0.074 s
  • Torch FX, compiled with the OV backend and compressed: 0.040 s
  • OV FP32: 0.079 s
  • OV INT8: 0.039 s

Constraints

Currently, only Torch FX representations extracted with torch._export.capture_pre_autograd_graph() are supported. #2987 tracks the request to support weights compression for FX models extracted using torch.export.export().

Tickets:

#2938

@anzr299 anzr299 requested a review from a team as a code owner August 19, 2024 17:25
@github-actions github-actions bot added the NNCF PT, experimental, and NNCF PTQ labels Aug 19, 2024
@daniil-lyakhov daniil-lyakhov self-assigned this Aug 19, 2024
@daniil-lyakhov daniil-lyakhov self-requested a review August 19, 2024 17:46
Copy link
Collaborator

@daniil-lyakhov daniil-lyakhov left a comment


Great work! My first row of comments:

nncf/experimental/torch/fx/transformations.py Outdated Show resolved Hide resolved
nncf/quantization/quantize_model.py Outdated Show resolved Hide resolved
nncf/quantization/quantize_model.py Outdated Show resolved Hide resolved
nncf/quantization/quantize_model.py Outdated Show resolved Hide resolved
@daniil-lyakhov daniil-lyakhov self-requested a review August 19, 2024 19:05
@daniil-lyakhov
Collaborator

@anzr299, please add the conformance test for the Tinyllama-1.1b model, as is done in this PR:
#2636

@anzr299
Contributor Author

anzr299 commented Aug 19, 2024

@anzr299, please add the conformance test for the Tinyllama-1.1b model, as is done in this PR: #2636

Alright

…g metatype

2. Modify the constant update transformation builder to accept an input port for the constant node. The default is set to 1
1. Inference is performed correctly with the compressed model
2. The compressed model has the same output shape as the normal model
3. The compressed model's output does not differ significantly from the normal model's
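Those checks can be sketched in miniature (a hypothetical toy, not the actual test code): running a single linear layer with original and with INT8-decompressed weights should give same-shaped, close outputs.

```python
import numpy as np

# Toy "model": one linear layer y = x @ w.T, with w INT8-compressed per channel.
rng = np.random.default_rng(1)
w = rng.standard_normal((8, 16)).astype(np.float32)
scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
q = np.clip(np.round(w / scale), -128, 127).astype(np.int8)
w_hat = q.astype(np.float32) * scale  # decompressed weights

x = rng.standard_normal((4, 16)).astype(np.float32)
y_ref = x @ w.T       # original model output
y_cmp = x @ w_hat.T   # compressed model output

assert y_cmp.shape == y_ref.shape  # same output shape
rel_err = float(np.abs(y_cmp - y_ref).max() / np.abs(y_ref).max())
assert rel_err < 0.1               # output not very different
```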
Collaborator

@daniil-lyakhov daniil-lyakhov left a comment


Please fix the reference files structure

tests/torch/fx/test_models.py Outdated Show resolved Hide resolved
Contributor

@alexsu52 alexsu52 left a comment


Good stuff!

for source_node in model.graph.nodes:
    node_type, node_metatype = GraphConverter._get_node_type_and_metatype(source_node, model)
    node_metatype = GraphConverter._map_fx_unique_metatypes(source_node, node_metatype)
    is_shared_node = False
Contributor

Redundant line

Collaborator
Suggested change
is_shared_node = False

Contributor Author
Oh, Alright. Done!

def shared_constants_unification_transformation(model: torch.fx.GraphModule):
    """
    Checks fx graph for shared constants, disconnects and eliminates redundant
    shared constant while connecting singular shared constant.
Contributor

Please elaborate a little more which problem this function solves.
From the current description it's not clear why some nodes should be disconnected and eliminated.
Maybe it's worth mentioning the issue with solver and min_max algorithms that they don't use is_shared attribute and so on.

Contributor Author

Alright, I will elaborate the comment.
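The idea behind the transformation can be shown with a hypothetical toy graph (plain dicts standing in for FX nodes; names are illustrative, not NNCF's code): duplicated constants are detected by value, every consumer is rewired to one canonical constant, and the redundant copies are dropped.

```python
def unify_shared_constants(consumers, constants):
    """consumers: {node: const_id}; constants: {const_id: value (hashable)}."""
    canonical = {}  # value -> first const_id that carried it
    remap = {}
    for cid, value in constants.items():
        canonical.setdefault(value, cid)
        remap[cid] = canonical[value]
    # Rewire every consumer to the canonical constant for its value.
    new_consumers = {node: remap[cid] for node, cid in consumers.items()}
    # Keep only the canonical constants; redundant copies are eliminated.
    kept = {cid: v for cid, v in constants.items() if remap[cid] == cid}
    return new_consumers, kept

consumers = {"conv1": "c0", "conv2": "c1", "add": "c2"}
constants = {"c0": ("w", 1.0), "c1": ("w", 1.0), "c2": ("b", 0.5)}
new_consumers, kept = unify_shared_constants(consumers, constants)
print(new_consumers, sorted(kept))
# -> {'conv1': 'c0', 'conv2': 'c0', 'add': 'c2'} ['c0', 'c2']
```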

{"gptq": True},
{"awq": True},
{"scale_estimation": True},
{"lora_correction": True},
Contributor

subset_size and dataset are also not supported.
If #2978 is merged before this PR, there will also be a backup_precision parameter.

Contributor Author

@anzr299 anzr299 Sep 26, 2024

So in the case of backup_precision, I can leave it for now, depending on whether the PR is merged?

Contributor Author

Following @alexsu52's guidance, the subset_size parameter is ignored when the dataset is None in the WeightsCompression Algorithm. For this reason, I didn't add a check for subset_size, but I did include a check for dataset.
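The resulting validation logic can be sketched as follows (a hypothetical guard with illustrative names; NNCF's actual checks live in the quantize_model entry point and may differ):

```python
def validate_fx_compression_args(dataset=None, awq=False, gptq=False,
                                 scale_estimation=False, lora_correction=False):
    """Reject options not supported by the TorchFX weights-compression backend."""
    unsupported = {"awq": awq, "gptq": gptq,
                   "scale_estimation": scale_estimation,
                   "lora_correction": lora_correction}
    enabled = [name for name, on in unsupported.items() if on]
    if enabled:
        raise ValueError(f"TorchFX backend does not support: {enabled}")
    if dataset is not None:
        raise ValueError("TorchFX backend supports only data-free compression")
    # subset_size needs no explicit check: it is ignored when dataset is None.

validate_fx_compression_args()  # OK: data-free compression with defaults
try:
    validate_fx_compression_args(gptq=True)
except ValueError as e:
    print(e)  # reports the unsupported option
```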

Contributor

@ljaljushkin ljaljushkin left a comment


I have only minor comments, overall LGTM

@alexsu52 alexsu52 merged commit c93676d into openvinotoolkit:develop Sep 26, 2024
13 checks passed
@MaximProshin MaximProshin changed the title [TorchFX] Weights Compression Support [TorchFX] INT8 Weights Compression Support Oct 10, 2024
Labels
documentation, experimental, NNCF PT, NNCF PTQ
4 participants