[TorchFX] INT8 Weights Compression Support #2891
Conversation
TODO: Fix edge data for get_attr node
…raph to include post operator hook insertion for constant nodes
Great work! My first round of comments:
(Three resolved review threads on nncf/quantization/algorithms/weight_compression/torch_fx_backend.py, now outdated.)
…g metatype 2. Modify constant update transformation builder to accept input port for the constant node. Default is set to 1
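For context, a minimal sketch of what accepting an input port for a constant-update transformation could look like; the function name and the flat-attribute assumption are illustrative, not this PR's actual code:

import torch
import torch.fx

def update_constant(model: torch.fx.GraphModule, node: torch.fx.Node,
                    new_value: torch.Tensor, input_port_id: int = 1) -> None:
    # Hypothetical sketch: swap the constant feeding `node` at the given
    # input port (port 1 is typically the weight of a linear/conv call).
    const_node = node.args[input_port_id]
    assert const_node.op == "get_attr", "expected a constant (get_attr) input"
    # For simplicity this assumes a flat (non-dotted) attribute target.
    setattr(model, const_node.target, torch.nn.Parameter(new_value))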
1. Inference is performed correctly with the compressed model. 2. The compressed model has the same output shape as the original model. 3. The compressed model's output does not differ significantly from the original model's.
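A minimal sketch of how these three checks might be expressed in a test (the tolerance is an arbitrary placeholder, not from the PR):

import torch

def check_compressed_model(model, compressed_model, example_input, atol=1.0):
    # 1. Inference runs without error on the compressed model.
    compressed_out = compressed_model(example_input)
    reference_out = model(example_input)
    # 2. The compressed model keeps the original output shape.
    assert compressed_out.shape == reference_out.shape
    # 3. The compressed output stays close to the reference output.
    assert torch.allclose(compressed_out, reference_out, atol=atol)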
Please fix the reference files structure.
Good stuff!
for source_node in model.graph.nodes:
    node_type, node_metatype = GraphConverter._get_node_type_and_metatype(source_node, model)
    node_metatype = GraphConverter._map_fx_unique_metatypes(source_node, node_metatype)
    is_shared_node = False
Redundant line
is_shared_node = False
Oh, alright. Done!
def shared_constants_unification_transformation(model: torch.fx.GraphModule):
    """
    checks fx graph for shared constants, disconnects and eliminates redundant
    shared constant while connecting singular shared constant.
Please elaborate a little more on which problem this function solves. From the current description it's not clear why some nodes should be disconnected and eliminated. Maybe it's worth mentioning the issue with the solver and min_max algorithms: they don't use the is_shared attribute, and so on.
Alright, I will elaborate on the comment.
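For context, a minimal sketch of what such a unification pass might do; the traversal and the function name are assumptions, not this PR's implementation:

import torch.fx

def unify_shared_constants(model: torch.fx.GraphModule) -> None:
    # Hypothetical sketch: if several get_attr nodes point at the same
    # attribute target, rewire all consumers onto one canonical node so
    # downstream algorithms see a single shared constant.
    canonical = {}  # attribute target -> first get_attr node seen
    for node in list(model.graph.nodes):
        if node.op != "get_attr":
            continue
        if node.target not in canonical:
            canonical[node.target] = node
        else:
            # Redirect consumers of the duplicate to the canonical node ...
            node.replace_all_uses_with(canonical[node.target])
            # ... and remove the now-unused duplicate from the graph.
            model.graph.erase_node(node)
    model.graph.lint()
    model.recompile()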
{"gptq": True}, | ||
{"awq": True}, | ||
{"scale_estimation": True}, | ||
{"lora_correction": True}, |
subset_size and dataset are also not supported. If #2978 is merged before this PR, there's also a backup_precision parameter.
So in the case of backup_precision, can I leave it for now, depending on whether the PR is merged?
Following @alexsu52's guidance, the subset_size parameter is ignored when the dataset is None in the WeightsCompression Algorithm. For this reason, I didn't add a check for subset_size, but I did include a check for dataset.
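A minimal sketch of the kind of guard discussed in this thread; the function name and the use of ValueError are assumptions (NNCF defines its own exception types), not the PR's exact code:

def _validate_fx_compression_options(dataset, awq, gptq,
                                     scale_estimation, lora_correction):
    # Hypothetical guard: the TorchFX backend currently supports only
    # data-free INT8 weight compression, so reject everything else.
    if dataset is not None:
        raise ValueError("TorchFX backend does not support a dataset")
    for name, enabled in {"awq": awq, "gptq": gptq,
                          "scale_estimation": scale_estimation,
                          "lora_correction": lora_correction}.items():
        if enabled:
            raise ValueError(f"TorchFX backend does not support {name}=True")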
I have only minor comments; overall LGTM.
Changes
Reason for changes
To support nncf.compress_weights() for Torch FX models.
Tests
Added tests at tests/torch/fx/test_compress_weights.py.
Reused the models and some tests from the Torch implementation, and included some extra checks, such as verifying that the size of the compressed model is lower than that of the original model.
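A minimal sketch of the size check mentioned above (the helper and the byte-counting approach are illustrative assumptions, not the PR's actual test code):

import torch
import nncf

def assert_compression_shrinks_model(model):
    # Hypothetical check: total parameter/buffer bytes must drop
    # after INT8 weight compression.
    def byte_size(m):
        return sum(t.numel() * t.element_size()
                   for t in list(m.parameters()) + list(m.buffers()))

    original_size = byte_size(model)
    compressed = nncf.compress_weights(model)
    assert byte_size(compressed) < original_size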
Performance:
tinyllama-1.1b-step-50k-105b inference speeds:
Constraints
Currently only supports Torch FX representations extracted using torch._export.capture_pre_autograd_graph(). #2987 outlines the request to support weights compression for FX models extracted using torch.export.export.
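A minimal usage sketch under this constraint (the torchvision model is an illustrative choice, not from the PR):

import torch
import nncf
from torch._export import capture_pre_autograd_graph
from torchvision.models import resnet18

model = resnet18().eval()
example_input = torch.ones(1, 3, 224, 224)

# Capture the pre-autograd FX graph -- the representation this PR supports.
exported_model = capture_pre_autograd_graph(model, (example_input,))

# Data-free INT8 weight compression.
compressed_model = nncf.compress_weights(exported_model)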
Tickets:
#2938