INC PyTorch 3.x API Design #1527
xin3he asked this question in Show and tell
Load API

API design

from @xin3he:

```python
def load(model_name_or_path, original_model=None, format='default', device='cpu', **kwargs):
    """
    Parameters:
        model_name_or_path - if 'format' is set to 'huggingface', it means the huggingface
            model_name_or_path. If 'format' is set to 'default', it means the 'checkpoint_dir';
            this parameter should not be None. It co-works with the 'original_model'
            parameter to load an INC quantized INT8/FP8 model locally.
        original_model - optional, only needed if 'format' is set to 'default'.
            It co-works with the 'model_name_or_path' parameter to load an INC quantized
            INT8/FP8 model locally. For a TorchScript model, original_model is not required.
        format - 'default' or 'huggingface'. Supports a huggingface model or an INC quantized model.
        device - 'cpu', 'hpu' or 'gpu'. Specifies the device the model will be loaded to.
        kwargs - parameters required by the `huggingface.from_pretrained` API,
            such as 'trust_remote_code' and 'revision'.
    """
```

Benefits of this design:
Usage demo
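A minimal usage sketch of the proposed `load` API, assuming only the signature and docstring above; the import path, the local checkpoint directory name, and the Hugging Face model id are illustrative placeholders, not part of the proposal.

```python
import torch
# Hypothetical import path for the proposed API; not fixed by this design.
from neural_compressor.torch.quantization import load

# Case 1: 'default' format - load an INC quantized INT8/FP8 checkpoint saved locally.
# 'original_model' supplies the FP32 module the quantized weights are restored into.
fp32_model = torch.nn.Linear(8, 8)                  # stand-in for the user's model
q_model = load("saved_results",                     # assumed local checkpoint_dir
               original_model=fp32_model,
               format="default",
               device="cpu")

# Case 2: 'huggingface' format - load a quantized model from the Hugging Face hub.
# Extra kwargs are forwarded to `from_pretrained`, e.g. trust_remote_code or revision.
q_model = load("org/quantized-model-id",            # placeholder model id
               format="huggingface",
               device="cpu",
               trust_remote_code=True)
```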
Related PR
INC PyTorch 3.x API Design
Target
Main principles
- `quantize` and `autotune` are the user-interface APIs for quantization. One is a one-time quantization; the other requires a set of configurations (see the sketch after this list).
- `GPTQConfig` and autotune will use a set of configurations.
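A minimal sketch of how the two entry points differ in use, assuming they live in `neural_compressor.torch.quantization` and take `quant_config`/`tune_config`/`eval_fn` keyword arguments; none of these names (nor `bits`) are fixed by this proposal, and the toy model and helper functions are placeholders.

```python
import torch
# Assumed import path and keyword names; this proposal does not pin them down.
from neural_compressor.torch.quantization import quantize, autotune, GPTQConfig

model = torch.nn.Linear(16, 16)                      # toy stand-in model

def run_fn(m):                                       # hypothetical calibration run
    m(torch.randn(1, 16))

def eval_fn(m):                                      # hypothetical accuracy metric
    return float(m(torch.randn(1, 16)).abs().mean())

# quantize: one-time quantization with a single configuration.
q_model = quantize(model, quant_config=GPTQConfig(bits=4), run_fn=run_fn)

# autotune: takes a set of candidate configurations plus an evaluation function
# and returns the best-performing quantized model.
best_model = autotune(model,
                      tune_config=[GPTQConfig(bits=4), GPTQConfig(bits=8)],
                      eval_fn=eval_fn)
```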
Repo Architecture

- `BF16`/`FP16`/`FP8` …
- `quantize`, `autotune` are imported here.
- … `algorithms` folder.
- … `algorithms` folder.
- … `fp8`/`ipex`/`weight_only` folder.
- … `WeightOnlyLinear`.
- … `fetch_modules`.
- … `GGML_TYPE_Q4_K`.

Previous Design
IPEX StaticQuant & SmoothQuant
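For contrast with the new design, a rough sketch of the pre-3.x flow for IPEX static/smooth quantization, based on the publicly documented 2.x `PostTrainingQuantConfig` API; the model and calibration data below are placeholders, and the exact snippet this section originally showed may have differed.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from neural_compressor import PostTrainingQuantConfig, quantization

model = torch.nn.Linear(16, 16)                      # placeholder FP32 model
calib_dataloader = DataLoader(                       # placeholder calibration data
    TensorDataset(torch.randn(8, 16), torch.zeros(8)), batch_size=1)

# 2.x-style configuration: the IPEX backend plus SmoothQuant enabled through recipes.
conf = PostTrainingQuantConfig(
    backend="ipex",
    recipes={"smooth_quant": True, "smooth_quant_args": {"alpha": 0.5}},
)

# A single fit() call runs calibration and conversion.
q_model = quantization.fit(model, conf, calib_dataloader=calib_dataloader)
```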
PyTorch Weight-only Quantization
New Design
IPEX StaticQuant & SmoothQuant
Configuration
Each argument to the config accepts a single value or a list of values. If the parameters can be assembled into different configurations, the returned object is a list of configurations, which is then used for autotuning (see the sketch below).
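A minimal, self-contained sketch of that expansion behavior. `SmoothQuantConfig` and the `expand` helper below are made-up illustrations of the idea, not the proposed classes.

```python
from dataclasses import dataclass
from itertools import product

@dataclass
class SmoothQuantConfig:        # illustrative stand-in, not the actual proposed class
    alpha: float = 0.5
    folding: bool = False

def expand(alpha=0.5, folding=False):
    """Return one config, or a list of configs when any argument is a list."""
    alphas = alpha if isinstance(alpha, list) else [alpha]
    foldings = folding if isinstance(folding, list) else [folding]
    configs = [SmoothQuantConfig(a, f) for a, f in product(alphas, foldings)]
    return configs[0] if len(configs) == 1 else configs

print(expand(alpha=0.5))                  # -> a single SmoothQuantConfig
print(expand(alpha=[0.5, 0.75, 1.0]))     # -> a list of configs handed to autotune
```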
Quantize Interface
PyTorch Weight-only Quantization
Configuration
Quantize Interface