Enable GPTQModel (#2064)
* align the gptq check with transformers to support cpu
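
A minimal sketch of the aligned check, assuming boolean flags that stand in for optimum's availability helpers; it models the rule (gptqmodel can target cpu, auto-gptq remains cuda-only) rather than reproducing the commit's actual code:

```python
def gptq_device_supported(device: str, has_gptqmodel: bool, has_auto_gptq: bool) -> bool:
    # gptqmodel ships CPU kernels (e.g. via IPEX) as well as CUDA ones,
    # so cpu is allowed whenever it is installed.
    if has_gptqmodel:
        return device in ("cpu", "cuda")
    # auto-gptq remains CUDA-only.
    return has_auto_gptq and device == "cuda"
```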

* fix comment

* gptqmodel

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* compatible with auto-gptq

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix compatible with auto-gptq

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix compatible with auto-gptq linear

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* revert unrelated changes

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* gptqmodel needs to use checkpoint_format (#1)

* need checkpoint_format

* default value of checkpoint_format is gptq

* fix quantize

* Update quantizer.py

* need to convert to v1 before gptqmodel save

* set checkpoint_format back to gptq after the conversion

* cleanup code

* sym=False is not supported with auto-gptq

* add comments

* cleanup code

* Update quantizer.py

* always convert v2 to v1 if checkpoint_format = "gptq" (decision sketched after this sub-list)

* Update quantizer.py

---------

Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai>
Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>
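
The save flow in this sub-list reduces to one decision, sketched below under the assumption that gptqmodel quantizes in the v2 format internally; the real tensor conversion is done by gptqmodel's hf_convert_gptq_v2_to_v1_format, and this stub only models when it runs:

```python
def plan_gptq_save(sym: bool) -> tuple[bool, str]:
    """Return (convert_v2_to_v1, checkpoint_format_to_write)."""
    if sym:
        # gptqmodel quantizes in the v2 format internally; convert back to v1
        # before saving and record checkpoint_format = "gptq" so auto-gptq
        # can still load the checkpoint.
        return True, "gptq"
    # sym=False is not supported with auto-gptq / the v1 format, so these
    # checkpoints stay in the gptq_v2 format.
    return False, "gptq_v2"
```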

* Mod backend code (#2)

* keep gptq_v2 if sym is false

* use hf_convert_gptq_v1_to_v2_format, hf_convert_gptq_v2_to_v1_format, and hf_gptqmodel_post_init (load path sketched after this sub-list)

* no need to check backend

* use device_map

* cleanup

* Update quantizer.py

* move import

---------

Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>
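
A hedged sketch of the resulting load path, using the helper names from this commit; their real signatures are not shown in the diff, so the sketch only lists which helper runs for which on-disk format:

```python
def plan_gptq_load(checkpoint_format: str) -> list[str]:
    """Post-load steps by on-disk format (helper names from this commit)."""
    steps = []
    if checkpoint_format == "gptq":
        # v1 on disk: convert to the v2 format gptqmodel computes in.
        steps.append("hf_convert_gptq_v1_to_v2_format")
    elif checkpoint_format != "gptq_v2":
        raise ValueError(f"unknown checkpoint_format: {checkpoint_format!r}")
    # Finalize the quantized linear layers in either case.
    steps.append("hf_gptqmodel_post_init")
    return steps
```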

* fix format and log

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix version check

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
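
A sketch of a version gate in the spirit of this fix; the floor version is a placeholder rather than a value from this diff, and the short local name anticipates the UnboundLocalError fix noted later in this PR:

```python
import importlib.metadata

from packaging import version

GPTQMODEL_MIN = version.parse("1.4.2")  # placeholder floor, not taken from this diff

def check_gptqmodel_version() -> None:
    # The short local name `v` avoids shadowing the `version` module; that
    # shadowing is the kind of bug behind the UnboundLocalError fixed below.
    v = version.parse(importlib.metadata.version("gptqmodel"))
    if v < GPTQMODEL_MIN:
        raise ImportError(f"gptqmodel >= {GPTQMODEL_MIN} is required, found {v}")
```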

* enable gptqmodel tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update check quant type

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* Fix optimum compat (#3)

* add meta info

* cleanup

* cleanup

* The value of quantizer should be an array

* Update quantizer.py

* if is_auto_gptq_available(), also write "auto_gptq:version" to "quantizer" (meta sketch after this sub-list)

* Update quantizer.py

* cleanup

* comment on meta

* pass checkpoint_format to hf_select_quant_linear

* add todo fix

* move convert code to quantizer.save()

* Update quantizer.py

* Optimize hf_convert_gptq_v2_to_v1_format()

* Optimize hf_convert_gptq_v1_to_v2_format()

* fix GPTQTestCUDA

* hf_select_quant_linear() always sets pack=True

* gptqmodel.hf_select_quant_linear() no longer selects ExllamaV2

* GPTQQuantizer add backend

* lowercase checkpoint_format and backend

* cleanup

* move backend to bottom

* no need to check gptqmodel version for ipex support

* Update import_utils.py

* Update quantizer.py

* fix UnboundLocalError: cannot access local variable 'version' where it is not associated with a value

* make version var short

* Update import_utils.py

* fix unittest

* use assertLessEqual

---------

Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>
Co-authored-by: LRL <lrl@lbx.dev>
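
A runnable sketch of the meta bookkeeping described above: "quantizer" is an array of tool:version tags, and auto_gptq / gptqmodel entries are appended only when those packages are available. Parameter names and version strings here are illustrative:

```python
def build_quantizer_meta(optimum_v, auto_gptq_v=None, gptqmodel_v=None):
    """Build the "meta" block; "quantizer" is an array of tool:version tags."""
    quantizer = [f"optimum:{optimum_v}"]
    if auto_gptq_v is not None:  # i.e. is_auto_gptq_available()
        quantizer.append(f"auto_gptq:{auto_gptq_v}")
    if gptqmodel_v is not None:  # i.e. is_gptqmodel_available()
        quantizer.append(f"gptqmodel:{gptqmodel_v}")
    return {"quantizer": quantizer}

# build_quantizer_meta("1.24.0", gptqmodel_v="1.4.2")
# -> {"quantizer": ["optimum:1.24.0", "gptqmodel:1.4.2"]}
```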

* fix format and convert v2 to v1

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* [Fix] all tensors not on the same device (#5)

* fix device error (sketched just below)

* update gptqmodel version

* fix test
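
A minimal sketch of the kind of guard that resolves an "all tensors not same device" error: move everything onto one target device before the packed matmul. This is illustrative, not the commit's actual fix:

```python
import torch

def align_devices(tensors: list[torch.Tensor], device) -> list[torch.Tensor]:
    # .to() is a no-op for tensors that already live on `device`,
    # so this only moves the stragglers.
    return [t.to(device) for t in tensors]
```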

* fix format

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* add gptqmodel tests that include cpu

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix all auto-gptq tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* revert tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* rm gptqmodel yaml

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix comment

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* enable real cpu tests using fp32

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
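
What "real cpu tests using fp32" looks like in practice, sketched with a placeholder checkpoint id (the suite's actual test model is not shown here):

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "<gptq-quantized-checkpoint>",  # placeholder, not the suite's actual model
    torch_dtype=torch.float32,      # fp32 so the test really exercises CPU kernels
    device_map="cpu",
)
```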

* fix test model name

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* keep the original device setting when using auto-gptq

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* Update optimum/gptq/quantizer.py

Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: LRL-ModelCloud <165116337+LRL-ModelCloud@users.noreply.github.com>
Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai>
Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>
Co-authored-by: ZX-ModelCloud <165115237+ZX-ModelCloud@users.noreply.github.com>
Co-authored-by: LRL <lrl@lbx.dev>
Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>
7 people authored Dec 19, 2024
1 parent 0ea269f commit 21de42f
Showing 4 changed files with 227 additions and 61 deletions.
