Add ONNX export support for GIT #2132

Draft · wants to merge 2 commits into base: main · Changes from 1 commit
1 change: 1 addition & 0 deletions docs/source/exporters/onnx/overview.mdx
@@ -47,6 +47,7 @@ Supported architectures from [🤗 Transformers](https://huggingface.co/docs/tra
- ESM
- Falcon
- Flaubert
- GIT
- GPT-2
- GPT-BigCode
- GPT-J
20 changes: 20 additions & 0 deletions optimum/exporters/onnx/model_configs.py
@@ -2621,3 +2621,23 @@ class EncoderDecoderOnnxConfig(EncoderDecoderBaseOnnxConfig):
NORMALIZED_CONFIG_CLASS = NormalizedEncoderDecoderConfig

DEFAULT_ONNX_OPSET = 14 # uses SDPA in Transformers, hence opset>=14.


class GITOnnxConfig(VisionOnnxConfig):
NORMALIZED_CONFIG_CLASS = NormalizedVisionConfig
Collaborator:

The issue you're reporting, `ValueError: Input image size (64*64) doesn't match model (32*32).`, should be fixed if you replace the config with:

Suggested change:
-    NORMALIZED_CONFIG_CLASS = NormalizedVisionConfig
+    NORMALIZED_CONFIG_CLASS = NormalizedTextAndVisionConfig.with_args(vision_config="vision_config")

as it seems that for `GitConfig` the `image_size` attribute needs to be taken from the `vision_config` directly: https://github.com/huggingface/transformers/blob/504c4d36929b6bb8a8c2ecfad0f2625f4075f22a/src/transformers/models/git/configuration_git.py#L98

What is currently happening is that before export this value is not found in the config and defaults to 64:

if normalized_config.has_attribute("image_size"):

when it should be set to 32 in your case: https://huggingface.co/hf-internal-testing/tiny-random-GitModel/blob/main/config.json#L52
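
A minimal sketch (my illustration, not part of the PR) of why the suggested normalized config fixes the lookup; it assumes `NormalizedTextAndVisionConfig` attribute access falls through to the nested `vision_config`:

from transformers import AutoConfig
from optimum.utils import NormalizedTextAndVisionConfig

config = AutoConfig.from_pretrained("hf-internal-testing/tiny-random-GitModel")
NormalizedGitConfig = NormalizedTextAndVisionConfig.with_args(vision_config="vision_config")
normalized = NormalizedGitConfig(config)
# image_size is absent on the top-level GitConfig, so the lookup falls through to
# config.vision_config.image_size (32) instead of the dummy generator's default of 64
print(normalized.image_size)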

DUMMY_INPUT_GENERATOR_CLASSES = (DummyTextInputGenerator, DummyVisionInputGenerator)

@property
def inputs(self) -> Dict[str, Dict[int, str]]:
return {
"input_ids": {0: "text_batch_size", 1: "sequence_length"},
"pixel_values": {0: "image_batch_size", 1: "num_channels", 2: "height", 3: "width"}
}


class GITVisionModelOnnxConfig(VisionOnnxConfig):
NORMALIZED_CONFIG_CLASS = NormalizedVisionConfig

@property
def inputs(self) -> Dict[str, Dict[int, str]]:
return {"pixel_values": {0: "batch_size", 1: "num_channels", 2: "height", 3: "width"}}
Collaborator:

I think this case should be handled in `GITOnnxConfig` directly, depending on `self.task`, no? If not, in which case should the model be exported with `input_ids` as input?

Suggested change:
-class GITVisionModelOnnxConfig(VisionOnnxConfig):
-    NORMALIZED_CONFIG_CLASS = NormalizedVisionConfig
-
-    @property
-    def inputs(self) -> Dict[str, Dict[int, str]]:
-        return {"pixel_values": {0: "batch_size", 1: "num_channels", 2: "height", 3: "width"}}

Author:

Attempted to use self.task in marcindulak@ea2321c

Problems:

1. ValueError: You have to specify either input_ids or inputs_embeds
optimum-cli export onnx --model microsoft/git-base /tmp/git-base
image-to-text <class 'optimum.utils.input_generators.DummyVisionInputGenerator'>
Traceback (most recent call last):
  File "/root/venv/bin/optimum-cli", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/root/venv/lib/python3.11/site-packages/optimum/commands/optimum_cli.py", line 208, in main
    service.run()
  File "/root/venv/lib/python3.11/site-packages/optimum/commands/export/onnx.py", line 265, in run
    main_export(
  File "/root/venv/lib/python3.11/site-packages/optimum/exporters/onnx/__main__.py", line 373, in main_export
    onnx_export_from_model(
  File "/root/venv/lib/python3.11/site-packages/optimum/exporters/onnx/convert.py", line 1176, in onnx_export_from_model
    _, onnx_outputs = export_models(
                      ^^^^^^^^^^^^^^
  File "/root/venv/lib/python3.11/site-packages/optimum/exporters/onnx/convert.py", line 762, in export_models
    export(
  File "/root/venv/lib/python3.11/site-packages/optimum/exporters/onnx/convert.py", line 867, in export
    export_output = export_pytorch(
                    ^^^^^^^^^^^^^^^
  File "/root/venv/lib/python3.11/site-packages/optimum/exporters/onnx/convert.py", line 563, in export_pytorch
    onnx_export(
  File "/root/venv/lib/python3.11/site-packages/torch/onnx/__init__.py", line 375, in export
    export(
  File "/root/venv/lib/python3.11/site-packages/torch/onnx/utils.py", line 502, in export
    _export(
  File "/root/venv/lib/python3.11/site-packages/torch/onnx/utils.py", line 1564, in _export
    graph, params_dict, torch_out = _model_to_graph(
                                    ^^^^^^^^^^^^^^^^
  File "/root/venv/lib/python3.11/site-packages/torch/onnx/utils.py", line 1113, in _model_to_graph
    graph, params, torch_out, module = _create_jit_graph(model, args)
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/venv/lib/python3.11/site-packages/torch/onnx/utils.py", line 997, in _create_jit_graph
    graph, torch_out = _trace_and_get_graph_from_model(model, args)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/venv/lib/python3.11/site-packages/torch/onnx/utils.py", line 904, in _trace_and_get_graph_from_model
    trace_graph, torch_out, inputs_states = torch.jit._get_trace_graph(
                                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/venv/lib/python3.11/site-packages/torch/jit/_trace.py", line 1500, in _get_trace_graph
    outs = ONNXTracedModule(
           ^^^^^^^^^^^^^^^^^
  File "/root/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/venv/lib/python3.11/site-packages/torch/jit/_trace.py", line 139, in forward
    graph, out = torch._C._create_graph_by_tracing(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/venv/lib/python3.11/site-packages/torch/jit/_trace.py", line 130, in wrapper
    outs.append(self.inner(*trace_inputs))
                ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1726, in _slow_forward
    result = self.forward(*input, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/venv/lib/python3.11/site-packages/optimum/exporters/onnx/model_patcher.py", line 151, in patched_forward
    outputs = self.orig_forward(*args, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/venv/lib/python3.11/site-packages/transformers/models/git/modeling_git.py", line 1570, in forward
    outputs = self.git(
              ^^^^^^^^^
  File "/root/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1726, in _slow_forward
    result = self.forward(*input, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/venv/lib/python3.11/site-packages/transformers/models/git/modeling_git.py", line 1276, in forward
    raise ValueError("You have to specify either input_ids or inputs_embeds")
ValueError: You have to specify either input_ids or inputs_embeds

It looks like all three tasks ("feature-extraction", "image-text-to-text", "image-to-text") want `input_ids` as input. Could this be due to the use of `TextAndVisionOnnxConfig` as the base class?

class GITOnnxConfig(TextAndVisionOnnxConfig):
    NORMALIZED_CONFIG_CLASS = NormalizedTextAndVisionConfig.with_args(vision_config="vision_config")
2. We don't have an op for aten::full but it isn't a special case. Argument types: int[], bool, NoneType, NoneType, Device, bool
optimum-cli export onnx --model hf-internal-testing/tiny-random-GitForCausalLM /tmp/tiny-random-GitForCausalLM
image-text-to-text <class 'optimum.utils.input_generators.DummyTextInputGenerator'>
image-text-to-text <class 'optimum.utils.input_generators.DummyVisionInputGenerator'>
/root/venv/lib/python3.11/site-packages/transformers/models/git/modeling_git.py:685: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if not interpolate_pos_encoding and (height != self.image_size or width != self.image_size):
/root/venv/lib/python3.11/site-packages/transformers/models/git/modeling_git.py:695: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if interpolate_pos_encoding:
/root/venv/lib/python3.11/site-packages/transformers/models/git/modeling_git.py:768: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_weights.size() != (bsz * self.num_heads, tgt_len, src_len):
/root/venv/lib/python3.11/site-packages/transformers/models/git/modeling_git.py:808: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_output.size() != (bsz * self.num_heads, tgt_len, self.head_dim):
Traceback (most recent call last):
  File "/root/venv/bin/optimum-cli", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/root/venv/lib/python3.11/site-packages/optimum/commands/optimum_cli.py", line 208, in main
    service.run()
  File "/root/venv/lib/python3.11/site-packages/optimum/commands/export/onnx.py", line 265, in run
    main_export(
  File "/root/venv/lib/python3.11/site-packages/optimum/exporters/onnx/__main__.py", line 373, in main_export
    onnx_export_from_model(
  File "/root/venv/lib/python3.11/site-packages/optimum/exporters/onnx/convert.py", line 1176, in onnx_export_from_model
    _, onnx_outputs = export_models(
                      ^^^^^^^^^^^^^^
  File "/root/venv/lib/python3.11/site-packages/optimum/exporters/onnx/convert.py", line 762, in export_models
    export(
  File "/root/venv/lib/python3.11/site-packages/optimum/exporters/onnx/convert.py", line 867, in export
    export_output = export_pytorch(
                    ^^^^^^^^^^^^^^^
  File "/root/venv/lib/python3.11/site-packages/optimum/exporters/onnx/convert.py", line 563, in export_pytorch
    onnx_export(
  File "/root/venv/lib/python3.11/site-packages/torch/onnx/__init__.py", line 375, in export
    export(
  File "/root/venv/lib/python3.11/site-packages/torch/onnx/utils.py", line 502, in export
    _export(
  File "/root/venv/lib/python3.11/site-packages/torch/onnx/utils.py", line 1564, in _export
    graph, params_dict, torch_out = _model_to_graph(
                                    ^^^^^^^^^^^^^^^^
  File "/root/venv/lib/python3.11/site-packages/torch/onnx/utils.py", line 1113, in _model_to_graph
    graph, params, torch_out, module = _create_jit_graph(model, args)
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/venv/lib/python3.11/site-packages/torch/onnx/utils.py", line 997, in _create_jit_graph
    graph, torch_out = _trace_and_get_graph_from_model(model, args)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/venv/lib/python3.11/site-packages/torch/onnx/utils.py", line 904, in _trace_and_get_graph_from_model
    trace_graph, torch_out, inputs_states = torch.jit._get_trace_graph(
                                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/venv/lib/python3.11/site-packages/torch/jit/_trace.py", line 1500, in _get_trace_graph
    outs = ONNXTracedModule(
           ^^^^^^^^^^^^^^^^^
  File "/root/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/venv/lib/python3.11/site-packages/torch/jit/_trace.py", line 139, in forward
    graph, out = torch._C._create_graph_by_tracing(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: 0 INTERNAL ASSERT FAILED at "/pytorch/torch/csrc/jit/ir/alias_analysis.cpp":617, please report a bug to PyTorch. We don't have an op for aten::full but it isn't a special case.  Argument types: int[], bool, NoneType, NoneType, Device, bool, 

Candidates:
	aten::full.names(int[] size, Scalar fill_value, *, str[]? names, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor
	aten::full(SymInt[] size, Scalar fill_value, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor
	aten::full.names_out(int[] size, Scalar fill_value, *, str[]? names, Tensor(a!) out) -> Tensor(a!)
	aten::full.out(SymInt[] size, Scalar fill_value, *, Tensor(a!) out) -> Tensor(a!)

Is this a case of pytorch/pytorch#137202 or pytorch/pytorch#130229, or some misconfiguration?
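
As a side note, here is a hypothetical minimal repro (my sketch, not from the PR) of the aten::full failure; it assumes the trace hits torch.full with a bool fill value and an explicit device, which matches the "int[], bool, NoneType, NoneType, Device, bool" argument types in the assert:

import torch

class FullRepro(torch.nn.Module):
    def forward(self, x):
        # size=int[], fill_value=bool, dtype=None, layout=None, device=Device, pin_memory=bool
        return torch.full((x.shape[0], x.shape[0]), True, device=x.device)

# On affected torch versions this may trip the same alias-analysis assert during tracing
torch.onnx.export(FullRepro(), (torch.zeros(2),), "full_repro.onnx")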

11 changes: 11 additions & 0 deletions optimum/exporters/tasks.py
@@ -692,6 +692,17 @@ class TasksManager:
"text-classification",
onnx="GemmaOnnxConfig",
),
"git": supported_tasks_mapping(
"feature-extraction",
"image-text-to-text",
Collaborator:

The issue
KeyError: "Unknown task: image-text-to-text. Possible values are: `audio-classification` for AutoModelForAudioClassification, `audio-frame-classification` for AutoModelForAudioFrameClassification, `audio-xvector` for AutoModelForAudioXVector, `automatic-speech-recognition` for ('AutoModelForSpeechSeq2Seq', 'AutoModelForCTC'), `depth-estimation` for AutoModelForDepthEstimation, `feature-extraction` for AutoModel, `fill-mask` for AutoModelForMaskedLM, `image-classification` for AutoModelForImageClassification, `image-segmentation` for ('AutoModelForImageSegmentation', 'AutoModelForSemanticSegmentation'), `image-to-image` for AutoModelForImageToImage, `image-to-text` for ('AutoModelForVision2Seq', 'AutoModel'), `mask-generation` for AutoModel, `masked-im` for AutoModelForMaskedImageModeling, `multiple-choice` for AutoModelForMultipleChoice, `object-detection` for AutoModelForObjectDetection, `question-answering` for AutoModelForQuestionAnswering, `reinforcement-learning` for AutoModel, `semantic-segmentation` for AutoModelForSemanticSegmentation, `text-to-audio` for ('AutoModelForTextToSpectrogram', 'AutoModelForTextToWaveform'), `text-generation` for AutoModelForCausalLM, `text2text-generation` for AutoModelForSeq2SeqLM, `text-classification` for AutoModelForSequenceClassification, `token-classification` for AutoModelForTokenClassification, `zero-shot-image-classification` for AutoModelForZeroShotImageClassification, `zero-shot-object-detection` for AutoModelForZeroShotObjectDetection"

comes from the fact that we don't yet support the "image-text-to-text" task, but it can be added here:

_TRANSFORMERS_TASKS_TO_MODEL_LOADERS = {
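
Roughly, the new entry would map the task name to an auto-model loader class alongside the existing ones (my sketch; the loader shown is an assumption, not confirmed by the review):

_TRANSFORMERS_TASKS_TO_MODEL_LOADERS = {
    # ...existing entries; the KeyError above lists e.g. "image-to-text" for
    # ('AutoModelForVision2Seq', 'AutoModel')
    "image-text-to-text": "AutoModelForVision2Seq",  # assumed loader for this task
}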

"image-to-text",
onnx="GITOnnxConfig",
),
"git-vision-model": supported_tasks_mapping(
"feature-extraction",
"image-to-text",
onnx="GITVisionModelOnnxConfig",
),
Collaborator:

Looks like this model type doesn't exist, so it can be removed:

Suggested change:
-    "git-vision-model": supported_tasks_mapping(
-        "feature-extraction",
-        "image-to-text",
-        onnx="GITVisionModelOnnxConfig",
-    ),

Author:

Removed in marcindulak@ea2321c

I imagined that git-vision-model was expected, since there is a separate clip-vision-model.
The docs show CLIPVisionModel
https://huggingface.co/docs/transformers/main/model_doc/clip#transformers.CLIPVisionModel
and GitVisionModel
https://huggingface.co/docs/transformers/main/model_doc/git#transformers.GitVisionModel
so I thought the setup would be similar.

I see git_vision_model in https://huggingface.co/microsoft/git-large/blob/main/config.json, but it's nested under vision_config. Is this the reason no separate OnnxConfig is needed?
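
For what it's worth, a quick check (my sketch) of how the nested vision config surfaces; it assumes `GitVisionConfig` reports the "git_vision_model" model type:

from transformers import AutoConfig

config = AutoConfig.from_pretrained("microsoft/git-base")
print(config.model_type)                # "git", the only top-level model type
print(config.vision_config.model_type)  # nested vision config, "git_vision_model"
print(config.vision_config.image_size)  # the image size the export needs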

"glpn": supported_tasks_mapping(
"feature-extraction",
"depth-estimation",
8 changes: 8 additions & 0 deletions tests/exporters/exporters_utils.py
@@ -96,6 +96,14 @@
},
"flaubert": "hf-internal-testing/tiny-random-flaubert",
"gemma": "fxmarty/tiny-random-GemmaForCausalLM",
"git": {
"hf-internal-testing/tiny-random-GitModel": [
"feature-extraction",
],
"hf-internal-testing/tiny-random-GitForCausalLM": [
"image-text-to-text",
],
},
"glpn": "hf-internal-testing/tiny-random-GLPNModel",
"gpt2": "hf-internal-testing/tiny-random-gpt2",
"gpt-bigcode": "hf-internal-testing/tiny-random-GPTBigCodeModel",
8 changes: 8 additions & 0 deletions tests/onnxruntime/utils_onnxruntime_tests.py
@@ -101,6 +101,14 @@
"flaubert": "hf-internal-testing/tiny-random-flaubert",
"flux": "optimum-internal-testing/tiny-random-flux",
"gemma": "fxmarty/tiny-random-GemmaForCausalLM",
"git": {
"hf-internal-testing/tiny-random-GitModel": [
"feature-extraction",
],
"hf-internal-testing/tiny-random-GitForCausalLM": [
"image-text-to-text",
],
},
"gpt2": "hf-internal-testing/tiny-random-gpt2",
"gpt_bigcode": "hf-internal-testing/tiny-random-GPTBigCodeModel",
"gpt_neo": "hf-internal-testing/tiny-random-GPTNeoModel",