Truncated GPT model examples for iree debug #371

Draft · wants to merge 1 commit into main
Conversation

@AmosLewis (Collaborator) commented Oct 15, 2024

@AmosLewis (Collaborator, Author)

@zjgarvey When I run python ./run.py --mode=cl-onnx-iree -v --torchtolinalg -t mygpt4, I get this bug in construct_inputs.log:

Failed test at stage construct_inputs with exception:
input node attention_mask has a dim param='unk__2824' not found in provided dim_param_dict: '{'batch_size': 1, 'seq_len': 128, 'encoder_sequence_length': 128, 'decoder_sequence_length': 128, 'sequence_length': 128}'
Traceback (most recent call last):
  File "/proj/gdba/shark/chi/src/SHARK-TestSuite/alt_e2eshark/./run.py", line 224, in run_tests
    inputs = inst.construct_inputs()
  File "/proj/gdba/shark/chi/src/SHARK-TestSuite/alt_e2eshark/e2e_testing/framework.py", line 78, in construct_inputs
    return get_sample_inputs_for_onnx_model(self.model, self.dim_param_dict)
  File "/proj/gdba/shark/chi/src/SHARK-TestSuite/alt_e2eshark/e2e_testing/onnx_utils.py", line 85, in get_sample_inputs_for_onnx_model
    tuple([generate_input_from_node(node, dim_param_dict) for node in inputs])
  File "/proj/gdba/shark/chi/src/SHARK-TestSuite/alt_e2eshark/e2e_testing/onnx_utils.py", line 85, in <listcomp>
    tuple([generate_input_from_node(node, dim_param_dict) for node in inputs])
  File "/proj/gdba/shark/chi/src/SHARK-TestSuite/alt_e2eshark/e2e_testing/onnx_utils.py", line 62, in generate_input_from_node
    int_dims = get_node_shape_from_dim_param_dict(node, dim_param_dict)
  File "/proj/gdba/shark/chi/src/SHARK-TestSuite/alt_e2eshark/e2e_testing/onnx_utils.py", line 36, in get_node_shape_from_dim_param_dict
    raise ValueError(f"input node {node.name} has a dim param='{dim}' not found in provided dim_param_dict: '{dim_param_dict}'")
ValueError: input node attention_mask has a dim param='unk__2824' not found in provided dim_param_dict: '{'batch_size': 1, 'seq_len': 128, 'encoder_sequence_length': 128, 'decoder_sequence_length': 128, 'sequence_length': 128}'

@zjgarvey (Contributor)

Yeah, you need to specify dim params. There are some examples in the migraphx and nlp tests.
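
For reference, a minimal sketch of one way to provide them, assuming the OnnxModelInfo base class from e2e_testing/framework.py (seen in the traceback above) exposes an update_dim_param_dict hook like the nlp tests use; the class name here is hypothetical and 'unk__2824' is taken from the error:

# a sketch, not the exact test code; see the migraphx/nlp tests for the real pattern
from e2e_testing.framework import OnnxModelInfo  # assumed location, per the traceback

class MyGpt4Info(OnnxModelInfo):  # hypothetical test class
    def update_dim_param_dict(self):
        # every symbolic dim on the model inputs must be mapped to a concrete
        # size, including ONNX's auto-generated "unk__*" names
        self.dim_param_dict = {
            "batch_size": 1,
            "seq_len": 128,
            "unk__2824": 128,  # the dim param the error above complains about
        }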

@AmosLewis (Collaborator, Author) commented Oct 16, 2024

@PhaneeshB With the model generated by register_test(t_model_constructor(1, ""), "mygpt4"):

python -m torch_mlir.tools.import_onnx model.onnx -o model.mlir 
iree-compile model.mlir --iree-hal-target-backends=llvm-cpu -o model.vmfb > output.mlir 

iree-run-module --trace_execution=true --print_statistics=true --module=model.vmfb --function=tf2onnx --input="1x4xsi32=1" --input="1x4xsi32=1" --input="1x4xsi32=1"
EXEC @tf2onnx
[module.tf2onnx+00000000]    <block>
[module.tf2onnx+00000001]    %r3 = vm.const.ref.zero
[module.tf2onnx+00000004]    %i0 = vm.const.i32 -1  // 0xFFFFFFFF
[module.tf2onnx+0000000B]    %i1 = vm.const.i32.zero
[module.tf2onnx+0000000E]    %r4 = vm.call @hal.devices.get(%i1(0))
[module.tf2onnx+0000001C]    %r4 = vm.call @hal.fence.create(%r4(!hal.device/0x0x55ead365c970), %i1(0))
[module.tf2onnx+0000002C]    vm.call @module.tf2onnx$async(%r0(!hal.buffer_view/0x0x55ead365d610), %r1(!hal.buffer_view/0x0x55ead365d740), %r2(!hal.buffer_view/0x0x55ead365d870), %r3(null), %r4(!hal.fence/0x0x55ead365e390))
[module.tf2onnx$async+00000000]    <block>
[module.tf2onnx$async+00000001]    vm.call @hal.fence.signal(%r4(!hal.fence/0x0x55ead365e390))
[module.tf2onnx$async+0000000C]    vm.return 
[module.tf2onnx+00000040]    %i0 = vm.call.varadic @hal.fence.await(%i0(4294967295), %r4(!hal.fence/0x0x55ead365e390))
[module.tf2onnx+00000056]    vm.return 
[[ iree_hal_allocator_t memory statistics ]]
  HOST_LOCAL:            0B peak /            0B allocated /            0B freed /            0B live
DEVICE_LOCAL:           48B peak /           48B allocated /           48B freed /            0B live

vmfb generated successfully.
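
Side note for anyone reproducing the truncated models outside the test suite: a minimal sketch using onnx.utils.extract_model to cut the full graph at an intermediate tensor (the value names below are placeholders, not the real names from mygpt4):

import onnx.utils

# extract the subgraph from the graph inputs up to an intermediate value;
# replace the placeholder names with the actual value names of the op you
# want to stop at (e.g. the first Shape op's result)
onnx.utils.extract_model(
    "model.onnx",        # full model
    "model_trunc.onnx",  # truncated model, import/compile it as above
    input_names=["input_ids", "attention_mask", "token_type_ids"],
    output_names=["intermediate_tensor_name"],
)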

iree commit:

commit 9f93073e0c5442dbb67262bd29edb37cd2c1e3b8 (HEAD -> main, upstream/main)
Author: Maksim Levental <maksim.levental@gmail.com>
Date:   Tue Oct 15 12:26:14 2024 -0700

    [CMake] Don't update compile definitions for imported targets for MSCV (#18766)

torch-mlir commit:

commit 45bb17ebfe5e9cdcfd2cfabf850d9dec7127c5ab (HEAD -> main, upstream/main)
Author: Justin Ngo <justin.ngo@arm.com>
Date:   Tue Oct 15 08:38:02 2024 -0700

    [TOSA] Add legalization for empty, scatter, slice_scatter, diag_embed (#3792)

@zjgarvey (Contributor)

Is this model in Azure? Do you want to merge these changes, or just have the draft PR up? I'd personally like to commit the changes with the TestTensors dtype checking and opset version updating for sibling models.

@AmosLewis (Collaborator, Author)

> Is this model in Azure? Do you want to merge these changes, or just have the draft PR up? I'd personally like to commit the changes with the TestTensors dtype checking and opset version updating for sibling models.

No, the model is in the customer's Google Drive.

@AmosLewis (Collaborator, Author)

With the raw model downloaded from Google Drive, there is a compile error:

(e2e_venv) ➜  mygpt4 git:(mygpt) ✗ unzip model.onnx.zip 
Archive:  model.onnx.zip
  inflating: model.onnx              
(e2e_venv) ➜  mygpt4 git:(mygpt) ✗ ls
model.onnx  model.onnx.zip
(e2e_venv) ➜  mygpt4 git:(mygpt) ✗ python -m torch_mlir.tools.import_onnx model.onnx -o model.mlir                                                        
(e2e_venv) ➜  mygpt4 git:(mygpt) ✗ iree-compile model.mlir --iree-hal-target-backends=llvm-cpu -o model.vmfb > output.mlir
iree-compile: /proj/gdba/shark/chi/src/iree/third_party/llvm-project/mlir/lib/Transforms/Utils/DialectConversion.cpp:2420: llvm::LogicalResult legalizeUnresolvedMaterialization(mlir::RewriterBase &, (anonymous namespace)::UnresolvedMaterializationRewrite *): Assertion `newMaterialization.getType() == outputType && "materialization callback produced value of incorrect type"' failed.
Please report issues to https://github.com/iree-org/iree/issues and include the crash backtrace.
Stack dump:
0.      Program arguments: iree-compile model.mlir --iree-hal-target-backends=llvm-cpu -o model.vmfb
Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it):
0  libIREECompiler.so 0x00007f307cfd8e5d llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) + 61
1  libIREECompiler.so 0x00007f307cfd934b
2  libIREECompiler.so 0x00007f307cfd7376 llvm::sys::RunSignalHandlers() + 134
3  libIREECompiler.so 0x00007f307cfd9b65
4  libc.so.6          0x00007f3070306520
5  libc.so.6          0x00007f307035a9fc pthread_kill + 300
6  libc.so.6          0x00007f3070306476 raise + 22
7  libc.so.6          0x00007f30702ec7f3 abort + 211
8  libc.so.6          0x00007f30702ec71b
9  libc.so.6          0x00007f30702fde96
10 libIREECompiler.so 0x00007f3086675b6e
11 libIREECompiler.so 0x00007f3086674a84 mlir::OperationConverter::convertOperations(llvm::ArrayRef<mlir::Operation*>) + 1188
12 libIREECompiler.so 0x00007f3086678749 mlir::applyPartialConversion(llvm::ArrayRef<mlir::Operation*>, mlir::ConversionTarget const&, mlir::FrozenRewritePatternSet const&, mlir::ConversionConfig) + 105
13 libIREECompiler.so 0x00007f308667884d mlir::applyPartialConversion(mlir::Operation*, mlir::ConversionTarget const&, mlir::FrozenRewritePatternSet const&, mlir::ConversionConfig) + 125
14 libIREECompiler.so 0x00007f3080b3a593
15 libIREECompiler.so 0x00007f307d44791b
16 libIREECompiler.so 0x00007f307d4478b5
17 libIREECompiler.so 0x00007f307cee3459
18 libIREECompiler.so 0x00007f307d44aa9d
19 libIREECompiler.so 0x00007f307d442e33 mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int) + 851
20 libIREECompiler.so 0x00007f307d4433e4 mlir::detail::OpToOpPassAdaptor::runPipeline(mlir::OpPassManager&, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int, mlir::PassInstrumentor*, mlir::PassInstrumentation::PipelineParentInfo const*) + 388
21 libIREECompiler.so 0x00007f307d444ecc mlir::PassManager::runPasses(mlir::Operation*, mlir::AnalysisManager) + 108
22 libIREECompiler.so 0x00007f307d444def mlir::PassManager::run(mlir::Operation*) + 1151
23 libIREECompiler.so 0x00007f307ce20e9a
24 libIREECompiler.so 0x00007f307ce20773 ireeCompilerInvocationPipeline + 35
25 libIREECompiler.so 0x00007f307d3c81ae
26 libIREECompiler.so 0x00007f307d3c75de
27 libIREECompiler.so 0x00007f307ce7388b ireeCompilerRunMain + 27
28 iree-compile       0x000055a454eb77b2
29 libc.so.6          0x00007f30702edd90
30 libc.so.6          0x00007f30702ede40 __libc_start_main + 128
31 iree-compile       0x000055a454eb76c5
[1]    188732 IOT instruction (core dumped)  iree-compile model.mlir --iree-hal-target-backends=llvm-cpu -o model.vmfb > 
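
To narrow the assertion down to a specific conversion pass, the standard MLIR debug flags should also work with iree-compile (dump the IR before each pass, single-threaded so the dump stays ordered):

iree-compile model.mlir --iree-hal-target-backends=llvm-cpu --mlir-print-ir-before-all --mlir-disable-threading -o model.vmfb 2> pass_dump.txt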

@AmosLewis (Collaborator, Author) commented Oct 16, 2024

@PhaneeshB Got some bugs fixed. Now the model runs successfully up to the first Shape op. Next step is to locate the op that fails.

python ./run.py --mode=cl-onnx-iree -v -t mygpt4_trunc_shape_1
Stages to be run: ['setup', 'import_model', 'preprocessing', 'compilation', 'construct_inputs', 'native_inference', 'compiled_inference', 'postprocessing']
Test list: ['mygpt4_trunc_shape_1']
running test mygpt4_trunc_shape_1...
        PASSED                               

Test Summary:
        PASSES: 1
        TOTAL: 1
results stored in /proj/gdba/shark/chi/src/SHARK-TestSuite/alt_e2eshark/test-run
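
One possible way to locate the failing op from here: register progressively longer truncations and find the first one that breaks. A sketch, assuming t_model_constructor(n, ...) truncates after the n-th cut point, as the register_test(t_model_constructor(1, ""), "mygpt4") usage above suggests:

# hypothetical bisection over truncation points; uses the same imports as the
# existing mygpt4 test file (register_test, t_model_constructor)
for n in range(1, 8):  # upper bound is illustrative
    register_test(t_model_constructor(n, ""), f"mygpt4_trunc_{n}")

Then run each variant with python ./run.py --mode=cl-onnx-iree -v -t mygpt4_trunc_<n> and note the first one that fails.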

AmosLewis added a commit that referenced this pull request Oct 16, 2024