Truncated GPT model examples for iree debug #371

Draft · wants to merge 1 commit into main
Conversation

@AmosLewis (Collaborator) commented Oct 15, 2024

@AmosLewis (Collaborator, Author)

@zjgarvey When I run python ./run.py --mode=cl-onnx-iree -v --torchtolinalg -t mygpt4, I get this bug in construct_inputs.log:

Failed test at stage construct_inputs with exception:
input node attention_mask has a dim param='unk__2824' not found in provided dim_param_dict: '{'batch_size': 1, 'seq_len': 128, 'encoder_sequence_length': 128, 'decoder_sequence_length': 128, 'sequence_length': 128}'
Traceback (most recent call last):
  File "/proj/gdba/shark/chi/src/SHARK-TestSuite/alt_e2eshark/./run.py", line 224, in run_tests
    inputs = inst.construct_inputs()
  File "/proj/gdba/shark/chi/src/SHARK-TestSuite/alt_e2eshark/e2e_testing/framework.py", line 78, in construct_inputs
    return get_sample_inputs_for_onnx_model(self.model, self.dim_param_dict)
  File "/proj/gdba/shark/chi/src/SHARK-TestSuite/alt_e2eshark/e2e_testing/onnx_utils.py", line 85, in get_sample_inputs_for_onnx_model
    tuple([generate_input_from_node(node, dim_param_dict) for node in inputs])
  File "/proj/gdba/shark/chi/src/SHARK-TestSuite/alt_e2eshark/e2e_testing/onnx_utils.py", line 85, in <listcomp>
    tuple([generate_input_from_node(node, dim_param_dict) for node in inputs])
  File "/proj/gdba/shark/chi/src/SHARK-TestSuite/alt_e2eshark/e2e_testing/onnx_utils.py", line 62, in generate_input_from_node
    int_dims = get_node_shape_from_dim_param_dict(node, dim_param_dict)
  File "/proj/gdba/shark/chi/src/SHARK-TestSuite/alt_e2eshark/e2e_testing/onnx_utils.py", line 36, in get_node_shape_from_dim_param_dict
    raise ValueError(f"input node {node.name} has a dim param='{dim}' not found in provided dim_param_dict: '{dim_param_dict}'")
ValueError: input node attention_mask has a dim param='unk__2824' not found in provided dim_param_dict: '{'batch_size': 1, 'seq_len': 128, 'encoder_sequence_length': 128, 'decoder_sequence_length': 128, 'sequence_length': 128}'

@zjgarvey (Contributor)

Yeah, you need to specify dim params. There are some examples in the migraphx and nlp tests.
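
For reference, a minimal sketch of one way to provide them, assuming the OnnxModelInfo base class from e2e_testing/framework.py (seen in the traceback above) exposes an update_dim_param_dict hook like the nlp tests use; the class name here is hypothetical and 'unk__2824' is taken from the error:

# a sketch, not the exact test code; see the migraphx/nlp tests for the real pattern
from e2e_testing.framework import OnnxModelInfo  # assumed location, per the traceback

class MyGpt4Info(OnnxModelInfo):  # hypothetical test class
    def update_dim_param_dict(self):
        # every symbolic dim on the model inputs must be mapped to a concrete
        # size, including ONNX's auto-generated "unk__*" names
        self.dim_param_dict = {
            "batch_size": 1,
            "seq_len": 128,
            "unk__2824": 128,  # the dim param the error above complains about
        }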

@AmosLewis (Collaborator, Author) commented Oct 16, 2024

@PhaneeshB With the model generated by register_test(t_model_constructor(1, ""), "mygpt4"):

python -m torch_mlir.tools.import_onnx model.onnx -o model.mlir 
iree-compile model.mlir --iree-hal-target-backends=llvm-cpu -o model.vmfb > output.mlir 

iree-run-module --trace_execution=true --print_statistics=true --module=model.vmfb --function=tf2onnx --input="1x4xsi32=1" --input="1x4xsi32=1" --input="1x4xsi32=1"
EXEC @tf2onnx
[module.tf2onnx+00000000]    <block>
[module.tf2onnx+00000001]    %r3 = vm.const.ref.zero
[module.tf2onnx+00000004]    %i0 = vm.const.i32 -1  // 0xFFFFFFFF
[module.tf2onnx+0000000B]    %i1 = vm.const.i32.zero
[module.tf2onnx+0000000E]    %r4 = vm.call @hal.devices.get(%i1(0))
[module.tf2onnx+0000001C]    %r4 = vm.call @hal.fence.create(%r4(!hal.device/0x0x55ead365c970), %i1(0))
[module.tf2onnx+0000002C]    vm.call @module.tf2onnx$async(%r0(!hal.buffer_view/0x0x55ead365d610), %r1(!hal.buffer_view/0x0x55ead365d740), %r2(!hal.buffer_view/0x0x55ead365d870), %r3(null), %r4(!hal.fence/0x0x55ead365e390))
[module.tf2onnx$async+00000000]    <block>
[module.tf2onnx$async+00000001]    vm.call @hal.fence.signal(%r4(!hal.fence/0x0x55ead365e390))
[module.tf2onnx$async+0000000C]    vm.return 
[module.tf2onnx+00000040]    %i0 = vm.call.varadic @hal.fence.await(%i0(4294967295), %r4(!hal.fence/0x0x55ead365e390))
[module.tf2onnx+00000056]    vm.return 
[[ iree_hal_allocator_t memory statistics ]]
  HOST_LOCAL:            0B peak /            0B allocated /            0B freed /            0B live
DEVICE_LOCAL:           48B peak /           48B allocated /           48B freed /            0B live

vmfb generated successfully.
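
Side note for anyone reproducing the truncated models outside the test suite: a minimal sketch using onnx.utils.extract_model to cut the full graph at an intermediate tensor (the value names below are placeholders, not the real names from mygpt4):

import onnx.utils

# extract the subgraph from the graph inputs up to an intermediate value;
# replace the placeholder names with the actual value names of the op you
# want to stop at (e.g. the first Shape op's result)
onnx.utils.extract_model(
    "model.onnx",        # full model
    "model_trunc.onnx",  # truncated model, import/compile it as above
    input_names=["input_ids", "attention_mask", "token_type_ids"],
    output_names=["intermediate_tensor_name"],
)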

iree commit:

commit 9f93073e0c5442dbb67262bd29edb37cd2c1e3b8 (HEAD -> main, upstream/main)
Author: Maksim Levental <maksim.levental@gmail.com>
Date:   Tue Oct 15 12:26:14 2024 -0700

    [CMake] Don't update compile definitions for imported targets for MSCV (#18766)

torch-mlir commit:

commit 45bb17ebfe5e9cdcfd2cfabf850d9dec7127c5ab (HEAD -> main, upstream/main)
Author: Justin Ngo <justin.ngo@arm.com>
Date:   Tue Oct 15 08:38:02 2024 -0700

    [TOSA] Add legalization for empty, scatter, slice_scatter, diag_embed (#3792)

@zjgarvey (Contributor)

Is this model in Azure? Do you want to merge these changes, or just have the draft PR up? I'd personally like to commit the changes with the TestTensors dtype checking and opset version updating for sibling models.

@AmosLewis (Collaborator, Author)

> Is this model in Azure? Do you want to merge these changes, or just have the draft PR up? I'd personally like to commit the changes with the TestTensors dtype checking and opset version updating for sibling models.

No, the model is in the customer's Google Drive.

@AmosLewis (Collaborator, Author)

With the raw model downloaded from Google Drive, there is a compile error:

(e2e_venv) ➜  mygpt4 git:(mygpt) ✗ unzip model.onnx.zip 
Archive:  model.onnx.zip
  inflating: model.onnx              
(e2e_venv) ➜  mygpt4 git:(mygpt) ✗ ls
model.onnx  model.onnx.zip
(e2e_venv) ➜  mygpt4 git:(mygpt) ✗ python -m torch_mlir.tools.import_onnx model.onnx -o model.mlir                                                        
(e2e_venv) ➜  mygpt4 git:(mygpt) ✗ iree-compile model.mlir --iree-hal-target-backends=llvm-cpu -o model.vmfb > output.mlir
iree-compile: /proj/gdba/shark/chi/src/iree/third_party/llvm-project/mlir/lib/Transforms/Utils/DialectConversion.cpp:2420: llvm::LogicalResult legalizeUnresolvedMaterialization(mlir::RewriterBase &, (anonymous namespace)::UnresolvedMaterializationRewrite *): Assertion `newMaterialization.getType() == outputType && "materialization callback produced value of incorrect type"' failed.
Please report issues to https://github.com/iree-org/iree/issues and include the crash backtrace.
Stack dump:
0.      Program arguments: iree-compile model.mlir --iree-hal-target-backends=llvm-cpu -o model.vmfb
Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it):
0  libIREECompiler.so 0x00007f307cfd8e5d llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) + 61
1  libIREECompiler.so 0x00007f307cfd934b
2  libIREECompiler.so 0x00007f307cfd7376 llvm::sys::RunSignalHandlers() + 134
3  libIREECompiler.so 0x00007f307cfd9b65
4  libc.so.6          0x00007f3070306520
5  libc.so.6          0x00007f307035a9fc pthread_kill + 300
6  libc.so.6          0x00007f3070306476 raise + 22
7  libc.so.6          0x00007f30702ec7f3 abort + 211
8  libc.so.6          0x00007f30702ec71b
9  libc.so.6          0x00007f30702fde96
10 libIREECompiler.so 0x00007f3086675b6e
11 libIREECompiler.so 0x00007f3086674a84 mlir::OperationConverter::convertOperations(llvm::ArrayRef<mlir::Operation*>) + 1188
12 libIREECompiler.so 0x00007f3086678749 mlir::applyPartialConversion(llvm::ArrayRef<mlir::Operation*>, mlir::ConversionTarget const&, mlir::FrozenRewritePatternSet const&, mlir::ConversionConfig) + 105
13 libIREECompiler.so 0x00007f308667884d mlir::applyPartialConversion(mlir::Operation*, mlir::ConversionTarget const&, mlir::FrozenRewritePatternSet const&, mlir::ConversionConfig) + 125
14 libIREECompiler.so 0x00007f3080b3a593
15 libIREECompiler.so 0x00007f307d44791b
16 libIREECompiler.so 0x00007f307d4478b5
17 libIREECompiler.so 0x00007f307cee3459
18 libIREECompiler.so 0x00007f307d44aa9d
19 libIREECompiler.so 0x00007f307d442e33 mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int) + 851
20 libIREECompiler.so 0x00007f307d4433e4 mlir::detail::OpToOpPassAdaptor::runPipeline(mlir::OpPassManager&, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int, mlir::PassInstrumentor*, mlir::PassInstrumentation::PipelineParentInfo const*) + 388
21 libIREECompiler.so 0x00007f307d444ecc mlir::PassManager::runPasses(mlir::Operation*, mlir::AnalysisManager) + 108
22 libIREECompiler.so 0x00007f307d444def mlir::PassManager::run(mlir::Operation*) + 1151
23 libIREECompiler.so 0x00007f307ce20e9a
24 libIREECompiler.so 0x00007f307ce20773 ireeCompilerInvocationPipeline + 35
25 libIREECompiler.so 0x00007f307d3c81ae
26 libIREECompiler.so 0x00007f307d3c75de
27 libIREECompiler.so 0x00007f307ce7388b ireeCompilerRunMain + 27
28 iree-compile       0x000055a454eb77b2
29 libc.so.6          0x00007f30702edd90
30 libc.so.6          0x00007f30702ede40 __libc_start_main + 128
31 iree-compile       0x000055a454eb76c5
[1]    188732 IOT instruction (core dumped)  iree-compile model.mlir --iree-hal-target-backends=llvm-cpu -o model.vmfb > 
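
To narrow the assertion down to a specific conversion pass, the standard MLIR debug flags should also work with iree-compile (dump the IR before each pass, single-threaded so the dump stays ordered):

iree-compile model.mlir --iree-hal-target-backends=llvm-cpu --mlir-print-ir-before-all --mlir-disable-threading -o model.vmfb 2> pass_dump.txt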

@AmosLewis (Collaborator, Author) commented Oct 16, 2024

@PhaneeshB Got some bugs fixed. Now the model runs successfully up to the first Shape op. Next step is to locate the op that fails.

python ./run.py --mode=cl-onnx-iree -v -t mygpt4_trunc_shape_1
Stages to be run: ['setup', 'import_model', 'preprocessing', 'compilation', 'construct_inputs', 'native_inference', 'compiled_inference', 'postprocessing']
Test list: ['mygpt4_trunc_shape_1']
running test mygpt4_trunc_shape_1...
        PASSED                               

Test Summary:
        PASSES: 1
        TOTAL: 1
results stored in /proj/gdba/shark/chi/src/SHARK-TestSuite/alt_e2eshark/test-run
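
One possible way to locate the failing op from here: register progressively longer truncations and find the first one that breaks. A sketch, assuming t_model_constructor(n, ...) truncates after the n-th cut point, as the register_test(t_model_constructor(1, ""), "mygpt4") usage above suggests:

# hypothetical bisection over truncation points; uses the same imports as the
# existing mygpt4 test file (register_test, t_model_constructor)
for n in range(1, 8):  # upper bound is illustrative
    register_test(t_model_constructor(n, ""), f"mygpt4_trunc_{n}")

Then run each variant with python ./run.py --mode=cl-onnx-iree -v -t mygpt4_trunc_<n> and note the first one that fails.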

AmosLewis added a commit that referenced this pull request Oct 16, 2024