add llama + sd models to iree test suite #126

Merged
merged 16 commits into main
Mar 27, 2024

Conversation

saienduri (Contributor) commented Mar 26, 2024

This commit plugs turbine tank exported MLIR into the IREE testing framework for the llama + SD models.
Couldn't add a real weight test for llama: the real weight file is 20GB and crashes the runner when we try.
On that note, it looks like the splat test cases aren't running at the moment?
Maybe we can enable the real weight test after we get the cluster.
Also, just in case it's useful for anyone adding models in the future: for the exported model MLIR, I had to update the real weight flag to `--parameters=model=real_weights.irpa` to get it working.
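
For illustration, a minimal sketch of how such a run might look; the module and flagfile names here are placeholders rather than files from this PR, and the point is only the --parameters flag:

iree-run-module \
  --module=model.vmfb \
  --device=local-task \
  --parameters=model=real_weights.irpa \
  --flagfile=real_weights_data_flags.txt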

@saienduri changed the title from "add llama model to iree test suite" to "add llama + sd models to iree test suite" on Mar 26, 2024
ScottTodd (Member) left a comment

Nice! Can you also review iree-org/iree#16801 please? Then once this lands we can start testing these programs on IREE presubmit too.

Comment on lines +4 to +5
"name": "splats",
"runtime_flagfile": "splat_data_flags.txt",
ScottTodd (Member)

On that note, looks like splat test cases aren't running at the moment?

The test code is filtering to just real weight tests (using -k real_weights) right now:

- name: "Running real weight model tests"
run: |
source ${VENV_DIR}/bin/activate
pytest iree_tests -n auto -k real_weights -rpfE --timeout=60 --retries 2 --retry-delay 5 --durations=0

I consider the test_iree.yml in this repo to be a sanity check and demonstration of the test suite itself, not a source of truth for test coverage. iree-org/iree#16801 adds similar tests in the IREE repo, and that is what "really matters" (will run on presubmit and should be kept passing). To that end, we can add whatever new filtering and test cases we want here.
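
For anyone wanting to run the splat cases locally, a sketch assuming those tests can be selected with a `-k splats` filter matching the "splats" config name above:

source ${VENV_DIR}/bin/activate
pytest iree_tests -n auto -k splats -rpfE --timeout=60 --retries 2 --retry-delay 5 --durations=0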

saienduri (Contributor, author)

Got it! Thanks for the info.

ScottTodd (Member)

Speaking of how brittle the tests are in this repo (since they use the latest IREE nightlies instead of a pinned version), you may want to sync past 28091b0 to get a clean CI run on this PR.

Comment on lines +9 to +10
"name": "real_weights",
"runtime_flagfile": "real_weights_data_flags.txt",
ScottTodd (Member)

These tests timed out (and then were retried... which then ignored the timeout... that's an odd interaction, pytest plugins :P)

https://github.com/nod-ai/SHARK-TestSuite/actions/runs/8433834658/job/23095725951?pr=126#step:10:26

How about increasing the timeout here to 600 seconds?

- name: "Running real weight model tests"
  run: |
    source ${VENV_DIR}/bin/activate
    pytest iree_tests -n auto -k real_weights -rpfE --timeout=60 --retries 2 --retry-delay 5 --durations=0
(that's what I have it set to in iree-org/iree#16801 over in the IREE repo)
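
With that suggestion applied, only the timeout value in the pytest invocation changes, roughly:

pytest iree_tests -n auto -k real_weights -rpfE --timeout=600 --retries 2 --retry-delay 5 --durations=0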

Comment on lines +8 to +13
"azure_account_url": "https://sharkpublic.blob.core.windows.net",
"azure_container_name": "sharkpublic",
"azure_base_blob_name": "sai/sd-vae-decode-tank/",
"files": [
"sd-vae-decode-tank.mlirbc"
]
ScottTodd (Member)

FYI, I'm considering collapsing these back down to just URLs like
https://sharkpublic.blob.core.windows.net/sharkpublic/sai/sd-vae-decode-tank/sd-vae-decode-tank.mlirbc
I was trying to be fancy with Azure downloading and saving a few characters in the .json files, but making the URLs easy to copy/paste and wget is going to be nicer for developers.
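
For example, with plain URLs a developer could fetch a test file directly:

wget https://sharkpublic.blob.core.windows.net/sharkpublic/sai/sd-vae-decode-tank/sd-vae-decode-tank.mlirbc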

saienduri (Contributor, author) commented Mar 26, 2024

Ok cool. This is also pretty nice though: being able to see just the file names instead of the cluttered URLs is handy for anyone taking a look. Maybe we can add a comment to each test along the lines of: https://sharkpublic.blob.core.windows.net/sharkpublic/sai/sd-vae-decode-tank is the base URL; if you want to download a file, simply wget base_url + /{file_name}.

ScottTodd (Member)

Sure. I at least consider what's in the json files today to be very rough V0 code. So I wouldn't treat it as sacred if you find it tricky to work with.

Another idea was to group files for each test case using tar/zip, so we wouldn't need to list them all explicitly.

saienduri (Contributor, author)

Oooh that would be much easier for adding the tests

Comment on lines +3 to +7
--expected_output=1x93x32000xf32=@inference_output.0.bin
--expected_output=1x32x93x128xf32=@inference_output.1.bin
--expected_output=1x32x93x128xf32=@inference_output.2.bin
--expected_output=1x32x93x128xf32=@inference_output.3.bin
--expected_output=1x32x93x128xf32=@inference_output.4.bin
ScottTodd (Member)

Woah that's a lot of large outputs. Do we need to be checking all of them?

Is this coming from Turbine, or torchscript / "direct"?

saienduri (Contributor, author)

This is coming from Turbine. These are just the tensor outputs of a llama forward pass (https://huggingface.co/docs/transformers/v4.39.1/en/main_classes/output#transformers.modeling_outputs.CausalLMOutputWithPast). Specifically, it is the logits and past_key_values tensors. We can ignore the past_key_value tensors and just look at logits. Let me know if I used the ignored outputs syntax correctly.

ScottTodd (Member)

Let me know if I used the ignored outputs syntax correctly.

You may need to repeat --expected_output="(ignored)" for each output (could try with a smaller test case to confirm)
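
A sketch of what the llama flagfile might then look like, keeping only the logits check; whether iree-run-module accepts this exact (ignored) syntax was not confirmed in this thread:

--expected_output=1x93x32000xf32=@inference_output.0.bin
--expected_output=(ignored)
--expected_output=(ignored)
--expected_output=(ignored)
--expected_output=(ignored)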

saienduri (Contributor, author)

Tried a couple of different variations (it complains about no shape being provided, and when I did provide one, it just complained about being unable to parse the arg). Maybe come back to it when we actually enable the llama real weight test?

ScottTodd (Member)

¯\_(ツ)_/¯

Maybe come back to it when we actually enable llama real weight test?

SGTM

saienduri (Contributor, author)

Cool, are we good to land this?

ScottTodd (Member) left a comment

Regarding naming, is "turbine tank" different from https://github.com/nod-ai/SHARK-TestSuite/tree/main/e2eshark/pytorch/models ? Is that https://github.com/nod-ai/SHARK-Turbine/tree/main/models/turbine_models/turbine_tank ?

Eventually we should have scripts like https://github.com/nod-ai/SHARK-TestSuite/blob/main/iree_tests/pytorch/models/import_from_e2eshark.py or guides like https://github.com/nod-ai/SHARK-TestSuite/tree/main/iree_tests#pytorch-models for each source of model files.

Just looking ahead a bit to whenever the MLIR and input/output files need to be regenerated. We want enough of that automated, or at least documented, that anyone can update the files*

* if they have write access to the cloud projects, but some GitHub Actions secrets + automation could help there too

saienduri (Contributor, author) commented Mar 27, 2024

Yeah, it is different (https://github.com/nod-ai/SHARK-TestSuite/blob/main/turbine_tank/flows_util.py), but these models are custom and turbine-specific (https://github.com/nod-ai/SHARK-Turbine/blob/main/models/turbine_models/custom_models/sd_inference/unet_runner.py). I want to get that process fully automated (pull in the latest passing MLIR nightly from the turbine tank Azure storage). I'm not sure we can have a guide for adding turbine tank models, though, because I had to mess around with the model runners themselves so that they generate the input/output .bin files. The rest of the process (converting MLIR to mlirbc, and safetensors to splat and real_weights IRPAs) seems to already be covered.
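
A rough sketch of the manual pieces of that flow; the URL path is a placeholder, and the iree-convert-parameters usage is an assumption based on IREE's parameter tooling rather than something specified in this PR:

# Pull the latest passing exported MLIR from the turbine tank Azure storage (placeholder path).
wget https://sharkpublic.blob.core.windows.net/sharkpublic/<latest-passing-run>/llama.mlir

# Convert the safetensors weights into an IRPA archive for the real_weights test case.
iree-convert-parameters --parameters=model.safetensors --output=real_weights.irpa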

@saienduri merged commit d097807 into main Mar 27, 2024
1 check passed
ScottTodd (Member)

Oh, this should have been checked in using git lfs, based on

*.irpa filter=lfs diff=lfs merge=lfs -text
*.mlirbc filter=lfs diff=lfs merge=lfs -text

I see a warning when I pull this:

Encountered 6 file(s) that should have been pointers, but weren't:
        iree_tests/pytorch/models/llama-tank/splats.irpa
        iree_tests/pytorch/models/sd-clip-tank/sd-clip-tank.mlirbc
        iree_tests/pytorch/models/sd-clip-tank/splats.irpa
        iree_tests/pytorch/models/sd-unet-tank/sd-unet-tank.mlirbc
        iree_tests/pytorch/models/sd-unet-tank/splats.irpa
        iree_tests/pytorch/models/sd-vae-decode-tank/splats.irpa

I'm also learning how to use git lfs, so I guess I'll get back to you when I know how to resolve that and stop it from happening again...
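
One generic way to repair files that were committed as regular blobs instead of LFS pointers (a general git-lfs recipe, not necessarily what #147 does):

# Install the git-lfs hooks for this clone.
git lfs install

# Re-stage everything so files matching .gitattributes (*.irpa, *.mlirbc) become LFS pointers.
git add --renormalize .
git commit -m "Store model files as git-lfs pointers"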

ScottTodd (Member)

Sent #147 to fix

ScottTodd added a commit that referenced this pull request Apr 2, 2024
Follow-up to
#126 (comment)

All I needed to do to generate this PR was pull the changes (with git
lfs installed). To avoid this in the future, I think others working with
these files should also install git lfs.
renxida pushed a commit that referenced this pull request Jul 18, 2024
This commit plugs turbine tank exported MLIR into the IREE testing
framework for the llama + SD models.
Couldn't add a real weight test for llama: the real weight file is 20GB
and crashes the runner when we try.
On that note, it looks like the splat test cases aren't running at the moment?
Maybe we can enable the real weight test after we get the cluster.
Also, just in case it's useful for anyone adding models in the future:
for the exported model MLIR, I had to update the real weight flag to
`--parameters=model=real_weights.irpa` to get it working.
renxida pushed a commit that referenced this pull request Jul 18, 2024
Follow-up to
#126 (comment)

All I needed to do to generate this PR was pull the changes (with git
lfs installed). To avoid this in the future, I think others working with
these files should also install git lfs.