add llama + sd models to iree test suite #126
Conversation
---
Nice! Can you also review iree-org/iree#16801 please? Then once this lands we can start testing these programs on IREE presubmit too.
```json
"name": "splats",
"runtime_flagfile": "splat_data_flags.txt",
```
---
On that note, looks like splat test cases aren't running at the moment?
The test code is filtering to just real weight tests (using `-k real_weights`) right now:
SHARK-TestSuite/.github/workflows/test_iree.yml
Lines 75 to 78 in 73dcb42
```yaml
- name: "Running real weight model tests"
  run: |
    source ${VENV_DIR}/bin/activate
    pytest iree_tests -n auto -k real_weights -rpfE --timeout=60 --retries 2 --retry-delay 5 --durations=0
```
I consider the `test_iree.yml` in this repo to be a sanity check and demonstration of the test suite itself, not a source of truth for test coverage. iree-org/iree#16801 adds similar tests in the IREE repo, and that is what "really matters" (it will run on presubmit and should be kept passing). To that end, we can add whatever new filtering and test cases we want here.
---
Got it! Thanks for the info.
---
Speaking of how brittle the tests in this repo are (they use the latest IREE nightlies instead of a pinned version), you may want to sync past 28091b0 to get a clean CI run on this PR.
```json
"name": "real_weights",
"runtime_flagfile": "real_weights_data_flags.txt",
```
---
These tests timed out (and then were retried... which then ignored the timeout... that's an odd interaction, pytest plugins :P)
https://github.com/nod-ai/SHARK-TestSuite/actions/runs/8433834658/job/23095725951?pr=126#step:10:26
How about increasing the timeout here to 600 seconds?
SHARK-TestSuite/.github/workflows/test_iree.yml
Lines 75 to 78 in 73dcb42
```yaml
- name: "Running real weight model tests"
  run: |
    source ${VENV_DIR}/bin/activate
    pytest iree_tests -n auto -k real_weights -rpfE --timeout=60 --retries 2 --retry-delay 5 --durations=0
```
```json
"azure_account_url": "https://sharkpublic.blob.core.windows.net",
"azure_container_name": "sharkpublic",
"azure_base_blob_name": "sai/sd-vae-decode-tank/",
"files": [
  "sd-vae-decode-tank.mlirbc"
]
```
---
FYI, I'm considering collapsing these back down to just URLs like https://sharkpublic.blob.core.windows.net/sharkpublic/sai/sd-vae-decode-tank/sd-vae-decode-tank.mlirbc. I was trying to be fancy with Azure downloading and saving a few characters in the .json files, but making the URLs easy to copy/paste and `wget` is going to be nicer for developers.
---
OK, cool. This is also pretty nice, though. Being able to see the file names instead of the cluttered URLs is helpful for anyone taking a look. Maybe we can add a comment to each test noting that https://sharkpublic.blob.core.windows.net/sharkpublic/sai/sd-vae-decode-tank is the base URL; anyone interested in downloading a file can simply `wget` the base URL + `/{file_name}`.
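Either way, the full copy/paste-able URL is recoverable from the split JSON fields. A minimal sketch of that reconstruction (the field names match the JSON above; the `blob_urls` helper itself is hypothetical, not part of the test suite):

```python
# Sketch: rebuild full blob URLs from the split Azure fields used in the
# test-suite JSON files. The field names match the JSON above; this helper
# itself is hypothetical, not part of the test suite.
def blob_urls(spec: dict) -> list[str]:
    base = "/".join([
        spec["azure_account_url"].rstrip("/"),
        spec["azure_container_name"].strip("/"),
        spec["azure_base_blob_name"].strip("/"),
    ])
    return [f"{base}/{name}" for name in spec["files"]]

spec = {
    "azure_account_url": "https://sharkpublic.blob.core.windows.net",
    "azure_container_name": "sharkpublic",
    "azure_base_blob_name": "sai/sd-vae-decode-tank/",
    "files": ["sd-vae-decode-tank.mlirbc"],
}
print(blob_urls(spec)[0])
# -> https://sharkpublic.blob.core.windows.net/sharkpublic/sai/sd-vae-decode-tank/sd-vae-decode-tank.mlirbc
```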
---
Sure. I at least consider what's in the json files today to be very rough V0 code, so I wouldn't treat it as sacred if you find it tricky to work with.
Another idea was to group the files for each test case using tar/zip, so we wouldn't need to list them all explicitly.
---
Oooh that would be much easier for adding the tests
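The packing side of that idea could be a few lines of Python. A sketch under the assumption that each test case lives in its own directory (`pack_test_case` and the paths are made up for illustration):

```python
# Sketch: bundle all files for one test case into a single archive so the
# JSON only needs to reference one blob. Names/paths here are hypothetical.
import tarfile
from pathlib import Path

def pack_test_case(case_dir: str, out_path: str) -> list[str]:
    """Tar up every file in case_dir; returns the archived member names."""
    members = []
    with tarfile.open(out_path, "w:gz") as tar:
        for f in sorted(Path(case_dir).iterdir()):
            if f.is_file():
                tar.add(f, arcname=f.name)
                members.append(f.name)
    return members
```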
```
--expected_output=1x93x32000xf32=@inference_output.0.bin
--expected_output=1x32x93x128xf32=@inference_output.1.bin
--expected_output=1x32x93x128xf32=@inference_output.2.bin
--expected_output=1x32x93x128xf32=@inference_output.3.bin
--expected_output=1x32x93x128xf32=@inference_output.4.bin
```
---
Woah that's a lot of large outputs. Do we need to be checking all of them?
Is this coming from Turbine, or torchscript / "direct"?
---
This is coming from Turbine. These are just the tensor outputs of a llama forward pass (https://huggingface.co/docs/transformers/v4.39.1/en/main_classes/output#transformers.modeling_outputs.CausalLMOutputWithPast), specifically the `logits` and `past_key_values` tensors. We can ignore the `past_key_values` tensors and just look at `logits`. Let me know if I used the ignored outputs syntax correctly.
---
> Let me know if I used the ignored outputs syntax correctly.

You may need to repeat `--expected_output="(ignored)"` for each output (could try with a smaller test case to confirm).
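Assuming the repeated-flag form does work (which this thread hasn't confirmed), generating the flag list for a case like this one, where only the logits output is checked, might look like:

```python
# Sketch: build iree-run-module --expected_output flags where only some
# outputs are checked and the rest are ignored. NOTE: the per-output
# "(ignored)" syntax is an assumption from this thread, not confirmed
# behavior, and this helper is hypothetical.
def expected_output_flags(checked: dict[int, str], total: int) -> list[str]:
    """checked maps output index -> 'shape=@file' spec; others get (ignored)."""
    flags = []
    for i in range(total):
        value = checked.get(i, "(ignored)")
        flags.append(f"--expected_output={value}")
    return flags

# Only output 0 (logits) is verified; the 4 past_key_values outputs are ignored.
flags = expected_output_flags(
    {0: "1x93x32000xf32=@inference_output.0.bin"}, total=5)
```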
---
Tried a couple of different variations (it complains about no shape being provided, and when I provided one, it just complained about being unable to parse the arg). Maybe come back to it when we actually enable the llama real weight test?
---
¯\_(ツ)_/¯

> Maybe come back to it when we actually enable llama real weight test?

SGTM
---
Cool, are we good to land this?
---
Regarding naming, is "turbine tank" different from https://github.com/nod-ai/SHARK-TestSuite/tree/main/e2eshark/pytorch/models ? Is that https://github.com/nod-ai/SHARK-Turbine/tree/main/models/turbine_models/turbine_tank ?
Eventually we should have scripts like https://github.com/nod-ai/SHARK-TestSuite/blob/main/iree_tests/pytorch/models/import_from_e2eshark.py or guides like https://github.com/nod-ai/SHARK-TestSuite/tree/main/iree_tests#pytorch-models for each source of model files.
Just looking ahead a bit to whenever the MLIR and input/output files need to be regenerated. We want enough of that automated, or at least documented, that anyone can update the files*
* if they have write access to the cloud projects, but some github actions secrets + automation can also help there too
Yeah, it is different (https://github.com/nod-ai/SHARK-TestSuite/blob/main/turbine_tank/flows_util.py), but these models are custom and Turbine-specific (https://github.com/nod-ai/SHARK-Turbine/blob/main/models/turbine_models/custom_models/sd_inference/unet_runner.py). I want to get that process fully automated (pull in the latest passing MLIR nightly from the Turbine Tank Azure storage). Not sure if we can have a guide for adding Turbine Tank models, though, because I had to mess around with the model runners themselves so that they generate the input/output .bin files. The rest of the process (converting MLIR to mlirbc, and safetensors to splats and real_weights irpas) seems to already be covered.
---
Oh, this should have been checked in using git lfs, based on
SHARK-TestSuite/iree_tests/.gitattributes
Lines 1 to 2 in 9464d50
```
*.irpa filter=lfs diff=lfs merge=lfs -text
*.mlirbc filter=lfs diff=lfs merge=lfs -text
```
I see a warning when I pull this:
```
Encountered 6 file(s) that should have been pointers, but weren't:
iree_tests/pytorch/models/llama-tank/splats.irpa
iree_tests/pytorch/models/sd-clip-tank/sd-clip-tank.mlirbc
iree_tests/pytorch/models/sd-clip-tank/splats.irpa
iree_tests/pytorch/models/sd-unet-tank/sd-unet-tank.mlirbc
iree_tests/pytorch/models/sd-unet-tank/splats.irpa
iree_tests/pytorch/models/sd-vae-decode-tank/splats.irpa
```
I'm also learning how to use git lfs, so I guess I'll get back to you when I know how to resolve that and stop it from happening again...
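One way to catch this before pushing is to check for the LFS pointer header directly: a valid Git LFS pointer is a small text file that starts with a fixed version line. A sketch, independent of git itself (`is_lfs_pointer` is a hypothetical helper, not part of this repo):

```python
# Sketch: detect files that were committed as real content instead of
# Git LFS pointers. Per the git-lfs pointer spec, a pointer file starts
# with this exact version line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"

def is_lfs_pointer(path: str) -> bool:
    """True if the file at `path` looks like a Git LFS pointer file."""
    with open(path, "rb") as f:
        return f.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX
```

If I'm reading the git-lfs docs correctly, files already committed as real content can be rewritten into pointers with `git lfs migrate import --include="*.irpa,*.mlirbc"`.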
---
Sent #147 to fix
Follow-up to #126 (comment). All I needed to do to generate this PR was pull changes (with git lfs installed). I think to avoid this in the future, others working with these files should also install git lfs.
This commit plugs in Turbine Tank exported MLIR to the IREE testing framework for llama + sd models. Couldn't add a real weight test for llama: the real weight file is 20GB and crashes the runner when trying. On that note, it looks like splat test cases aren't running at the moment? Maybe we can enable the real weight test after we get the cluster. Also, just in case it's useful for anyone adding models in the future: for the exported model MLIR, I had to update the real weight flag to `--parameters=model=real_weights.irpa` to get it working.