Fix executorch kv cache incompatibility with to_executorch lowering #7279

dvorjackz · 2024-12-11T00:44:35Z

Summary

Fix the Llama 3.2 vision text decoder prefill issue by using a custom pass to mark the kv cache as an initialized mutable buffer.

Test plan

Add kv_cache export tests that check accuracy against torchtune eager and verify the contents of the cache after prefill and after token-by-token generation.
Export and run the full Llama 3.2 vision text decoder:

> python -m examples.models.llama.export_llama --model llama3_2_vision --checkpoint /tmp/Llama-3.2-11B-Vision-Instruct/original/consolidated.pth --params examples/models/llama3_2_vision/text_decoder/params/demo_config.json  --metadata '{"append_eos_to_prompt": 0, "get_bos_id":128000, "get_eos_ids":[128009, 128001], "get_n_bos": 0, "get_n_eos": 0}' --output_name="llama3_2_vision.pte" -d fp32 --verbose --max_seq_length 64 -k
> python -m examples.models.llama3_2_vision.runner.native --model llama3_2_vision --pte llama3_2_vision.pte  --tokenizer /tmp/Llama-3.2-11B-Vision-Instruct/original/tokenizer.model --prompt "Who's the founder of Meta?" --params examples/models/llama3_2_vision/text_decoder/params/demo_config.json --max_len 64 -kv --temperature 0
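
The cache-content check in the test plan can be illustrated with a toy example. This is only a sketch under stated assumptions: `ToyKVCache` and `caches_match` are hypothetical names invented here, the cache holds plain Python floats rather than tensors, and none of it reflects the torchtune or ExecuTorch implementations. It shows the property being tested: writing a prompt in one prefill call should leave the cache in the same state as writing it one token at a time.

```python
from typing import List


class ToyKVCache:
    """Preallocated cache; a toy stand-in, not torchtune's KVCache."""

    def __init__(self, max_seq_len: int) -> None:
        self.k: List[float] = [0.0] * max_seq_len
        self.v: List[float] = [0.0] * max_seq_len
        self.pos = 0  # next write position, persisted across calls

    def update(self, k_new: List[float], v_new: List[float]) -> None:
        n = len(k_new)
        if len(v_new) != n:
            raise ValueError("k_new and v_new must have the same length")
        if self.pos + n > len(self.k):
            raise ValueError("cache overflow")
        # Write the new entries at the current position, then advance it.
        self.k[self.pos:self.pos + n] = k_new
        self.v[self.pos:self.pos + n] = v_new
        self.pos += n


def caches_match(a: ToyKVCache, b: ToyKVCache) -> bool:
    """Compare both buffers and the write position."""
    return a.k == b.k and a.v == b.v and a.pos == b.pos


keys = [1.0, 2.0, 3.0]
values = [4.0, 5.0, 6.0]

# Prefill: write the whole prompt in one call.
prefill = ToyKVCache(max_seq_len=8)
prefill.update(keys, values)

# Token-by-token: write one entry per call.
stepwise = ToyKVCache(max_seq_len=8)
for k, v in zip(keys, values):
    stepwise.update([k], [v])

assert caches_match(prefill, stepwise)
```

The write position is part of the comparison because a position that is not carried over correctly between calls would corrupt later writes even if the first one looks right.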

pytorch-bot · 2024-12-11T00:44:39Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/7279

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit e297c9b with merge base 343aa0c:

NEW FAILURE - The following job has failed:

pull / unittest-arm / linux-job (gh)
RuntimeError: Command docker exec -t b90672cd8630d1544e16c7c60d45fd7508caeb7cb8b1bc44652f5f7c7e8f80fe /exec failed with exit code 1

This comment was automatically generated by Dr. CI and updates every 15 minutes.

extension/llm/export/builder.py

extension/llm/modules/test/test_kv_cache.py

exir/passes/cache_pos_init_mutable_pass.py

tarun292 · 2024-12-21T02:13:18Z

exir/passes/init_mutable_pass.py

+        for pattern in self.patterns:
+            if pattern in name:
+                meta["et_init_buffer"] = True
+


We should raise an exception here if we don't find the buffer.
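
One way the suggested check could look is sketched below. This is not the code in the PR: `mark_init_buffers`, the `buffer_meta` mapping, and the buffer names in the usage example are hypothetical stand-ins, and the actual pass operates on graph nodes rather than a plain dict. The sketch only shows the idea of recording which patterns matched and failing loudly when one matched nothing.

```python
from typing import Dict, Iterable, List


def mark_init_buffers(
    buffer_meta: Dict[str, dict], patterns: Iterable[str]
) -> List[str]:
    """Tag every buffer whose name contains one of `patterns`.

    `buffer_meta` is a hypothetical mapping from buffer name to its
    metadata dict. Raises RuntimeError if any pattern matches no buffer.
    """
    patterns = list(patterns)
    matched = set()
    marked = []
    for name, meta in buffer_meta.items():
        hits = [p for p in patterns if p in name]
        if hits:
            meta["et_init_buffer"] = True
            matched.update(hits)
            marked.append(name)
    # Fail loudly instead of silently exporting an unmarked buffer.
    missing = [p for p in patterns if p not in matched]
    if missing:
        raise RuntimeError(f"No buffer matched pattern(s): {missing}")
    return marked


# Hypothetical buffer names, for illustration only.
meta = {"layers.0.attn.kv_cache_pos": {}, "layers.0.attn.k_cache": {}}
marked = mark_init_buffers(meta, ["kv_cache_pos"])
```

Raising per pattern, rather than only when nothing at all matched, also catches a typo in one pattern when several are configured.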

Add tests that localize the prefill issue to the kv cache

aac90a0

facebook-github-bot added the CLA Signed label Dec 11, 2024

dvorjackz marked this pull request as draft December 11, 2024 00:44

dvorjackz added the topic: not user facing label Dec 11, 2024

Fixes test but not model

917fb0d

dvorjackz force-pushed the jz/fix-prefill branch 2 times, most recently from d538d43 to ee2eb15 Compare December 16, 2024 21:08

Updated pass

46ea733

dvorjackz force-pushed the jz/fix-prefill branch from ee2eb15 to 46ea733 Compare December 16, 2024 21:08

dvorjackz added 2 commits December 17, 2024 18:38

Fix segmentation fault

5db136c

Lint

9cdfb43

dvorjackz force-pushed the jz/fix-prefill branch 2 times, most recently from 5dcb8f7 to f723fe1 Compare December 18, 2024 05:40

Only add pass when vision model

9e68531

dvorjackz force-pushed the jz/fix-prefill branch from f723fe1 to 9e68531 Compare December 18, 2024 05:41

Add comments

925409d

dvorjackz marked this pull request as ready for review December 18, 2024 06:11

dvorjackz requested review from tarun292, JacobSzwejbka and lucylq December 18, 2024 06:11

dvorjackz added 2 commits December 17, 2024 22:12

Remove import

2a3fe8b

Add pass

61101c2

dvorjackz changed the title from "[DRAFT] Fix executorch kv cache incompatibility with to_executorch lowering" to "Fix executorch kv cache incompatibility with to_executorch lowering" Dec 18, 2024

lucylq reviewed Dec 18, 2024

extension/llm/export/builder.py (Outdated)

extension/llm/modules/test/test_kv_cache.py

tarun292 reviewed Dec 19, 2024

exir/passes/cache_pos_init_mutable_pass.py (Outdated)

PR review

4ee95d3

dvorjackz requested review from tarun292 and lucylq December 21, 2024 00:40

Fix test

e297c9b

tarun292 reviewed Dec 21, 2024

iseeyuan approved these changes Dec 21, 2024
