
Kleidi Integration #5162

Closed
wants to merge 1 commit

Conversation

mcr229 (Contributor) commented Sep 7, 2024

Bringing KleidiAI QB4 Kernels to ExecuTorch

KleidiAI has released QB4 kernels, which pack the activation while dynamically quantizing it to improve the performance of the GEMM kernel. We leverage these kernels by wiring them up through XNNPACK. This integration is still waiting on a couple of dependent PRs in other repos to land.

Dependent PR Tracking

  • google/XNNPACK#7003
  • https://gitlab.arm.com/kleidi/kleidiai/-/merge_requests/28

Notes on the Update

When updating XNNPACK to the branch with the integrated Kleidi kernels, we have to make some changes to the CMake files because of refactoring done in XNNPACK. prod-microkernels and kleidiai are both static libraries linked into libXNNPACK.a. Since the llama runner (which links against xnnpack_backend) lives in a separate project, we need to install these new static libraries so that we can later link them into the llama runner properly. These changes can be seen in the corresponding CMake files. The new feature is currently guarded behind the EXECUTORCH_XNNPACK_ENABLE_KLEIDI flag; a rough sketch of the guarded install step is shown below.
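As a minimal sketch (using the target names prod-microkernels and kleidiai from the description above; the actual target names and destinations live in the backend CMake files, not here), the guarded install step looks roughly like this:

```cmake
# Sketch only -- target names/destinations follow the description above,
# not the literal diff in backends/xnnpack/CMakeLists.txt.
if(EXECUTORCH_XNNPACK_ENABLE_KLEIDI)
  # libXNNPACK.a now depends on these static libraries, so they must be
  # installed alongside it for out-of-tree consumers (e.g. the llama runner,
  # which links against xnnpack_backend) to resolve their symbols at link time.
  install(
    TARGETS prod-microkernels kleidiai
    ARCHIVE DESTINATION lib
  )
endif()
```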

Repro

git submodule sync
git submodule update --init

I used the following aliases to make it easier to build llama_main for Android:

alias build_et_android="cmake -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
    -DANDROID_ABI=arm64-v8a \
    -DANDROID_PLATFORM=android-23 \
    -DCMAKE_INSTALL_PREFIX=cmake-out-android \
    -DEXECUTORCH_ENABLE_LOGGING=1 \
    -DCMAKE_BUILD_TYPE=Release \
    -DEXECUTORCH_BUILD_EXTENSION_DATA_LOADER=ON \
    -DEXECUTORCH_BUILD_EXTENSION_MODULE=ON \
    -DEXECUTORCH_BUILD_EXTENSION_TENSOR=ON \
    -DEXECUTORCH_BUILD_XNNPACK=ON \
    -DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON \
    -DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON \
    -DEXECUTORCH_BUILD_KERNELS_CUSTOM=ON \
    -DEXECUTORCH_XNNPACK_ENABLE_KLEIDI=ON \
    -DXNNPACK_ENABLE_ARM_BF16=OFF \
    -Bcmake-out-android . && cmake --build cmake-out-android -j16 --target install --config Release
"
alias build_llama_android="cmake  -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
    -DANDROID_ABI=arm64-v8a \
    -DANDROID_PLATFORM=android-23 \
    -DCMAKE_INSTALL_PREFIX=cmake-out-android \
    -DCMAKE_BUILD_TYPE=Release \
    -DPYTHON_EXECUTABLE=python \
    -DEXECUTORCH_BUILD_XNNPACK=ON \
    -DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON \
    -DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON \
    -DEXECUTORCH_BUILD_KERNELS_CUSTOM=ON \
    -DEXECUTORCH_USE_TIKTOKEN=ON \
    -Bcmake-out-android/examples/models/llama2 \
    examples/models/llama2 && cmake --build cmake-out-android/examples/models/llama2 -j16 --config Release
"

I run the following:

build_et_android
build_llama_android
cd cmake-out-android/examples/models/llama2
adb push llama_main /data/local/tmp/
adb push <path/to/llama3.pte> /data/local/tmp
adb push <path/to/tiktokenizer> /data/local/tmp
adb shell "cd /data/local/tmp && ./llama_main --model_path <model.pte> --tokenizer_path <tokenizer.bin> --cpu_threads=4"

Benchmarks

I ran llama3.1 with

  • sdpa_w_kvcache
  • quantized embeddings
  • 4bit blockwise quantized weights
  • dynamic shapes
  • parallel prefill

on Samsung S22 w/4 threads

Baseline (QD8)

I 00:00:32.772974 executorch:stats.h:84]        Prompt Tokens: 8    Generated Tokens: 119
I 00:00:32.772980 executorch:stats.h:90]        Model Load Time:                15.273000 (seconds)
I 00:00:32.773014 executorch:stats.h:100]       Total inference time:           17.488000 (seconds)              Rate:  6.804666 (tokens/second)
I 00:00:32.773019 executorch:stats.h:108]               Prompt evaluation:      2.971000 (seconds)               Rate:  2.692696 (tokens/second)
I 00:00:32.773023 executorch:stats.h:119]               Generated 119 tokens:   14.517000 (seconds)              Rate:  8.197286 (tokens/second)
I 00:00:32.773027 executorch:stats.h:127]       Time to first generated token:  2.971000 (seconds)
I 00:00:32.773030 executorch:stats.h:134]       Sampling time over 127 tokens:  0.173000 (seconds)

QP8

I 00:00:46.767429 executorch:stats.h:84]        Prompt Tokens: 8    Generated Tokens: 119
I 00:00:46.767437 executorch:stats.h:90]        Model Load Time:                28.297000 (seconds)
I 00:00:46.767475 executorch:stats.h:100]       Total inference time:           18.436000 (seconds)              Rate:  6.454762 (tokens/second)
I 00:00:46.767483 executorch:stats.h:108]               Prompt evaluation:      1.770000 (seconds)               Rate:  4.519774 (tokens/second)
I 00:00:46.767491 executorch:stats.h:119]               Generated 119 tokens:   16.666000 (seconds)              Rate:  7.140286 (tokens/second)
I 00:00:46.767522 executorch:stats.h:127]       Time to first generated token:  1.770000 (seconds)
I 00:00:46.767527 executorch:stats.h:134]       Sampling time over 127 tokens:  0.189000 (seconds)

We see a ~+68% performance improvement on prefill and a ~-13% regression on decode. See the dependent XNNPACK PR for more benchmarking details.
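(These percentages follow from the rates above: prompt evaluation goes from 2.692696 to 4.519774 tok/s, and 4.519774 / 2.692696 ≈ 1.68, i.e. ~+68%; generation goes from 8.197286 to 7.140286 tok/s, and 7.140286 / 8.197286 ≈ 0.87, i.e. ~-13%.)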

pytorch-bot (bot) commented Sep 7, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/5162

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 3abbc5e with merge base b60fa71:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label Sep 7, 2024
@mcr229 mcr229 requested a review from digantdesai September 7, 2024 08:20
@mcr229 force-pushed the kleidi_integration branch 3 times, most recently from dd97512 to 89b1783 on September 7, 2024 23:37
mcr229 (Contributor, Author) commented Sep 10, 2024

Llama Benchmarks on One Plus 12 with 4 Threads

prompt_size=93

| Metric (One Plus 12) | Config | Run 1 | Run 2 | Run 3 | Average | (QP8-QD8)/QD8 |
|---|---|---|---|---|---|---|
| Model Load Time (seconds) | QD8 | 13.722 | 13.772 | 13.775 | 13.75633333 | 1.193486636 |
| | QP8 | 29.989 | 30.6 | 29.934 | 30.17433333 | |
| Total Inference Time (tok/s) | QD8 | 3.754002 | 3.773166 | 3.770655 | 3.765941 | 0.1071610876 |
| | QP8 | 4.157496 | 4.21627 | 4.134744 | 4.169503333 | |
| Prompt Evaluation (tok/s) | QD8 | 15.680324 | 15.773406 | 15.768057 | 15.74059567 | 0.1980304134 |
| | QP8 | 18.753781 | 19.226793 | 18.592563 | 18.85771233 | |
| Token Generation (tok/s) | QD8 | 10.87652 | 10.914928 | 10.90093 | 10.89745933 | -0.03175853405 |
| | QP8 | 10.562286 | 10.536102 | 10.555728 | 10.551372 | |

prompt_size=8

| Metric (One Plus 12) | Config | Run 1 | Run 2 | Run 3 | Average | (QP8-QD8)/QD8 |
|---|---|---|---|---|---|---|
| Model Load Time (seconds) | QD8 | 13.709 | 13.665 | 13.757 | 13.71033333 | 1.18749848 |
| | QP8 | 29.987 | 29.992 | 29.995 | 29.99133333 | |
| Total Inference Time (tok/s) | QD8 | 10.653536 | 10.663082 | 10.67935 | 10.66532267 | -0.03114314279 |
| | QP8 | 10.350526 | 10.334347 | 10.31464 | 10.333171 | |
| Prompt Evaluation (tok/s) | QD8 | 15.355086 | 15.355086 | 16.632017 | 15.78072967 | 0.1046826753 |
| | QP8 | 17.777778 | 17.316017 | 17.204301 | 17.43269867 | |
| Token Generation (tok/s) | QD8 | 11.174758 | 11.185262 | 11.161133 | 11.17371767 | -0.036838172 |
| | QP8 | 10.772155 | 10.766308 | 10.747832 | 10.76209833 | |

digantdesai (Contributor) commented Sep 10, 2024

so IIUC ~20% faster prefill for longer prompts with QP8? seems like a win :)

backends/xnnpack/CMakeLists.txt (outdated; resolved)
@@ -630,7 +630,11 @@ Error defineConvertNode(
subgraph_ptr,
remapped_ids.at(graph_node->input_id()),
remapped_ids.at(graph_node->output_id()),
#ifdef ENABLE_XNNPACK_KLEIDI
0x00000080);
Contributor

what is this magic?

Contributor Author

XNNPACK folk didn't make this available through xnnpack.h. They actually do the same with mediapipe LOL:

https://github.com/google-ai-edge/mediapipe/blob/cae031ac4ad34cf45a2f29f02615beb049de0e49/mediapipe/tasks/cc/genai/inference/utils/xnn_utils/graph_builder.cc#L384

@mcr229 force-pushed the kleidi_integration branch 3 times, most recently from 35a5ee0 to 4d96001 on September 14, 2024 00:28
@mcr229 force-pushed the kleidi_integration branch 7 times, most recently from afb5b66 to 9a06ca9 on September 30, 2024 18:14
@mcr229 mcr229 requested a review from digantdesai September 30, 2024 18:17
@@ -630,7 +630,14 @@ Error defineConvertNode(
subgraph_ptr,
remapped_ids.at(graph_node->input_id()),
remapped_ids.at(graph_node->output_id()),
#ifdef ENABLE_XNNPACK_KLEIDI
// This maps to XNNPACK's XNN_FLAG_MAYBE_PACK_FOR_QB4W_GEMM
Contributor

perhaps we should fix it in XNNPACK, independent of this PR

Contributor Author

I don't think they want this public yet, since it would cause backwards compatibility issues in the future. This is likely the path until they have a complete qp8 story.

Contributor

I assume we will add this back or fix this before landing this internally?

Contributor Author

I hope to not have this at all anymore

@facebook-github-bot

@mcr229 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

mcr229 added a commit to mcr229/executorch that referenced this pull request Sep 30, 2024
@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D63651987

@facebook-github-bot

@mcr229 merged this pull request in 8079eb7.
