
test-backend-ops : use flops for some performance tests #9657

Merged: 2 commits merged into master on Sep 28, 2024

Conversation

@slaren slaren commented Sep 26, 2024

  • Use flops for some performance tests
  • Parallelize tensor quantization
  • Use a different set of test cases for performance and correctness tests

Fixes #8898

@github-actions bot added the testing label (Everything test related) Sep 26, 2024
```cpp
GGML_UNUSED(t);
// 2*m*n*k FLOPs per matrix multiplication, scaled by batch and repeat dims
return 2 * m * n * k * bs[0] * nr[0] * bs[1] * nr[1];
```
Collaborator

Theoretically you would only need $m \cdot n \cdot k$ multiplications and $m \cdot n \cdot (k - 1)$ additions. In practice, however, this is not going to make a difference since $k \gg 1$.

Collaborator Author

I agree, but most sources seem to use $2mnk$, so I thought it would be more important to be consistent with the way other people measure FLOPS than to correct a small inaccuracy.

@JohannesGaessler
Collaborator

The PR looks good to me based on static code analysis. Right now I'm on a train with an unreliable internet connection; I'll test the code when I get home.

```cpp
static std::vector<std::unique_ptr<test_case>> make_test_cases_perf() {
    std::vector<std::unique_ptr<test_case>> test_cases;

    for (int bs : {1, 8, 16, 32, 512}) {
```
Collaborator

Matrix multiplications become more efficient for larger batch sizes. At the same time the number of runs is scaled in such a way that the total FLOPs are roughly constant. So you could consider removing 16 and instead adding 1024 or 2048. That would make the test faster and cover a wider range.

Collaborator Author

I have changed it to include only 1 and 512. I think that should provide a good overview of the performance that can be expected during generation and prompt processing for people casually running the benchmark, but I expect that people working on optimizing an operation will modify this function to add the test cases relevant to what they are working on. It would be nice to be able to specify the parameters of the test cases from the command line as well, but that's a more complex change.

The way the number of runs was scaled was not very good, and I noticed that some of the tests were too short and produced very inaccurate results. I changed it so that the memory size or FLOPs is used as an initial estimate of how many times to duplicate the op in the graph, but the graph is evaluated repeatedly until it has run for at least one second. This seems to produce much more reliable results.
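The strategy described above can be sketched as follows. This is a hypothetical standalone version, not the PR's actual code: the names, the 10 GFLOP initial budget, and the `eval_graph` callback are all illustrative assumptions.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <functional>

// Hypothetical sketch: a FLOP estimate picks how many copies of the op to put
// in the graph, then the graph is evaluated repeatedly until at least one
// second of wall time has elapsed. Returns the achieved FLOP/s.
static double benchmark_flops(uint64_t flops_per_op,
                              const std::function<void(int)> & eval_graph) {
    // initial budget: duplicate the op until the graph holds ~10 GFLOP (assumed value)
    const uint64_t target_flops = 10ULL * 1000 * 1000 * 1000;
    const int n_per_graph = (int) std::max<uint64_t>(1, target_flops / flops_per_op);

    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    uint64_t n_ops = 0;
    do {
        eval_graph(n_per_graph); // evaluate the graph holding n_per_graph copies of the op
        n_ops += (uint64_t) n_per_graph;
    } while (std::chrono::duration<double>(clock::now() - start).count() < 1.0);

    const double sec = std::chrono::duration<double>(clock::now() - start).count();
    return (double) n_ops * (double) flops_per_op / sec;
}
```

Measuring wall time over the whole repeated run, rather than a single short graph evaluation, is what avoids the inaccurate results mentioned above for very cheap ops.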

@slaren slaren merged commit 1b2f992 into master Sep 28, 2024
54 checks passed
@slaren slaren deleted the sl/test-backend-ops-perf-flops branch September 28, 2024 12:32
matiaslin pushed a commit to matiaslin/llama.cpp that referenced this pull request Sep 28, 2024
* test-backend-ops : use flops for some performance tests

- parallelize tensor quantization

- use a different set of cases for performance and correctness tests

- run each test for at least one second
dsx1986 pushed a commit to dsx1986/llama.cpp that referenced this pull request Oct 29, 2024
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 15, 2024
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 18, 2024
Successfully merging this pull request may close these issues.

test-backend-ops performance numbers incorrect
2 participants