Update Python example #444

Merged: 11 commits merged into egraphs-good:main from saulshanabrook:update-python-example on Oct 24, 2024

Conversation

@saulshanabrook (Member) commented Oct 17, 2024

In this PR we change the benchmarks to only run long-running examples instead of all of them. This is to reduce variance caused by nondeterminism in the memory allocator, which is especially prominent in short benchmarks (oxc-project/backlog#89).

It also updates the Python example to more accurately reflect the current workload: optimizing the program and extracting it back out as a final Python string, all with common subexpression elimination (CSE), so the example file is smaller than before. It should output a final string like this:

def __fn(X, y):
    assert X.dtype == np.dtype(np.float64)
    assert X.shape == (150, 4,)
    assert np.all(np.isfinite(X))
    assert y.dtype == np.dtype(np.int64)
    assert y.shape == (150,)
    assert set(np.unique(y)) == set((0, 1, 2,))
    _0 = y == np.array(0)
    _1 = np.sum(_0)
    _2 = y == np.array(1)
    _3 = np.sum(_2)
    _4 = y == np.array(2)
    _5 = np.sum(_4)
    _6 = np.array((_1, _3, _5,)).astype(np.dtype(np.float64))
    _7 = _6 / np.array(float(150))
    _8 = np.zeros((3, 4,), dtype=np.dtype(np.float64))
    _9 = np.sum(X[_0], axis=0)
    _10 = _9 / np.array(X[_0].shape[0])
    _8[0, :,] = _10
    _11 = np.sum(X[_2], axis=0)
    _12 = _11 / np.array(X[_2].shape[0])
    _8[1, :,] = _12
    _13 = np.sum(X[_4], axis=0)
    _14 = _13 / np.array(X[_4].shape[0])
    _8[2, :,] = _14
    _15 = _7 @ _8
    _16 = X - _15
    _17 = np.sqrt(np.array(float(1 / 147)))
    _18 = X[_0] - _8[0, :,]
    _19 = X[_2] - _8[1, :,]
    _20 = X[_4] - _8[2, :,]
    _21 = np.concatenate((_18, _19, _20,), axis=0)
    _22 = np.sum(_21, axis=0)
    _23 = _22 / np.array(_21.shape[0])
    _24 = np.expand_dims(_23, 0)
    _25 = _21 - _24
    _26 = np.square(_25)
    _27 = np.sum(_26, axis=0)
    _28 = _27 / np.array(_26.shape[0])
    _29 = np.sqrt(_28)
    _30 = _29 == np.array(0)
    _29[_30] = np.array(float(1))
    _31 = _21 / _29
    _32 = _17 * _31
    _33 = np.linalg.svd(_32, full_matrices=False)
    _34 = _33[1] > np.array(0.0001)
    _35 = _34.astype(np.dtype(np.int32))
    _36 = np.sum(_35)
    _37 = _33[2][:_36, :,] / _29
    _38 = _37.T / _33[1][:_36]
    _39 = np.array(150) * _7
    _40 = _39 * np.array(float(1 / 2))
    _41 = np.sqrt(_40)
    _42 = _8 - _15
    _43 = _41 * _42.T
    _44 = _43.T @ _38
    _45 = np.linalg.svd(_44, full_matrices=False)
    _46 = np.array(0.0001) * _45[1][0]
    _47 = _45[1] > _46
    _48 = _47.astype(np.dtype(np.int32))
    _49 = np.sum(_48)
    _50 = _38 @ _45[2].T[:, :_49,]
    _51 = _16 @ _50
    return _51[:, :2,]
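
As a rough illustration of the "only run long-running examples" change described above, here is a minimal sketch of a criterion-style harness restricted to an allowlist of long-running example files. The allowlist contents, the tests/*.egg paths, and the run_program helper are hypothetical stand-ins, not the repository's actual bench code.

// Hedged sketch: benchmark only an allowlist of long-running examples so that
// allocator noise does not dominate the measurements. Everything named here is
// illustrative, not the actual harness in this repository.
use criterion::{black_box, criterion_group, criterion_main, Criterion};
use std::fs;

// Hypothetical allowlist: examples that run long enough to swamp allocator noise.
const LONG_RUNNING: &[&str] = &["python_array_optimize", "stresstest_large_expr"];

// Hypothetical runner: the real one would parse and execute the .egg program.
fn run_program(source: &str) -> usize {
    source.len()
}

fn example_benchmarks(c: &mut Criterion) {
    for &name in LONG_RUNNING {
        let source = fs::read_to_string(format!("tests/{name}.egg"))
            .expect("example file should exist");
        // Each iteration re-runs the whole example program.
        c.bench_function(name, |b| b.iter(|| run_program(black_box(&source))));
    }
}

criterion_group!(benches, example_benchmarks);
criterion_main!(benches);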

@saulshanabrook requested a review from a team as a code owner on October 17, 2024 20:36
@saulshanabrook requested review from mwillsey and removed the request for a team on October 17, 2024 20:36
@saulshanabrook requested review from Alex-Fischman and removed the request for mwillsey on October 17, 2024 20:47
codspeed-hq (bot) commented Oct 17, 2024

CodSpeed Performance Report

Merging #444 will not alter performance

Comparing saulshanabrook:update-python-example (0082005) with main (b0db068)

Summary

✅ 6 untouched benchmarks

🆕 2 new benchmarks

Benchmarks breakdown

Benchmark                  | main | saulshanabrook:update-python-example | Change
🆕 python_array_optimize   | N/A  | 6.6 s                                | N/A
🆕 stresstest_large_expr   | N/A  | 2.9 s                                | N/A

@saulshanabrook changed the title from "Update Python example" to "Only benchmark long running examples" on Oct 17, 2024
@Alex-Fischman (Contributor)

This seems wrong to me. We want to be able to catch regressions in our whole codebase, not just in 6 slow tests, right?

@saulshanabrook (Member, Author)

I think the issue is that the shorter tests have too much variability to benchmark reliably. Alternatively, we could try a deterministic memory allocator for our benchmarks. The thread I link to above outlines the issue.

My sense is that if a change doesn't affect our larger benchmarks, it's probably fine to pull in, as long as those benchmarks are representative enough of the whole codebase.
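
For what it's worth, a minimal sketch of the "deterministic memory allocator" idea: pin the benchmark binary to a single global allocator so allocation behavior is more consistent across runs. This assumes the mimalloc crate purely for illustration (it is more consistent rather than strictly deterministic), and it is not something this PR does.

// Hedged sketch: route all allocations in the bench binary through one
// well-defined allocator to cut run-to-run variance. Assumes the `mimalloc`
// crate; illustrative only, not part of this PR.
use mimalloc::MiMalloc;

#[global_allocator]
static GLOBAL: MiMalloc = MiMalloc;

fn main() {
    // Any benchmarks driven from this binary now allocate through MiMalloc.
}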

@Alex-Fischman (Contributor)

If the large tests are representative then this is a good change; I just didn't think that they were.

@Alex-Fischman (Contributor)

Looking at the CodSpeed regressions on a different PR (#448), I'm seeing that a lot of the variance comes from the parser.

@yihozhang (Collaborator)

Aren't most of the regressions from short-running programs uninteresting because many are one-off costs like parsing, compilation, and bookkeeping? We care more about performance-critical components like query/apply/rebuild, which only become prominent in longer-running programs.

@yihozhang (Collaborator)

I think the old Python example is a good stress test for the type checker. Maybe we can still keep it, but under a different name?

@Alex-Fischman (Contributor) commented Oct 21, 2024

I think that we should increase the regression threshold back to 10%: look at this report. Even on a benchmark that takes >10 ms, the rebuild method still gets significantly slower. (Or we should fix rebuild to not be randomly slow? It's unclear how possible that is.)

@saulshanabrook (Member, Author)

Another option is I could "ignore" all benchmarks that aren't long running: https://docs.codspeed.io/features/ignoring-benchmarks/

That way they still get recorded and can be viewed manually but won't influence the reports.

The only downside is that if we add additional short examples, they will have to be manually added to the ignore list in the UI.

@Alex-Fischman (Contributor) commented Oct 22, 2024

That last option seems strictly better than the solution in this PR to me: opt-out vs. opt-in. Hopefully it would motivate us to create better benchmarks in the future.

@saulshanabrook (Member, Author)

That last option seems strictly better than the solution in this PR to me: opt-out vs. opt-in. Hopefully it would motivate us to create better benchmarks in the future.

OK, sounds good! I will update this PR to just add the Python benchmark (and keep the old one) and then ignore the smaller benchmarks for now :)

@saulshanabrook changed the title from "Only benchmark long running examples" to "Update Python example" on Oct 22, 2024
@Alex-Fischman (Contributor) left a review comment

I have quite a few nits, but no need for re-review.

Review threads (now outdated and resolved) on tests/python_array_optimize_big.egg and README.md (3 threads).
@saulshanabrook (Member, Author)

OK, I think this is ready to merge.

@saulshanabrook merged commit ae69e65 into egraphs-good:main on Oct 24, 2024
5 checks passed
@saulshanabrook deleted the update-python-example branch on October 24, 2024 16:22