
PERF: Improve efficiency of BlockValuesRefs #59598

Merged 2 commits into pandas-dev:main on Aug 28, 2024

Conversation

@Tolker-KU (Contributor) commented Aug 24, 2024

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

Improve efficiency of BlockValuesRefs by allowing for more efficient C code-gen

@Tolker-KU requested a review from WillAyd as a code owner August 24, 2024 21:21
@Tolker-KU marked this pull request as draft August 24, 2024 21:34
@Tolker-KU marked this pull request as ready for review August 24, 2024 21:40
@rhshadrach (Member) commented:

@Tolker-KU - can you run and post ASV results?

@rhshadrach added the Performance, Copy / view semantics, and Enhancement labels Aug 25, 2024
@Tolker-KU (Contributor, Author) commented:

> @Tolker-KU - can you run and post ASV results?

Thanks for taking a look at this.

I've tried hard to get meaningful runs of the ASV benchmarks, but I think my computer is so old and slow that noise dominates the results: some benchmarks come out much faster and some much slower.

To show the gains from this change, I ran the micro-benchmark below:

>>> import numpy as np
>>> import pandas as pd
>>> N = 10_000
>>> arr = np.random.normal(size=(50, N))
>>> df = pd.DataFrame(arr)
>>> %timeit -n10000 list(df.iterrows())
2.43 ms ± 268 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)  <- before
2.1 ms ± 117 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)   <- after

@@ -890,12 +890,12 @@ cdef class BlockValuesRefs:

     def __cinit__(self, blk: Block | None = None) -> None:
         if blk is not None:
-            self.referenced_blocks = [weakref.ref(blk)]
+            self.referenced_blocks = [PyWeakref_NewRef(blk, None)]
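For context, a minimal sketch of the pattern the diff relies on, using placeholder names rather than the actual pandas module: cimporting the CPython C-API function lets Cython emit a direct C call instead of a weakref module attribute lookup followed by a generic Python-level call.

# A minimal sketch, assuming a standalone Cython extension; Owner is a
# hypothetical class, not the pandas BlockValuesRefs implementation.
from cpython.weakref cimport PyWeakref_NewRef

cdef class Owner:
    cdef public list referenced_blocks

    def __cinit__(self, obj=None) -> None:
        if obj is not None:
            # Direct C-API call: the generated C is essentially a single
            # PyWeakref_NewRef(...) call instead of a module attribute
            # lookup plus a generic Python call.
            self.referenced_blocks = [PyWeakref_NewRef(obj, None)]
        else:
            self.referenced_blocks = []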
Member (reviewer) commented:

Does this actually change anything? I am under the impression that Cython would be smart enough to handle this the same in both cases.

Instead of looking at the benchmarks (which can be difficult to run and flaky), you can also inspect the Cython annotations before/after the change. Are they actually different?
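For reference, the annotations mentioned here are Cython's annotated HTML output (cython -a), which shades lines according to how much Python C-API interaction the generated C contains. A minimal sketch of producing it, assuming the class lives in pandas/_libs/internals.pyx and the snippet is run from a pandas checkout:

# A minimal sketch, not part of the PR: generate Cython's annotated HTML for
# the module so the before/after code-gen can be compared line by line.
from Cython.Build import cythonize

# Writes internals.c and internals.html next to the .pyx; heavily shaded lines
# in the HTML indicate more Python/C-API interaction in the generated C.
cythonize("pandas/_libs/internals.pyx", annotate=True)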

@Tolker-KU (Contributor, Author) commented:

Thanks, glad you brought that up. I should have been more explicit in the description: the motivation for this PR is that the generated code changes quite dramatically. An example from one of the methods is below; note that before the change, the code generated for the method does not fit on one screen.

Before: [screenshot of the generated code before the change]

After: [screenshot of the generated code after the change]

Member (reviewer) commented:

Thanks for posting. That's... quite the difference.

@mroeschke added this to the 3.0 milestone Aug 28, 2024
@mroeschke merged commit 91541c1 into pandas-dev:main Aug 28, 2024
56 of 59 checks passed
@mroeschke (Member) commented:

Thanks @Tolker-KU

@Tolker-KU deleted the perf-internals branch August 28, 2024 17:36