Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up orbit computations on integer sets and tuples by providing a better hash function for lists of small integers #5783

Merged
merged 5 commits into from
Aug 31, 2024

Conversation

reiniscirpons
Copy link
Contributor

This PR improves the hash function used for lists.

Text for release notes

see title

Further details

The existing function essentially determined the hash by onlyconsidering the first two entries of the list. This causes significant lots of hash collision and slowdown when computing the orbit on sets, e.g. before the changes in this PR we get:

gap> Orbit(SymmetricGroup(14), [1..7], OnSets);; time;
1341

After changes:

gap> Orbit(SymmetricGroup(14), [1..7], OnSets);; time;
7

@ChrisJefferson
Copy link
Contributor

This seems reasonable, and I agree the old hash function was very bad.

A couple of small suggested changes (just because it took me a while to review this, so it might be useful to other people!) These are very picky comments, but I read this fairly carefully because we are hashing bags.

Usually HashKeyWholeBag is unsafe, but in this case 'boundedtuples' will only ever be lists of integers. Maybe just add a comment saying that, for future readers worried about the HashKeyWholeBag?

Also, I'd make the last argument '1'. I don't think the 'Length(x)' is actually improving the hash value in practice, and I did assume it "meant something", rather than was just a seed.

@reiniscirpons
Copy link
Contributor Author

Thanks for the comments! I've added a comment explaining that we expect the input to be a list of positive ints as well as an assertion to explicitly check this in the code. To avoid issues with different representations (like if the input were a range), we now also convert the input to a Plist if it is not already. Also agree with the comment about the seed value.

@ChrisJefferson
Copy link
Contributor

Thanks, I'm happy with this now, but I'll let other people pop in and have an opinion.

@fingolfin fingolfin changed the title Improve list hash function Improve hash function for lists of small integers Aug 31, 2024
lib/dicthf.gi Outdated Show resolved Hide resolved
@fingolfin fingolfin changed the title Improve hash function for lists of small integers Speed up orbit computations on integer sets and tuples by providing a better hash function for lists of small integers Aug 31, 2024
@fingolfin fingolfin added topic: performance bugs or enhancements related to performance (improvements or regressions) topic: library release notes: use title For PRs: the title of this PR is suitable for direct use in the release notes labels Aug 31, 2024
Copy link
Member

@fingolfin fingolfin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I've adjusted the title so it can be used as an entry in the release notes. As such I focused it more on the user facing effect.

@fingolfin fingolfin enabled auto-merge (squash) August 31, 2024 11:29
@fingolfin fingolfin added the kind: enhancement Label for issues suggesting enhancements; and for pull requests implementing enhancements label Aug 31, 2024
@fingolfin fingolfin merged commit 5c88406 into gap-system:master Aug 31, 2024
32 of 33 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind: enhancement Label for issues suggesting enhancements; and for pull requests implementing enhancements release notes: use title For PRs: the title of this PR is suitable for direct use in the release notes topic: library topic: performance bugs or enhancements related to performance (improvements or regressions)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants