-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix sources of nondeterminism in egglog #439
base: main
Are you sure you want to change the base?
Conversation
@ezrosent, I ended up having to use indexmap in generic join, which might slow down egglog. Should we just merge it for now? |
Can we instead just use a build-time flag which users can use to opt in for determinism? |
That's a good idea, but perhaps determinism should be opt-out? |
Some thoughts:
|
Reverted the feature, so this should be good to go! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
@oflatt Update to main to get benchmarking! |
3f0507f
to
f10226f
Compare
CodSpeed Performance ReportMerging #439 will not alter performanceComparing Summary
|
I imagine some of the changes on shorter running benchmarks might not be significant - FWIW I just opened a PR to reduce our benchmarks to just the longer more meaningful ones (#444). Otherwise, it seems like there is too much uncertainty introduced by the indeterminacy of the memory allocator. The slowdown in the |
We should try:
|
reference to ahash as fast hash map with ability to set seed https://docs.rs/ahash/latest/ahash/ |
|
f10226f
to
334f911
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we still depend on rustc-hash
? We might as well use the same hasher for everything right?
Not sure why, but things are still non-deterministic with this change |
Since I updated the hash function in PR#445 Update hashbrown, I'd recommend you rebase onto main. My goal there was to update to the latest versions of the hashing dependencies. I didn't notice that egglog also uses rustc-hash, so that ought to be updated, since they recently did significant improvements improving both performance and hash strength. It's no longer the Firefox-derived hash function; its collision weaknesses, which were problematic for rustc, have been fixed. Now, rustc-hash is a mildly stripped down version of foldhash, making tradeoffs for usage in a compiler (I've written on the differences between them in my notes). It's probably only worth using one of the two. Max's |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See Thalia's comment
One thought: Another source of nondeterminism that can happen if you're running more than one instance of egglog in a single process is symbol_table. The numbers symbol_table hands out will not be deterministic (there are various process-global counters being incremented and the like. Different numbers will cause hashmap to store strings or symbols in different spots, hence iteration order will be different too. |
No description provided.