1BRC in Go #67
Replies: 11 comments 43 replies
-
Uses memory-maped file and a goroutine per cpu. The dirty trick is to use 8-byte hash of id as a key assuming there are no collisions 🙈 |
Beta Was this translation helpful? Give feedback.
-
Thanks for starting this. I have some experience writing this type of program in Go so I couldn't help myself: https://gist.github.com/corlinp/176a97c58099bca36bcd5679e68f9708 On my M3 Pro Macbook, my solution is quite a bit faster than the Go solutions from warpstreamlabs or AlexanderYastrebov, and twice as fast as the current fastest Java implementation. Please give it a spin! Here's a snippet of the those results:
While memory-mapped files are great for random concurrent reads, I took the serial approach, but with minimal logic in the read so we're not blocking IO to parse individual bytes. Instead, chunks get passed off to another worker to align the lines and another set of concurrent workers to do the parsing. It's still CPU-bound on my Mac, so there are probably some further optimizations to be made, especially in using something faster than Go's built-in maps. |
Beta Was this translation helpful? Give feedback.
-
Here is my final solution: https://github.com/jkroepke/1brc-go/tree/main/go While I initially start with mmap.Open and bufio.Scanner, I was in an area approx 25sec. Then I take note of @AlexanderYastrebov by using The key difference between @AlexanderYastrebov and my solution is may using more fixed-sized array rather than maps over all. It has the downside that number of worker needs to be pre-defined. Both solutions are living between 5.6 and 6.2 seconds on a MacBook Pro 2019 (Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz) with 12 logical cores and 32GB RAM.
|
Beta Was this translation helpful? Give feedback.
-
Please try this https://blog.twitch.tv/en/2019/04/10/go-memory-ballast-how-i-learnt-to-stop-worrying-and-love-the-heap/ |
Beta Was this translation helpful? Give feedback.
-
Created proof-of-concept of non-java submission #298 |
Beta Was this translation helpful? Give feedback.
-
My Go solution here https://github.com/elh/1brc-go. Opened as a submission via Docker if they end up allowing that #435. I had a lot of fun trying to minimize memory allocations to optimize the file scanning and then just added concurrency at the end. Did not spend a lot of time tuning it. Similar performance characteristics to @corlinp 's solution on my Macbook M2 Pro. Both of our solutions run fast on my machine; others are slow. It's a good performance lesson that everyone's solution seems to be the fastest on their machine 😅
|
Beta Was this translation helpful? Give feedback.
-
Just found out about this a couple days ago and was able to work on it a bit today. I'm running this on an M1 Pro Macbook (10 vCpus). And I ran 2 versions of the code one using the standard go maps, and one using https://github.com/dolthub/swiss . I tested a couple different map implementations and this is the one that performed the best.
General approach I took:
Other than |
Beta Was this translation helpful? Give feedback.
-
Wanted to drop my solution here as well https://github.com/weirdgiraffe/1brc/blob/master/main.go Based on this thread, only the solution from @elh (#67 (comment)) beats mine so far (I just tested locally on my mashine by just running Will try to polish up the solution to benchmark against others. |
Beta Was this translation helpful? Give feedback.
-
Just curious if any of the Go implementations used SIMD in their solutions? I noticed pretty much everyone getting faster times had to do it by busting out some SIMD, but I've looked at several solutions and haven't seen SIMD used in any of them. Would be interesting to see if it made a difference in Go. |
Beta Was this translation helpful? Give feedback.
-
Wow, this is a great challenge and very interesting solutions, I've learned a lot, thank you so much, guys! 🙏 I wonder, what storage would you choose for >1 Billion records with fast enough lookups? Especially for this type of data that changes so often while un-demanding in terms of being heavy relational, pretty much no JOINs needed, right? Would Postgres or SQLite be good enough to insert it quickly with fast enough Selects? Or should we consider things like ClickHouse or on-disk-Redis or maybe ObjectBox for that? Please share your thoughts? |
Beta Was this translation helpful? Give feedback.
-
I've been using this challenge as a way to learn go over the past few days. With my limited go experience, I've seem to hit a wall. I have my submission here. On my M1 macbook pro with 16GB of memory, it runs in ~39s. I can get it a few seconds faster if I play around with some parameters ( I came across another submission from this blog post . I've copied over the submission to my repo ( I've been trying to use the profile and execution traces to figure out why. Below are my flameGraphs and execution traces Below are the The first thing that stands out to my are the # of GC collections. My submission seems to be running GC much more frequently - 100,000 incremental sweeps, 32,000 background sweeps, and around 1000 stop the world GCs. The heap only grows to 180MB at its peak, and typically stays around 72MB. In the otherSubmission, incremental GC only runs 6 times, and stop the work is ~30 times. Their heap grows much larger (up to 3.5GB), and GC happens less frequently. Also, looking at the flame graphs, it looks like there is much more time being spent on managing the stack I believe the time discrepancy is mostly due to GC overhead and go scheduler overhead. Although, I'm not sure why. Could anyone help a newbie figure out where I'm going wrong? Also could share any tools or tricks that could help me figure this out? |
Beta Was this translation helpful? Give feedback.
-
Completes in ~20s on my machine, see https://github.com/AlexanderYastrebov/1brc/tree/go-implementation/src/main/go
Note that I use encrypted disk.
The results vary greatly on different platforms, see discussion below.
Beta Was this translation helpful? Give feedback.
All reactions