math/rand/v2: a new API for math/rand and a first v2 for std #60751
Replies: 13 comments 86 replies
-
This could be an opportunity to rename the package to |
Beta Was this translation helpful? Give feedback.
This comment has been minimized.
This comment has been minimized.
-
Now that we have generics, could we provide two or four top level-functions (maybe Combined with better type inference, that could also be a significant noise reduction for common cases (such as choosing a random |
Beta Was this translation helpful? Give feedback.
-
We had a code base that was affected by the performance of the prior API. We ended up with a sync.Pool wrapping instances that certainly could be replaced by a v2 implementation. I did a casual study of the performance of the current API, and the exp/ API. https://gist.github.com/raggi/25035bf3fc5cd7c1a3b02dd3987a87fa In our use case what is really desired is regularly doing this:
That is, the cardinality of userGroup is very high, and the frequency with which this method is called is very high. The goal of the code is to project groups of objects into repeatable but otherwise random groups. There are many ways to do this, but a seeded random function is one potentially reasonable way. I'm really curious if it is necessary, and/or what the motivation is to maintain the concept of a "Source" that is separate from the random number generator. This apparently flexibility in the API was unable to solve the problem at hand, and so it is not clear if it would solve for future problems either. The API mandates that the values escape to the heap, which adds a GC pressure cost to these kinds of use cases. Unless std itself provides alternative sources, this may be better handled by users defining an interface, and offering alternative implementations behind that interface. It should be possible to offer re-usable range-draw helpers such as I would love to see an API with PCG that gives me convenient access (offers a variety of drawn types as the Rand API does) while also being able to be constructed, seeded, and discarded entirely on the stack. |
Beta Was this translation helpful? Give feedback.
-
I'd argue in favor of using |
Beta Was this translation helpful? Give feedback.
-
I've studied random number generators and their applications as a hobby for quite a few years now, including making several packages in various languages related to the topic, not to mention being a bother plenty often in the previous math/rand proposals. I'm glad to see some movement here. I think the thing I miss most from the suggested API is implementing encoding.BinaryMarshaler and its dual. This is generally straightforward and cheap for PRNGs and lends a lot of versatility, since it makes reproducing states trivial. Plus, x/exp/rand already implements it for PCG. Another thing I would like to see is a second standard library generator optimized for quality rather than throughput. I've suggested this on #26263 and been turned down, but I don't think I clearly communicated at the time just how difficult it is to diagnose the problems that arise due to insufficient quality. The only people who will even think about it are the experts who will probably choose a different package in the first place. Mersenne Twister is the popular choice in this space, but others exist. (I don't think this is especially likely to happen, still.)
|
Beta Was this translation helpful? Give feedback.
-
FWIW my one use case for this is that I tend to use I fully agree that removing the top-level |
Beta Was this translation helpful? Give feedback.
-
I think it would be convenient to have access to a goroutine safe Source other than the top level functions. One way to do that would be to have a function that takes a Source and returns a safe Source that's wrapped in a mutex. I also don't think Rand really pays for itself. Once you have all Source64s, there's not as much need to stash values in Rand. So, putting it together, I would like to see something like this: package rand
type Source {
Uint64() uint64
}
func NewPCG(seed1, seed2 uint64) *PCG
type PCG struct { ... }
func (p *PCG) Uint64() uint64
func (p *PCG) Seed(seed1, seed2 uint64)
func SafeSource(Source) Source
func Default() Source { return default }
func init() { default = SafeSource(NewPCG(someseed(), someseed2()) }
func Float32(Source) float32
func Float64(Source) float64
func ExpFloat64(Source) float64
func NormFloat64(Source) float64
func Intn[Integer ...](Source, Integer) Integer
func Choice[T any](Source, []T) T
func Perm(Source, int) []int
func Shuffle[T any](Source, []T)
func ShuffleFunc(Source, int, func(int, int)) That's pretty tight and easy to understand. |
Beta Was this translation helpful? Give feedback.
-
I've written an alternative
|
Beta Was this translation helpful? Give feedback.
-
It is recommended to show the practice of this situation in other languages for comparison. |
Beta Was this translation helpful? Give feedback.
-
For me, for completeness, there needs to be a way to generate these omitted values when simple division algorithm doesn't generate |
Beta Was this translation helpful? Give feedback.
-
Just for completeness as I think there's a lot of great input. My normal use case for seeding and using math/rand is testing functions that are probability based. |
Beta Was this translation helpful? Give feedback.
-
Thanks for the discussion everyone. I have filed a proposal: #61716. |
Beta Was this translation helpful? Give feedback.
-
Based on earlier discussions in #26263 and #21835 as well as discussions with @robpike, I suggest adding a new version of math/rand, imported as math/rand/v2, to the standard library. This GitHub Discussion is meant to gather feedback before moving to a proposal.
The math/rand/v2 API would use math/rand as a starting point and then make the following backwards incompatible changes:
Remove Rand.Read and top-level Read. It is almost always a mistake to pretend that a pseudo-random generator is a good source of arbitrarily long byte sequences. The kinds of simulations and non-deterministic algorithms for which math/rand is a good choice almost never need byte sequences. Read is the only common piece of API between math/rand and crypto/rand, and code should essentially always use crypto/rand.Read instead. (math/rand.Read and crypto/rand.Read are problematic because they have the the same signature; math/rand.Int and crypto/rand.Int also both exist, but with different signatures, meaning code never accidentally mistakes one for the other.)
Remove Source.Seed, Rand.Seed, and top-level Seed. Top-level Seed is deprecated as of Go 1.20. Source.Seed and Rand.Seed assume that the underlying source can be seeded by a single int64, which is only true of a limited number of sources. Specific source implementations can provide Seed methods with appropriate signatures, or none at all for generators that cannot be reseeded; the details of seeding do not belong in the general interface.
Note that removing top-level Seed means that the top-level functions like Int will always be randomly seeded rather than deterministically seeded. math/rand/v2 will not pay attention to the
randautoseed
GODEBUG setting that math/rand does; auto-seeding for top-level functions is the only mode. This means in turn that the specific PRNG algorithm used by the top-level functions is unspecified and can change from release to release without breaking any existing code.Change the Source interface to have a single
Uint64() uint64
method, replacingInt63() int64
. The latter was overfitting to the original Mitchell & Reeds LFSR generator. Modern generators can provide uint64s, and uint64s have fewer special cases at call sites.Remove Source64, which is unnecessary now that Source provides the
Uint64
method.Use a more straightforward implementation in Float32 and Float64. Taking Float64 as an example, it originally used
float64(r.Int63()) / (1<<63)
, but this has the problem of occasionally rounding up to 1.0, which Float64 must not. We tried changing it tofloat64(r.Int63n(1<<53) / (1<<53)
, which avoids the rounding problem, but we decided it was a backwards compatibile change to break the value streams that rand.Rand generators, and instead added a retry loop for the rare 1.0 case. Now we can make the breaking change, which is simpler and faster.Note that some people have observed that the simple division does not make use of all the possible float64 values in the range [0, 1). For example values like 1/(1<<54), 1/(1<<55), 3/(1<<55), and so on are not generated at all, both in today's math/rand and in this simpler algorithm. Only the values 0 and 1/(1<<53) are, not the ones in between. It is possible to introduce even more complex algorithms to spread out the low values more while preserving something that can be deemed a uniform distribution, but these algorithms seem like overkill. The simple division should continue to suffice.
Implement Rand.Perm in terms of Rand.Shuffle. Shuffle is a bit more efficient, and then we have only one implementation. It has been suggested elsewhere to remove Rand.Perm instead, but keeping it costs little (especially if implemented in terms of Shuffle) and avoids unnecessary churn for users.
Rename Int31, Int31n, Int63, Int64n to Int32, Int32n, Int64, Int64n. The names are unnecessarily pedantic and confusing.
Add Uint32, Uint32n, Uint64, Uint64n, Uint, Uintn, both as top-level functions and methods on Rand. At the least, we need Uint64 to provide access to the widest values that the Source can provide. But we may as well complete the set while we are here.
Use Lemire's algorithm in Intn, Uintn, Int32n, Uint32n, Int64n, Uint64n. Preliminary benchmarks show a 40% savings compared to v1 Int31n and a 75% savings compared to v1 Int63n. (Like with Float64, since this changes the values returned, it is a breaking change that can only be applied in math/rand/v2.)
Add a new Source implementation, PCG-DXSM, with this API:
PCG is a simple, efficient algorithm with good statistical randomness properties. The DXSM variant was introduced by the author specifically to correct a rare, obscure shortcoming in the original (PCG-XSLRR) and is now the default generator in Numpy.
PCG is provided for people who are writing simulations and need a seeded, deterministic source of randomness suitable for most purposes. Note that PCG is not assumed in any code. In particular the top-level functions need not use PCG (and in the current prototype do not). If in the future we add another algorithm, it will sit alongside PCG as a true peer.
Remove the Mitchell & Reeds LFSR generator and
NewSource
. An alternative to removing it would be to give it a name likeNewLFSR
orNewMitchellReeds
, but that would serve no purpose. The obvious purpose would be to provide the original math/rand source so that code that needs to reproduce the exact math/rand outputs can do so. But the other changes to Rand, in routines like Intn and Float64, change the derived values from Rand's methods for a given Source input stream. Since preserving the math/rand Source stream would not suffice to preserve the math/rand Rand stream, preserving the math/rand Source is not worthwhile. Programs that need exactly the math/rand value streams can continue to use that package; it will not be removed.The most direct motivation for this proposal is to clean up math/rand and fix these many lingering problems, especially the use of an outdated generator, slow algorithms, and the unfortunate collision with crypto/rand.Read.
A more indirect but equally important motivation is to set an example for other v2 packages in the standard library. Creating math/rand/v2 will let us work out tooling issues (support for v2 packages in gopls, goimports, and so on) in a relatively rarely used package with fairly low stakes attached before moving on to more commonly used, higher-stakes packages like sync/v2 or encoding/json/v2.
Once math/rand/v2 has shipped, we would tag and delete x/exp/rand. This would keep programs that already use x/exp/rand working (by reference to the tagged version or older versions of x/exp) but allow us to delete the code from the main branch of x/exp, making clear that development on it has ended.
A prototype of these math/rand/v2 changes can be found at https://go.dev/cl/502506 and the CLs below it in the stack.
Beta Was this translation helpful? Give feedback.
All reactions