discussion: standard iterator interface #54245
Replies: 95 comments 539 replies
-
Why is the special method for use in |
Beta Was this translation helpful? Give feedback.
-
Would it be possible to have an auxilliary method,
If the user wants to use an iterator method in place of the 'natural'
Should this be |
Beta Was this translation helpful? Give feedback.
-
I'm excited by the coroutine optimizations. I've lost track of how many times a goroutine-generator pattern would have been the cleanest way to implement something, but couldn't be used because of the inefficiencies. I support this whole proposal, which I think is necessary and valuable — but the coroutine optimizations would be hugely valuable even if the rest of this proposal didn't happen. |
Beta Was this translation helpful? Give feedback.
This comment has been hidden.
This comment has been hidden.
-
At the risk of inviting bike-shedding, let me raise a minor concern on a naming choice.
|
Beta Was this translation helpful? Give feedback.
-
I use following pattern to solve this problem in my project, I think it is simpler than interface from proposal , and c++ implementation.
return true in cb means continue give me next one in the loop, return false in cb means break this loop. Maybe we can make this pattern easier to call with golang |
Beta Was this translation helpful? Give feedback.
This comment has been hidden.
This comment has been hidden.
-
You should be able to query if an iterator is empty without pulling an element out of it. The Java model is better for that reason. Example. The iterator interface needs another method. |
Beta Was this translation helpful? Give feedback.
-
With respect to integration with the Taking an example from the language's builtins, consider slices: if I write If I wanted to implement my own container type that provided slice-like iteration, I'd need to make the |
Beta Was this translation helpful? Give feedback.
-
I don't like the complexity of adding generators. Their usefulness is suspect - as almost always if a generator makes sense it is because the size is unbounded which usually equates to a more complex process underlying the collection - the block/notify is almost always going to be needed at some level anyway in this case. It may make sense for trivial series generation - but for those cases you can just as easily code it with a state machine. I also don't like the xxxx2 interfaces. I would rather see KeyValue and IndexValue interfaces created to be used as the return types - so Iter[KeyValue[K,V]] or Iter[IndexValue[I,V]] or Iter[Any[E]]. The compiler can use similar inspection code for the range operator. Other than that it is a great step forward for Go. |
Beta Was this translation helpful? Give feedback.
-
One thing that strikes me as unfortunate about this is that there is no way for a function to take advantage of the same mechanism to work on either a full collection or a bare iterator. So, say I have
which would handle This also brings up another thing: The It might be useful to do a type-assertion, as it would be possible for a wrapping iterator to implement |
Beta Was this translation helpful? Give feedback.
-
I like the general direction of this proposal. There are some minor disagreements though:
Another important point is to keep the number of "must-know" interfaces small. So try not to add too many of those. |
Beta Was this translation helpful? Give feedback.
-
I would like us to encourage, from the start, to return exported, concrete types as iterators, not the In particular, I would like all the top-level functions of the A possible exception to this would be functions which exist to compose iterators, like [1] I'm not sure if we've standardized on these terms, but I'm referring to the solution the FGG paper suggests for the expression problem |
Beta Was this translation helpful? Give feedback.
-
I would like to suggest adding two functions for composability and potentially reducing redundancy in APIs:
(bikeshed colors up for discussion. These names are taken from Haskell's Far less useful, but suggesting it just for completeness:
|
Beta Was this translation helpful? Give feedback.
-
Clarification:
Does this mean it only permits the two-variable form, or does it mean to permit either the one or the two-variable form (as with |
Beta Was this translation helpful? Give feedback.
-
How is this going to get implemented? If it uses |
Beta Was this translation helpful? Give feedback.
-
It occurs to me that we might want to specify if On the other hand, this is kind of similar in nature to the infamous "closure over loop variable" problem, which is similar in nature, in that every loop-iteration re-uses the same variable/storage. So, if we do specify that, we might open ourselves up to similar bugs? It might also be worth considering how this interplays with optimizations inlining |
Beta Was this translation helpful? Give feedback.
-
Some thoughts:
|
Beta Was this translation helpful? Give feedback.
-
I love the suggestion to have an Iterator interface in the stdlib! I work heavily with iterator patterns because it allows me to apply information hiding in my business use case. I would like to share my experience working with iterators in Go. The value of supporting error use-cases in the iterator interfaceIn other languages, the idiom to raise/throw an exception solves the integration of Error use-cases. I use iterators primarily to decouple the implementation details of the data provider from the place they consume it. Most iterators in the stdlib already lean towards an OOP direction. type Iter[E any] interface {
Next() (elem E, more bool)
// Err return the error cause.
// if an error occurs during Next, the "more" value will be false, and the error will be accessible from Err.
Err() error
} With a wrapper iterator, it's possible to do lambda expressions The importance of closing an iteratorWhen we use an iterator to hide the origin of the data provider, If the iterator embeds the Using this convention always to close an iterator ensures type Iter[E any] interface {
Next() (elem E, more bool)
// Closer is required to make it able to cancel iterators where resources are being used behind the scene
// for all other cases where the underlying resource is handled on a higher level, it should simply return nil
io.Closer
} Example of a currently used implementationI share the iterator pattern I utilise and my use-cases where I apply them. my current Iterator interfacetype Iterator[V any] interface {
// Closer is required to make it able to cancel iterators where resources are being used behind the scene
// for all other cases where the underlying resource is handled on a higher level, it should simply return nil
io.Closer
// Err return the error cause.
// if an error occurs during Next, the "more" value will be false, and the error will be accessible from Err.
Err() error
// Next will ensure that Value returns the next item when executed.
// If the next value is not retrievable, Next should return false and ensure Err() will return the error cause.
Next() bool
// Value returns the current value in the iterator.
// The action should be repeatable without side effects.
Value() V
} LimitationsAt the moment, usage:// get an iterator from somewhere
iter, err := storage.Users.FindAll(ctx)
if err != nil {
return err
}
defer iter.Close()
// consume iterator
for iter.Next() {
// use value to something
_ = iter.Value()
}
return iter.Err() My common use-cases for iterating:
Example package I use for iterating: |
Beta Was this translation helpful? Give feedback.
-
Perhaps we should think about and design for all iteration concerns now, even if we don't roll them out all at once. What works well for the simplest case (Next) might not work so well for more complicated iteration needs. Java mapped out this space pretty well, in my opinion:
Perhaps batching is also something worth considering (see Reader). We can always roll out the types and methods in x/exp without a compatibility guarantee, and save the for/range changes for when the entire design has solidified. |
Beta Was this translation helpful? Give feedback.
-
Reading through the proposal and the comments I have a few thoughts.
I feel like an implicit expectation I have of Go is that all "special" declarations are limited to the
There might be a few exceptions, notably special pointer-convertible types in the Creating a new language construct, consolidating
If we have to use the old behavior of |
Beta Was this translation helpful? Give feedback.
This comment was marked as off-topic.
This comment was marked as off-topic.
-
I like that this proposal is being discussed, but I do want to flag a few issues with this direction that make it seems suboptimal to me. As a few people have noted, handling errors correctly complicates thing; and the main other feature in Go that makes this worse are panics. Any time we're asking people to use Two features of this proposal make that hard:
There is an iterator proposal at #47707 which solves both of these problems by leaning into an API where the author of the iterator controls the iteration instead of the consumer. (This proposal adds
This immediately fixes the problems with Stop ( It also has one other ergonomic benefit for authors of collections that can be iterated over: the implementation of On a different tack, I don't think we should push functions like To be more concrete, the problem I saw doing a lot of ruby programming is that Instead of:
You end up with:
There are definitely cases where this kind of thing is useful, but it can be trickier to comprehend the control flow (it works backwards down the call chain when you stack these things) and it's a lot more overhead (an extra function call per iteration - on top of any function calls required by the iteration proposal itself - and a new object allocation per clause). I would probably keep the |
Beta Was this translation helpful? Give feedback.
-
C# had concurrency from the beginning, it also had iterators (enumerators) and a standard “collections” library for the beginning. Well, .Net 1.1 anyway.
… On Sep 21, 2022, at 7:18 PM, AndrewHarrisSPU ***@***.***> wrote:
So if we don't want to exhaust the iterator, and error handling and closing are not part of the iterator interface, then you are forced to return IterCloser, and IterErrorer values along the iterator in the function signatures.
Issues like this have been a stumbling block for many other languages' implementations of iterators, and the evolution of Python and C# to me seem like a useful contrast. Neither did concurrency before iterators, while in Go, we really can't take concurrency out of the language. IIRC both Python and C# have ended up with something like async iterators, with some subtly distinguished behaviors when employing concepts like using, with, try, catch etc. Additionally, both struggled to get to that point, revising decisions along the way, because it meant implementing more language-level stuff around concurrency, which was already complicated retrofitting. But eventually errors in iterators or errors in resources that provide iterators are inevitable, and good ways of managing these errors are needed.
With this in mind, my opinion is something like:
The basic notion of using and try/catch are reasonable. If it looks like exception handling, good, fine! Go allows us to implement our own exception handling constructs if we really want.
In library code, approximations of using and try/catch - just for iterators - could make sense. We don't have to invent new fundamental language here because Go has always had concurrency in mind. Some of this I would argue should be in an iter library, but I'm not sure if any/all of it has to be.
Also, because Go has always had concurrency in mind, Go devs won't have as much of a problem wrapping our heads around the edgier cases of concurrent iterator. We would, when using iterators, have to think about more elaborate layers of error handling than we often might.
Loosely:
type Resource[T any] struct {
results Iter2[T, error]
}
type Results[T any] struct {
iter Iter2[T, error]
}
func Using[T any]( src Resource[T any], catch func(err error)) Results[T] { ... }
func Try[T any](results Results[T], catch func(err error)) Iter[T] { ... }
To obtain an infallible iterator from Resource[T]:
with the Using function, unwrap Resource[T] -> Results[T] by providing a callback for how to handle an error associated with a resource (e.g., the DB connection gets broken)
with the Try function, unwrap Results[T] -> Iter[T] by providing a callback for how to handle an error associated with a result (e.g., some element was not found or was invalid)
with enough nuance, the catch functions can serialize with iteration, while being concurrent with respect to the resources they are monitoring. In other words, the unwrapped iterator can't continue iterating until a catch function completes execution; that execution can do arbitrary things, like cause any further iteration to Stop().
It's elaborate machinery to implement and use, but I think is flexible enough to be ignored when not needed, and clear when it is needed.
—
Reply to this email directly, view it on GitHub <#54245 (reply in thread)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ABF2U4IMCQ6ZOM7PNX7ICZ3V7OQUXANCNFSM55QVDQJA>.
You are receiving this because you were mentioned.
|
Beta Was this translation helpful? Give feedback.
-
I think we should adopt the following iterator interface first: // Iter supports iterating over a sequence of values of type `E`.
type Iter[E any] interface {
// Next returns the next value in the iteration if there is one,
// and reports whether the returned value is valid.
// Once Next returns ok==false, the iteration is over,
// and all subsequent calls will return ok==false.
Next() (elem E, ok bool)
} The As for |
Beta Was this translation helpful? Give feedback.
-
Saying you don’t need to pass around containers is like saying you don’t need to pass around slices - only iterators on slices. That doesn’t make sense to me.
Sorry - too many messages in github so it will no longer move to the proper one to comment… ugh.
… On Sep 27, 2022, at 4:55 PM, Axel Wagner ***@***.***> wrote:
Those would work with a normal iter. It's not trivial, but you can effectively create N iterators, each blocking in their Next call until all N of them have been called and then returning the value from the Next call of the one wrapped iterator. If that makes sense. So, effectively, they walk in lockstep. You can then call the functions in separate goroutines. It's tricky enough that I can't just write it down just now, but it should be possible.
You can provide a helper to create a ResetIter from any container <https://go.dev/play/p/1GswJFkjy1i>. And notably, that works with any iterator type returned from Range. So it's still not an argument for why you have to pass around the container itself.
Even though I brought up the concern that the mechanism from the proposal doesn't allow passing around containers, I'm coming increasingly around to the fact that it's just not a big deal.
—
Reply to this email directly, view it on GitHub <#54245 (reply in thread)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ABF2U4PR7R3X6AIY6TGLFNLWANUMZANCNFSM55QVDQJA>.
You are receiving this because you were mentioned.
|
Beta Was this translation helpful? Give feedback.
-
IMO Assuming adding tuples is out of the question, I would prefer a special
|
Beta Was this translation helpful? Give feedback.
-
There's a bit of intuition from session types I thought might be interesting regarding fallible iterators, where for each unit of progress, the currently active participant shares their perspective on the state of the session. Not on the kind of type theoretic basis that yields formal proofs, and definitely not to suggest there's a need for semantics that yield formal proofs, but just to borrow that bit of intuition - to sketch out another idea for fallible iterators / revisit some other ideas in a particular way:
The
(Because
-- This looked nicer to me in the details than some alternatives /shrug. Particularly, it allows setting and observing errors that are communicated from consumer to generator, or from generator to consumer -this is possible but messy otherwise. Changing the signature of Sentinel errors have been mentioned earlier in a many places. To draw a distinction, the idea here would be to have not exactly sentinel logic, but session logic only for fallible iterators. I think this could still coexist with infallible iterators that work like
*A fallible-aware loop just handles errors internally - it may set a State to something that isn't Ok, but I think the compiler can arrange for that to be equivalent to what The '?' mark scenario: One behavior could be that an infallible loop just ignores 'state', and behind the scenes any generator error stops iteration, and the error is silent/lost. Another behavior could be that it doesn't type check (because |
Beta Was this translation helpful? Give feedback.
-
We split the discussion of language changes out to #56413. |
Beta Was this translation helpful? Give feedback.
-
Thanks very much for all the discussion here. It has been very helpful in pointing out a number of difficulties with this approach. We're going to rethink the language changes over at #56413, which notably also rethinks the approach to an iterator |
Beta Was this translation helpful? Give feedback.
-
This is a discussion that is intended to lead to a proposal.
This was written with lots of input from @jba and @rsc.
Background
Most languages provide a standardized way to iterate over values stored in containers using an iterator interface (see the appendix below for a discussion of other languages). Go provides
for
range
for use with maps, slices, strings, arrays, and channels, but it does not provide any general mechanism for user-written containers, and it does not provide an iterator interface.Go does have examples of non-generic iterators:
runtime.CallersFrames
returns aruntime.Frames
that iterates over stack frames;Frames
has aNext
method that returns aFrame
and a bool that reports whether there are more frames.bufio.Scanner
is an iterator through anio.Reader
, where theScan
method advances to the next value. The value is returned by aBytes
method. Errors are collected and returned by anErr
method.database/sql.Rows
iterates through the results of a query, where theNext
method advances to the next value and the value is returned by aScan
method. TheScan
method can return an error.Even this short list reveals that there are no common patterns in use today. This is in part because before generics were introduced, there was no way to write an interface that described an iterator. And of course there may be no simple pattern that will cover all of these use cases..
Today we can write an interface
Iter[E]
for an iterator over a container that has elements of typeE
. The existence of iterators in other languages shows that this is a powerful facility. This proposal is about how to write such an interface in Go.What we want from Go iterators
Go is of course explicit about errors. Iterators over containers can't fail. For the most common uses it doesn't make any more sense to have iterators return an error than it does for a
for
range
statement to return an error. Algorithms that use iterators should often behave differently when using iterators that can fail. Therefore, rather than try to combine non-failing and failing iterators into the same interface, we should instead return explicit errors from iterators that can fail. These errors can be part of the values returned by the iterator, or perhaps they can be returned as additional values.Iterators have two fundamental operations: retrieve the current value, and advance to the next value. For Go we can combine these operations into a single method, as
runtime.Frames
does.In the general case we may want to implement iterators with some additional state that is not trivially garbage collected, such as an open file or a separate goroutine. In C++, for example, this state would be cleared up by a destructor, but of course Go does not have destructors. Therefore, we should have some explicit way to indicate that we no longer need an iterator. This should be optional, as many iterators do not require any special cleanup. We should encourage iterators to use finalizers if necessary to clean up resources, and also to clean up after themselves when reaching the end of an iteration.
In Go the builtin type map permits values to be inserted and removed while iterating over the map, with well-defined behavior. In general for Go we should be flexible, though of course the program should never simply crash. We should let each container type define how it behaves if the container is modified while iterators are active. For example, container modification may cause arbitrary elements to be skipped or returned two or more times during the iteration. In some cases, hopefully rare, container modification may cause uses of existing iterators to panic, or to return values that have been removed from the container.
Proposal
We define a new package
iter
that defines a set of interfaces. The expectation is that containers and other types will provide functions and methods that return values that implement these interfaces. Code that wants to work with arbitrary containers will use the interfaces defined in this package. That will permit people to write functions that work with containers but are agnostic to the actual container type being used, much as interfaces likeio.Reader
permit code to be agnostic as the source of the data stream.iter.Iter
The core interface in the iterators package is
iter.Iter[E]
.We also define a related interface for containers, such as maps, for which elements inherently have two values.
An iterator that can fail will either return a single value that includes an error indication, or it will implement
Iter2[E, error]
. It's not yet clear which of those options is better.As mentioned above, some iterators may have additional state that may be discarded when no more values are expected from an iterator (for example, a goroutine that sends values on a channel). Telling the iterator that no more values are expected is done using an optional interface that an iterator may implement.
The
Stop
method should always be considered to be an optimization. The program should work correctly even ifStop
is never called. If an iterator is read to the end (untilNext
returnsfalse
) callingStop
should be a no-op. If necessary, iterator implementations should use finalizers to clean up cases whereStop
is not called.As a matter of programming style, the code that calls a function to obtain a
StopIter
is responsible for calling theStop
method. A function that accepts anIter
should not use a type assertion to detect and call theStop
method. This is similar to the way that a function that accepts anio.Reader
should not use a type assertion to detect and call theClose
method.iter.New functions
iter.Iter
provides a convenient way for the users of a container to iterate over its contents. We also want to consider the other side of that operation, and provide convenient ways for containers to define iterators.An appendix below discusses how these functions can be implemented efficiently.
Simpler containers may be able to easily capture all required state in a function.
iterators for standard containers
The iter package will define iterators for the builtin container types.
Functions that accept iterators
The iter package could define functions that operate on iterators. We should be conservative here to start. It's not yet clear which of these functions will be useful.
Range loops
The
for
range
syntax will be expanded to support iterators. Note that this is the only language change in this proposal. Everything else is library code and programming conventions.If the argument to
range
implementsIter[E]
orIter2[E1, E2]
, then the loop will iterate through the elements of the iterator. For example, this code:will be equivalent to this code:
Here
_ok
is a hidden variable that is not seen by user code.Note that breaking out of the loop will leave the iterator at that position, such that
Next
will return the next elements that the loop would have seen.Using
range
with anIter2[E1, E2]
will permit using two variables in thefor
statement, as withrange
over a map.Compatibility note: if the type of
it
is a slice, array, pointer-to-array, string, map, or channel type, then theNext
method will be ignored and thefor
range
will operate in the usual way. This is required for backward compatibility with existing code.Because it's inconvenient to write
for v := range c.Range()
, we propose a further extension: we permitrange c
ifc
has a methodRange
that returns a value that implementsIter[E]
orIter2[E]
. If theRange
method implementsStopIter[E]
orStopIter2[E]
then the range loop will ensure thatStop
is called when exiting the loop. (Here whether the result implementsStopIter
is a static type check, not a dynamic type assertion: if theRange
method returns the typeIter[E]
,Stop
will not be called even if the actual type has aStop
method.)For example:
where
c.Range
returns a value that implementsStopIter[E]
, is roughly equivalent to:The compiler will arrange for
_it.Stop
to be called if the loop statements panic, even if some outerdefer
in the function recovers the panic. That is, thedefer
in the roughly equivalent code is run when leaving the loop, not just when leaving the function.Note that if we adopt this change it will be the first case in which a language construct invokes a user-defined method.
That is all
That completes the proposal.
Optional future extensions
We can use optional interfaces to extend the capabilities of iterators.
For example, some iterators permit deleting an element from a container.
We could then implement
Similarly some iterators permit setting a value.
We could then implement
Bi-directional iterators can implement a
Prev
method.This is just a sketch of possible future directions. These ideas are not part of the current proposal. However, we want to deliberately leave open the possibility of defining additional optional interfaces for iterators.
Examples
This is some example code showing how to use and create iterators. If a function in this section is not mentioned above, then it is purely an example, and is not part of the proposal.
Appendix: Iterators in other languages
C++
The C++ Standard Template Library defines a variety of iterator APIs. These are consistently implemented by C++ containers and are also used by other types such as files and streams. This makes it possible to write standard algorithms that work with all C++ containers.
C++ containers provide
begin
andend
methods that return iterators. Thebegin
method returns an iterator that refers to the beginning of the container. Theend
method returns an iterator that refers to the position just past the end of the container. Iterators to the same container may be compared for equality using the==
and!=
operators. Any valid iterator (not the iterator returned byend
) refers to a value in the container. That value is accessible using the unary*
operator (which is the pointer dereference operator, thus iterators act like pointers into the container, and ordinary pointers act like iterators). The unary++
operator advances the iterator to refer to the next element in the container. For any C++ container one can loop over all elements in the container by writingAs of C++11 this pattern is built into the language via the range-based
for
loop.This calls the
begin
andend
methods of the container and loops as shown above.Some C++ iterators have optional additional capabilities. Iterators can be grouped into five types.
*p = v
), but do not permit retrieving it. Example: an iterator that writes values to a file.--
operator to move to the preceding element. Example: an iterator over a doubly linked list.<
and friends, and indexing off an iterator to refer to a value. Example: an iterator over a slice (which C++ calls a vector).C++ algorithms can use function overloading to implement the same algorithm in different ways depending on the characteristics of the iterator. For example,
std::reverse
, which reverses the elements in a container, can be implemented with a bidirectional iterator, but uses a more efficient algorithm when called with a random access iterator.C++ iterators do not provide any form of error handling. Iterators over containers typically can't fail. An iterator associated with a file handles I/O errors by setting the file object into an error state, which can optionally cause an exception to be thrown.
Each C++ container type defines rules for when a modification to the container invalidates existing iterators. For example, inserting an element in a linked list does not invalidate iterators pointing to other elements, but inserting an element in a vector may invalidate them. Using an invalid iterator is undefined behavior, which can cause the program to crash or arbitrarily misbehave.
Java
Java also supports iterators to step through containers. Java iterators are much simpler than C++ iterators. Java defines an interface
Iterator<E>
that has three main methods:hasNext
,next
, andremove
. Callingnext
in a situation wherehasNext
would returnfalse
will throw an exception, and in general Java iterators throw an exception for any error. (By the way, in C++ removing an iterator from a container is generally implemented as anerase
method on the container type that takes an iterator as an argument.)A Java container will have an
iterator
method that returns anIterator
that walks over the elements of the container. This too is described as a Java interface:Iterable<E>
.Java has a iterator using loop syntax like that of C++11 (C++ copied the syntax from Java):
This calls the
iterator
method on the container and then callshasNext
andnext
in a loop.As far as I know Java does not have a standard implementation of output iterators or random access iterators. Specific containers will implement iterators with an additional
set
method that permits changing the value to which the iterator refers.If a Java iterator is used after a container is modified in some way that the iterator can't support, the iterator methods will throw an exception.
Python
A container will implement an
__iter__
method that returns an iterator object. An iterator will implement a__next__
method that returns the next element in the container, and raises aStopIteration
exception when done. Code will normally call these methods via the builtiniter
andnext
functions.The Python
for
loop supports iterators.This calls
iter
andnext
, and handles theStopIteration
exception, as one would expect.Python iterators generally don't permit modifying the container while an iterator is being used, but it's not clear to me precisely how they behave when it happens.
Discussion
For C++ and Python, iterators are a matter of convention: any type that implements the appropriate methods can return an iterator, and an iterator itself must simply implement the appropriate methods and (for C++) operator overloads. For Java, this is less true, as iterators explicitly implement the
Iterator<E>
interface. Thefor
loop in each language just calls the appropriate methods.These conventions are powerful because they permit separating the details of an algorithm from the details of a container. As long as the container implements the iterator interface, an algorithm written in terms of iterators will work.
Iterators do not handle errors in any of these languages. This is in part because errors can be handled by throwing exceptions. But it is also because iterating over a container doesn't fail. Iteration failure is only possible when a non-container, such as a file, is accessed via the iterator interface.
It's worth noting that the C++ use of paired begin and end iterators permit a kind of sub-slicing, at least for containers that support bidirectional or random access iterators.
Appendix: Efficient implementation of
iter.NewGen
The natural way to implement
iter.NewGen
is to use a separate goroutine and a channel. However, we know from experience that that will be inefficient due to scheduling delays. A more efficient way to implementNewGen
will be to use coroutines: let the generator function produce a new value and then do a coroutine switch to the code using the iterator. When that code is ready for the next value, do a coroutine switch back to the generator function. A coroutine switch can be fast: simply change the stack pointer and reload the registers. No need to go through the scheduler.Of course Go doesn't have coroutines, but we can use compiler optimizations to achieve the same effect without any language changes. This approach, and much of the text below, is entirely due to @rsc.
First, we identify programming idioms that provide concurrency without any opportunity for parallelism, such as a send immediately followed by a receive. Second, we adjust the compiler and runtime to recognize the non-parallel idioms and optimize them to simple coroutine switches instead of using the thread-aware goroutine scheduler.
Coroutine idioms
A coroutine switch must start another goroutine and then immediately stop the current goroutine, so that there is no opportunity for parallelism.
There are three common ways to start another goroutine: a go statement creating a new goroutine, a send on a channel where a goroutine is blocked, and a close on a channel where a goroutine is blocked.
There are three common ways to immediately stop the current goroutine: a receive of one or two values (with or without comma-ok) from a channel with no available data and a return from the top of a goroutine stack, exiting the goroutine.
Optimizations
The three common goroutine starts and three common goroutine stops combine for nine possible start-stop pairs. The compiler can recognize each pair and translate each to a call to a fused runtime operation that does both together. For example a send compiles to
chansend1(c, &v)
and a receive compiles tochanrecv1(c, &v)
. A send followed by a receive can compile tochansend1recv1(c1, &v1, c2, &v2)
.The compiler fusing the operations creates the opportunity for the runtime to implement them as coroutine switches. Without the fusing, the runtime cannot tell whether the current goroutine is going to keep running on its own (in which case parallelism is warranted) or is going to stop very soon (in which case parallelism is not warranted). Fusing the operations lets the runtime correctly predict the next thing the goroutine will do.
The runtime implements each fused operation by first checking to see if the operation pair would start a new goroutine and stop the current one. If not, it falls back to running the two different operations sequentially, providing exactly the same semantics as the unfused operations. But if the operation pair does start a new goroutine and stop the current one, then the runtime can implement that as a direct switch to the new goroutine, bypassing the scheduler and any possible confusion about waking new threads (Ms) or trying to run the two goroutines in different threads for a split second.
Note that recognizing these coroutine idioms would have potential uses beyond iterators.
NewGen
Here is an implementation of
iter.NewGen
that takes advantage of this technique.The compiler would need to fuse the commented operation pairs for potential optimization by the runtime: send followed by receive (1, 2), close followed by return (3), go followed by receive (4), send followed by comma-ok receive (5), and close followed by receive (6).
Beta Was this translation helpful? Give feedback.
All reactions