Make goals use C++20 coroutines #11005

L-as · 2024-07-01T15:57:09Z

Part of building refactor WG work. I don't necessarily expect to merge the code in the current state, but it's not really a draft since I think it works. There might be bugs hiding somewhere though. OTOH bugs we haven't discovered yet might have been removed by this PR.

Motivation

Goals before this PR do coroutine-like things explicitly
using state and setting function pointers.
Everytime work() is called, the state function pointer is called.
The control flow resulting from this design is not very obvious (for me personally at least).
One consequence of this design is that any state that has to be maintained between
different states has to be in the goal class, such that all variables that could
ever be used are all fields of the class.

It is because of this not clear what fields are used when,
and more importantly, which are initialized in which states.

It is further complicated by the fact that if you do not set the state,
it implicitly loops.

It is not a "nice" way of writing the goal logic.

C++20 however has coroutines, which can do much of what we do manually, automatically.

This PR is an experiment in that direction, seeing how goal logic would look
using coroutines.

Context

Reference for learning about coroutines:

My own as short as possible explanation:
Coroutines are functions that use co_await/co_return.
When you run the coroutine function, a coroutine handle is created,
alongside its user-defined promise type.
There are suspension points at the beginning, at every co_await,
and at the final (possibly implicit) co_return.
Once suspended, you can resume the coroutine by doing coroutine_handle.resume().
Suspension points are implemented by passing a struct to the compiler
that implements await_suspend.
await_suspend can either say "cancel suspension", in which case execution resumes,
"suspend", in which case control is passed back to the caller of coroutine_handle.resume()
or the place where the coroutine function is initially executed in the case of the initial
suspension, or await_suspend can specify another coroutine to jump to, which is
how tail calls are implemented.

I've tried to roughly make every state into a coroutine.
Where before you'd do state = &f; return, you now do co_await SuspendGoal{}; co_return f().
However, you can also do co_await f(), i.e. a "coroutine" call!

Places where a suspension would happen implicitly (i.e. return-ing from the function),
are now explicit by doing co_await SuspendGoal{}.

Each goal at all times keeps track of the top_co, i.e. the coroutine last (possibly now) executed.
When we co_return another coroutine, i.e. do a "coroutine" tail call, we set the top_co,
and switch to it. Our current coroutine is destroyed.

When we co_await another coroutine, we swap top_co, and set the previous top_co,
i.e. the coroutine being currently executed, as the continuation of the coroutine to be called.

Once a coroutine has finished executing (at co_return), we check whether the goal is still busy
(is status ecBusy?), and if so, we jump to the continuation at the final suspension point.
If not, we suspend the coroutine and ignore the continuation, since the goal has finished.
The goal should be destroyed by someone else after this point.

Coroutines are wrapped in Co to ensure automatic destruction.

Currently, there is the limitation that our coroutines (Co) can not return a value,
but this should be trivial to implement using templates, or you could just pass in a pointer
to the coroutine and "return" the value that way.
It has however not been necessary yet.

I've rewritten most of the code mechanically.
DrvOutputSubstitionGoal I've used the most effort on, and it's an example
of how the logic can be heavily simplified.
The intermediate variables no longer have to be fields of the class and aren't in the
header anymore.

Alternatives

You could have heavily simplified the code without coroutines too.
You could have made state into a function pointer with state,
i.e. a class, to improve the legibility of the control flow,
and also prevent implicit looping by making the state return the
next state rather than setting a field that already has a value.

You could also have reimplemented the code in another language entirely.

I deemed both of these suboptimal. We're already using C++, so why not use
all of the features. Coroutines are mostly well supported by all major compilers
AFAICT, albeit GCC didn't support a feature that would reduce complexity by a small
amount (getting the this of coroutine methods in promise_type constructor).

There are obviously better languages than C++, but deciding on a new language
is a social problem as well as a technical one. There also isn't a language
that is better than C++(20) on all fronts (e.g. bootstrappability, multiple implementations,
has a specification).

I feel that we can heavily improve the quality of the Nix codebase without switching from C++.

We are probably better off having multiple implementations of Nix anyway in my own humble opinion.

Priorities and Process

Add 👍 to pull requests you find important.

The Nix maintainer team uses a GitHub project board to schedule and track reviews.

L-as · 2024-07-01T15:58:05Z

@Ericson2314 @rhelmot

L-as · 2024-07-01T15:59:13Z

I probably need to add more comments. I probably also need to fix the formatting in some places, might have used 2-space indentation by accident in some places.

src/libstore/build/goal.hh

src/libstore/build/goal.cc

src/libstore/build/derivation-goal.cc

src/libstore/build/goal.hh

src/libstore/build/substitution-goal.cc

src/libstore/build/goal.hh

src/libstore/build/substitution-goal.cc

L-as · 2024-07-15T17:20:04Z

~~Depends on #11073~~

I dropped the goal of using the implicit HALO optimization, i.e. allocating the coroutine's frame within the caller's frame. This isn't really feasible with way C++ coroutines are designed while also supporting tail-calls. You could do some trampoline-thing to make it work, but for the future we can also make the allocation explicitly in the caller frame by overloading operator new. I also did some minor refactoring, and the code is simpler now.

GCC seemingly doesn't support promise type constructor, so we avoid using it. We were using it for getting access to the Goal, i.e. the `this` of the coroutine, but we avoid this by passing it once explicitly when initializing the root coroutine. Then, on co_await/co_return, we get the goal from the calling coroutine. We also add some useful assertion in a bunch of places.

@Ericson2314

From feedback from @Ericson2314. Comments from goal.cc have been moved to goal.hh and made doc comments. Unnecessary lines of code that don't affect functionality have also been removed. Commented out assertion was uncommented-out.

Uses the same style as DrvOutputSubstitutionGoal roughly.

…e instead of Co Logic for final_awaiter also changed a slight bit to be more correct.

L-as · 2024-07-15T18:08:27Z

I tested this with memory sanitizer and address sanitizer and don't think I saw any new errors.

It seems to work? I rebased on master and removed the commits from #11073.

Ericson2314

Let's go! Very excited to usher in the era of this build/* code being actually readable :)

This broke in NixOS#11005. Any number of PathSubstitutionGoals would be woken up by a single build slot becoming available. If there are a lot of substitution goals active, this could lead to us running out of file descriptors (especially on macOS where the default limit is 256).

This broke in #11005. Any number of PathSubstitutionGoals would be woken up by a single build slot becoming available. If there are a lot of substitution goals active, this could lead to us running out of file descriptors (especially on macOS where the default limit is 256). (cherry picked from commit a33cb8a)

L-as requested a review from Ericson2314 as a code owner July 1, 2024 15:57