# smol-rs Architecture

The architecture of [`smol-rs`].

This document describes the architecture of [`smol-rs`] and its crates at a
high level. It is intended for new contributors who want to quickly
familiarize themselves with [`smol`]'s composition before contributing.
However, it may also be useful for evaluating [`smol`] in comparison with
other runtimes.

## Thousand-Mile View

[`smol`] is a small, safe and fast concurrent runtime built in pure Rust. Its
primary goal is to enable M:N concurrency in Rust programs; multiple
coroutines can be multiplexed onto a single thread, allowing a server to
handle hundreds of thousands (if not millions) of clients at a time. However,
[`smol`] can just as easily multiplex tasks onto multiple threads to enable
blazingly fast concurrency. [`smol`] is intended to work at any scale; it
should work for programs with two coroutines as well as for programs with two
million.

On an architectural level, [`smol`] prioritizes maintainable code and clarity.
[`smol`] aims to provide the performance of a modern `async` runtime while
still remaining hackable. This philosophy informs many of the decisions in
[`smol`]'s codebase and differentiates it from other contemporary runtimes.

On a technical level, [`smol`] is a [work-stealing] executor built around a
[one-shot] asynchronous event loop. It also contains a thread pool for
filesystem operations and a reactor for waiting for child processes to finish.
[`smol`] itself is a meta-crate that combines the features of numerous
subcrates into a single `async` runtime package.

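To make the rest of this document concrete, here is a minimal sketch of what
using [`smol`] as a runtime looks like. `block_on`, `spawn` and `Timer` are
all exposed at the crate root of [`smol`]:

```rust
use std::time::Duration;

fn main() {
    // Drive a future to completion on the current thread, while `spawn`
    // schedules additional tasks onto smol's global executor.
    smol::block_on(async {
        let task = smol::spawn(async {
            // Tasks can await timers, sockets, child processes, and so on.
            smol::Timer::after(Duration::from_millis(10)).await;
            1 + 2
        });
        assert_eq!(task.await, 3);
    });
}
```
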
smol-rs consists of the following crates:

- [`async-io`] provides a one-shot reactor for polling asynchronous I/O. It is
  used for registering sockets into `epoll` or another system, then polling
  them simultaneously.
- [`blocking`] provides a managed thread pool for polling blocking operations
  as asynchronous tasks. It is used by many parts of [`smol`] for turning
  operations that would normally be non-concurrent into concurrent operations,
  and vice versa.
- [`async-executor`] provides a work-stealing executor that is used as the
  scheduler for an `async` program. While it isn't as optimized as other
  contemporary executors, it provides a performant executor implemented in
  mostly safe code in under 1.5 KLOC.
- [`futures-lite`] provides a handful of low-level primitives for combining
  and dealing with `async` coroutines.
- [`async-channel`] and [`async-lock`] provide synchronization primitives for
  connecting asynchronous tasks.
- [`async-net`] provides a set of higher-level APIs over networking
  primitives. It combines [`async-io`] and [`blocking`] to create a
  full-featured, fully asynchronous networking API.
- [`async-fs`] provides an asynchronous API for manipulating the filesystem.
  The API itself is built on top of [`blocking`].

These subcrates themselves depend on other crates for further functionality.
These are explained in a more bottom-up fashion below.

## Lower-Level Crates

These crates provide safer, lower-level functionality used in the
higher-level crates. They can be used directly, but are intended primarily
for [`smol`]'s underlying plumbing.

### [`parking`]

[`parking`] is used to block ("park") threads until an arbitrary signal is
received. The [`std::thread::park`] API suffers from involving global state:
any arbitrary user code can unpark the thread and wake up the parker. The goal
of this crate is to provide an API that can be used to block the current
thread and only wake up once a signal is delivered.

[`parking`] is implemented relatively simply. It uses a combination of a
[`Mutex`] and a [`Condvar`] to store whether the thread is parked or not, then
waits until the thread has received a wakeup signal.

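As a rough sketch of the API, a parker/unparker pair can be used like this:

```rust
use std::thread;
use std::time::Duration;

fn main() {
    // A connected parker/unparker pair; the unparker can be sent anywhere.
    let (parker, unparker) = parking::pair();

    // Another thread delivers the wakeup signal after doing some work.
    thread::spawn(move || {
        thread::sleep(Duration::from_millis(10));
        unparker.unpark();
    });

    // Block the current thread until `unpark` is called; unrelated code
    // cannot wake this parker by accident.
    parker.park();
}
```
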
### [`waker-fn`]

[`waker-fn`] is provided to easily create [`Waker`]s for use in `async`
computation. The [`Waker`] is similar to a callback in the `async` models of
other languages; it is called when the coroutine has stopped waiting and is
ready to be polled again. [`waker-fn`] makes this comparison literal by
allowing a [`Waker`] to be constructed directly from a callback.

[`waker-fn`] is used to construct higher-level asynchronous runtimes. It is
implemented simply by creating an object that implements the [`Wake`] trait
and using that as a [`Waker`].

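A minimal sketch of the crate's single function:

```rust
use waker_fn::waker_fn;

fn main() {
    // Build a `Waker` from a plain closure; the closure runs every time the
    // waker is woken.
    let waker = waker_fn(|| println!("woken!"));

    // Wakers can be woken by reference or by value (consuming them).
    waker.wake_by_ref();
    waker.clone().wake();
}
```
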
### [`atomic-waker`]

In order to construct runtimes, you need to be able to store [`Waker`]s
concurrently. That way, different parties can simultaneously store [`Waker`]s
to show interest in an event, while activators of the event can take that
[`Waker`] and wake it. [`atomic-waker`] provides [`AtomicWaker`], which is a
low-level solution to this problem.

[`AtomicWaker`] uses an interiorly mutable slot protected by an atomic
variable to synchronize the [`Waker`]. Storing the [`Waker`] acquires an
atomic lock, writes to the slot and then releases that atomic lock. Reading
the [`Waker`] out requires acquiring that lock and moving the [`Waker`] out.

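A sketch of how [`AtomicWaker`] is typically used, loosely adapted from the
crate's documentation: a one-shot flag that a single task can wait on
(`futures-lite`'s `block_on` is used here only to drive the example).

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::sync::atomic::{AtomicBool, Ordering};
use std::task::{Context, Poll};

use atomic_waker::AtomicWaker;

struct Inner {
    waker: AtomicWaker,
    set: AtomicBool,
}

#[derive(Clone)]
struct Flag(Arc<Inner>);

impl Flag {
    fn new() -> Self {
        Flag(Arc::new(Inner {
            waker: AtomicWaker::new(),
            set: AtomicBool::new(false),
        }))
    }

    // The "activator" side: set the flag, then wake whoever registered.
    fn signal(&self) {
        self.0.set.store(true, Ordering::SeqCst);
        self.0.waker.wake();
    }
}

impl Future for Flag {
    type Output = ();

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        // Register interest *before* re-checking the flag, so that a `signal`
        // racing with this poll cannot be lost.
        self.0.waker.register(cx.waker());
        if self.0.set.load(Ordering::SeqCst) {
            Poll::Ready(())
        } else {
            Poll::Pending
        }
    }
}

fn main() {
    let flag = Flag::new();
    let signaller = flag.clone();
    std::thread::spawn(move || signaller.signal());
    // Block until the flag is signalled.
    futures_lite::future::block_on(flag);
}
```
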
[`AtomicWaker`] aims to be lock-free while also avoiding spinloops. It uses a
novel strategy to accomplish this. If two tasks try to store [`Waker`]s in the
slot simultaneously, one [`Waker`] is kept while the other is woken, making
the task that inserted it try again. Simultaneous attempts to move out the
[`Waker`] will have one of the operations return the [`Waker`] and the others
return `None`. Simultaneous attempts to write to and read from the slot will
mark the slot as "needs to wake", making the writer wake up the [`Waker`]
immediately once it has acquired the lock.

The main weakness of [`AtomicWaker`] is that it can only store one [`Waker`],
meaning only one task can wait on this primitive at a time. This limitation is
addressed by other code in [`smol-rs`].

### [`fastrand`]

Higher-level operations require a fast random number generator in order to
provide fairness. [`fastrand`] is a dead-simple random number generator that
aims to provide pseudorandomness.

The underlying RNG algorithm for [`fastrand`] is [wyrand], which provides
decently distributed but fast random numbers. Most of the crate is built on a
function that generates 64-bit random numbers; the remaining API transforms
this function into more useful results.

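A small sketch of the API, showing both the global, thread-local RNG and a
locally seeded instance:

```rust
fn main() {
    // Global, thread-local RNG: convenience functions at the crate root.
    let dice = fastrand::u8(1..=6);
    let coin = fastrand::bool();
    println!("rolled {dice}, flipped {coin}");

    // A local RNG instance, seeded explicitly for reproducible output. All of
    // these helpers are ultimately derived from the 64-bit generator.
    let mut rng = fastrand::Rng::with_seed(7);
    let x = rng.u64(..);
    let idx = rng.usize(0..10);
    println!("{x} {idx}");
}
```
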
Global RNG is provided by a thread-local slot that is queried every time
global randomness is needed. All generated RNG instances derive their seed
from the thread-local RNG. The seed for the global RNG is derived from the
hash of the thread ID and local time or, on Web targets, the browser RNG.

The API of [`fastrand`] is deliberately kept small and constrained.
Higher-level functionality is moved to [`fastrand-contrib`].

### [`concurrent-queue`]

[`concurrent-queue`] is a fork of the [`crossbeam-queue`] crate. It provides a
lock-free queue containing arbitrary items. There are three types of queues:
optimized single-capacity queues, queues with a bounded capacity, and queues
with unbounded capacity.

Each queue works in the following way:

- The single-capacity queue is essentially a spinlock around an interiorly
  mutable slot. Reads from or writes to this slot lock the spinlock.
- The bounded queue contains several slots that each track their state, as
  well as pointers to the current head and tail of the list. Pushing to the
  queue moves the tail forward and writes the data to the slot that was just
  moved past. Popping the queue moves the head forward and moves out the slot
  that was previously pushed into. Two "laps" of the queue are used in order
  to create a "ring buffer" that can be continuously pushed into and popped
  from.
- The unbounded queue works like many bounded queues linked together. It
  starts with a single bounded queue. When that queue runs out of space, it
  creates another queue, links it to the previous queue and pushes items
  there. All of the created bounded queues are linked together to form a
  seamless unbounded queue.

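A rough sketch of the user-facing API:

```rust
use concurrent_queue::{ConcurrentQueue, PopError};

fn main() {
    // A bounded queue with room for two items.
    let q = ConcurrentQueue::bounded(2);
    q.push(1).unwrap();
    q.push(2).unwrap();
    // The queue is full, so this push is rejected.
    assert!(q.push(3).is_err());

    // Items come out in FIFO order.
    assert_eq!(q.pop(), Ok(1));
    assert_eq!(q.pop(), Ok(2));
    assert_eq!(q.pop(), Err(PopError::Empty));

    // An unbounded queue accepts pushes until it is closed.
    let q = ConcurrentQueue::unbounded();
    for i in 0..1000 {
        q.push(i).unwrap();
    }
    assert_eq!(q.len(), 1000);
}
```
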
### [`piper`]

TODO: Explain this!

### [`polling`]

[`polling`] is a portable interface to the underlying operating system API for
asynchronous I/O. It aims to allow for efficiently polling hundreds of
thousands of sockets at once using underlying OS primitives.

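A minimal sketch of the API, assuming a recent (3.x) version of [`polling`];
the key (`7` here) is an arbitrary `usize` chosen by the caller to identify
the source in delivered events:

```rust
use std::net::TcpListener;
use std::time::Duration;

use polling::{Event, Events, Poller};

fn main() -> std::io::Result<()> {
    // A non-blocking socket to watch for readability.
    let listener = TcpListener::bind("127.0.0.1:0")?;
    listener.set_nonblocking(true)?;

    let poller = Poller::new()?;
    // Registering a source is unsafe: the caller must keep the socket alive
    // and delete it from the poller before dropping it.
    unsafe { poller.add(&listener, Event::readable(7))? };

    // Wait (here with a timeout) for readiness events; each event carries the
    // key passed in at registration time.
    let mut events = Events::new();
    poller.wait(&mut events, Some(Duration::from_millis(100)))?;
    for ev in events.iter() {
        assert_eq!(ev.key, 7);
        // Interest is one-shot: re-arm the socket to keep receiving events.
        poller.modify(&listener, Event::readable(7))?;
    }

    poller.delete(&listener)?;
    Ok(())
}
```
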
[`polling`] is a relatively simple wrapper around [`epoll`] on Linux and
[`kqueue`] on BSD/macOS. Most of the code in [`polling`] is dedicated to
providing wrappers around [IOCP] on Windows and [`poll`] on Unixes without any
better option.

[IOCP] is built around waiting for specific operations to complete, rather
than creating an "event loop". However, Windows exposes a subsystem called
[`AFD`] that can be used to register a wait for polling a set of sockets. Once
we've used internal Windows APIs to access [`AFD`], we group sockets into
"poll groups" and then set up an outgoing "poll" operation for each poll
group. From here we can collect these "poll" events from [IOCP] in order to
simulate a classic Unix event loop.

For the [`poll`] system call, the usual "add/modify/remove" interface exposed
by other event loop systems doesn't work, as [`poll`] only takes a flat list
of file descriptors. Therefore we keep our own hash map of file descriptors,
which is kept in sync with a list of `pollfd`s that is then passed to
[`poll`]. This list can be modified on the fly by waking up the [`poll`]
syscall using a designated "wakeup" pipe, modifying the list, then resuming
the [`poll`] operation.

[`polling`] does not consider I/O safety and therefore its API is unsafe. It
is the burden of higher-level APIs to implement safer APIs on top of
[`polling`].

## Medium-Level Crates

These crates provide a high-level, sometimes `async` API intended for use in
production programs. These are featureful, safe and can even be used on their
own. However, there are higher-level APIs that may be of interest to more
casual users.

### [`async-task`]

In order to build an executor, there are essentially two parts to be
implemented:

- A task system that allocates the coroutine on the heap, attaches some
  concurrent state to it, then provides handles to that task.
- A task queue that takes these tasks and decides how they are scheduled.

[`async-task`] implements the former in a way that can be generically applied
to different styles of executors. Essentially, [`async-task`] takes care of
the boring, soundness-prone part of the executor so that higher-level crates
can focus on optimizing their scheduling strategies.

An asynchronous task is built from two components. The first is the future
provided by the user that is intended to be polled on the executor. The second
is a scheduler function provided by the executor that is used to indicate that
a future is ready and should be scheduled as soon as possible.

At a low level, [`async-task`] can be seen as putting a future on the heap and
providing a handle that can be used by executors. The [`Runnable`] is the part
of the task that represents the future to be run. It comes into existence when
the coroutine indicates that it is ready to be polled; it can then either be
run, which polls the future once and potentially returns a value to the user,
or dropped, which cancels the future and drops the task. The [`Task`] is the
user-facing handle. It returns `Pending` when polled until the [`Runnable`] is
run and the underlying future returns a value.

The task handle can be modeled like this:

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex, Weak};

use atomic_waker::AtomicWaker;

enum State<T> {
    // The future is still running and may be polled again.
    Running(Pin<Box<dyn Future<Output = T>>>),
    // The future has completed with a value.
    Finished(T),
}

struct Inner<T> {
    task: Mutex<State<T>>,
    waker: AtomicWaker,
}

pub struct Task<T> {
    inner: Weak<Inner<T>>,
}

pub struct Runnable<T> {
    inner: Arc<Inner<T>>,
}
```

Running the [`Runnable`] polls the future once and, if it succeeds, stores the
result of the future inside of the task. Polling the [`Task`] checks whether
the inner future has finished yet; if it has not, the [`Task`] stores its
[`Waker`] inside the shared state and waits until the [`Runnable`] finishes.

In practice the actual implementation is much more optimized: it is lock-free,
involves only a single heap allocation and supports many more features.

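As a rough sketch of the public API, using an inline scheduler purely for
illustration (a real executor would push the [`Runnable`] onto a run queue;
`futures-lite`'s `block_on` is used here only to drive the example):

```rust
use async_task::Runnable;

fn main() {
    // A trivial "scheduler" that runs a woken task inline on the current
    // thread. A real executor would push it onto a queue instead.
    let schedule = |runnable: Runnable| {
        runnable.run();
    };

    // Pair the user's future with the scheduler, getting back a `Runnable`
    // (the executor-side handle) and a `Task` (the user-side handle).
    let (runnable, task) = async_task::spawn(async { 1 + 2 }, schedule);

    // Poll the future once; since it is immediately ready, this finishes it.
    runnable.run();

    // The `Task` can now be awaited (or blocked on) to obtain the result.
    assert_eq!(futures_lite::future::block_on(task), 3);
}
```
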
### [`event-listener`]

[`event-listener`] is [`atomic-waker`] on steroids. It supports an unbounded
number of tasks waiting on an event, as well as an unbounded number of users
activating that event.

The core of [`event-listener`] is based around a linked list containing the
wakers of the tasks that are waiting on it. When the event is activated, a
number of wakers are popped off of this linked list and woken.

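A sketch of the blocking side of the API, loosely adapted from the crate's
documentation; this assumes a recent [`event-listener`] release where `wait`
is provided by the `Listener` trait. The same listener can also be `.await`ed
from async code.

```rust
use std::sync::Arc;
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;
use std::time::Duration;

use event_listener::{Event, Listener};

fn main() {
    let flag = Arc::new(AtomicBool::new(false));
    let event = Arc::new(Event::new());

    // An "activator": set the flag, then notify every registered listener.
    let (flag2, event2) = (flag.clone(), event.clone());
    thread::spawn(move || {
        thread::sleep(Duration::from_millis(10));
        flag2.store(true, Ordering::SeqCst);
        event2.notify(usize::MAX);
    });

    // A waiter: register a listener *before* re-checking the flag, so that a
    // notification sent in between cannot be lost.
    while !flag.load(Ordering::SeqCst) {
        let listener = event.listen();
        if flag.load(Ordering::SeqCst) {
            break;
        }
        listener.wait();
    }
}
```
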
TODO: More in-depth explanation

### [`async-signal`]

TODO: Explain this!

### [`async-io`]

TODO: Explain this!

### [`blocking`]

TODO: Explain this!

## Higher-Level Crates

These are high-level crates, built with the intention of being used in both
libraries and user programs.

### [`futures-lite`]

TODO: Explain this!

### [`async-channel`]

TODO: Explain this!

### [`async-lock`]

TODO: Explain this!

### [`async-executor`]

TODO: Explain this!

### [`async-net`]

TODO: Explain this!

### [`async-fs`]

TODO: Explain this!

### [`async-process`]

TODO: Explain this!

[`smol-rs`]: https://github.com/smol-rs
[`smol`]: https://github.com/smol-rs/smol
[`async-channel`]: https://github.com/smol-rs/async-channel
[`async-executor`]: https://github.com/smol-rs/async-executor
[`async-fs`]: https://github.com/smol-rs/async-fs
[`async-io`]: https://github.com/smol-rs/async-io
[`async-lock`]: https://github.com/smol-rs/async-lock
[`async-net`]: https://github.com/smol-rs/async-net
[`async-process`]: https://github.com/smol-rs/async-process
[`async-signal`]: https://github.com/smol-rs/async-signal
[`async-task`]: https://github.com/smol-rs/async-task
[`atomic-waker`]: https://github.com/smol-rs/atomic-waker
[`blocking`]: https://github.com/smol-rs/blocking
[`concurrent-queue`]: https://github.com/smol-rs/concurrent-queue
[`event-listener`]: https://github.com/smol-rs/event-listener
[`fastrand`]: https://github.com/smol-rs/fastrand
[`futures-lite`]: https://github.com/smol-rs/futures-lite
[`parking`]: https://github.com/smol-rs/parking
[`piper`]: https://github.com/smol-rs/piper
[`polling`]: https://github.com/smol-rs/polling
[`waker-fn`]: https://github.com/smol-rs/waker-fn

[`std::thread::park`]: https://doc.rust-lang.org/std/thread/fn.park.html
[`Mutex`]: https://doc.rust-lang.org/std/sync/struct.Mutex.html
[`Condvar`]: https://doc.rust-lang.org/std/sync/struct.Condvar.html

[`Wake`]: https://doc.rust-lang.org/std/task/trait.Wake.html
[`Waker`]: https://doc.rust-lang.org/std/task/struct.Waker.html

[`AtomicWaker`]: https://docs.rs/atomic-waker/latest/atomic_waker/

[`fastrand-contrib`]: https://github.com/smol-rs/fastrand-contrib
[wyrand]: https://github.com/wangyi-fudan/wyhash

[`crossbeam-queue`]: https://github.com/crossbeam-rs/crossbeam/tree/master/crossbeam-queue

[`epoll`]: https://en.wikipedia.org/wiki/Epoll
[`kqueue`]: https://en.wikipedia.org/wiki/Kqueue
[IOCP]: https://learn.microsoft.com/en-us/windows/win32/fileio/i-o-completion-ports
[`poll`]: https://en.wikipedia.org/wiki/Poll_(Unix)
[`AFD`]: https://2023.notgull.net/device-afd/

[`Runnable`]: https://docs.rs/async-task/latest/async_task/struct.Runnable.html
[`Task`]: https://docs.rs/async-task/latest/async_task/struct.Task.html

[work-stealing]: https://en.wikipedia.org/wiki/Work_stealing
[one-shot]: https://github.com/smol-rs/polling