Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(rpc): add wasmtime_rpc crate #8737

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

rvolosatovs
Copy link
Member

@rvolosatovs rvolosatovs commented Jun 3, 2024

This is the first step in RPC-based Wasmtime plugin functionality support based on https://github.com/wrpc/wrpc

I will update this PR with more info going forward, but in short wasmtime-rpc, using wRPC, provides a way to extend the host runtime using interfaces defined in WIT without statically-generated bindings.
wRPC encodes values using Component Model Value Encoding (WebAssembly/component-model#336) on the wire - this encoding is implemented in this PR directly using wasmtime values and types. The implementation depends on https://docs.rs/wasm-tokio/latest/wasm_tokio/ developed by me in https://github.com/wrpc/wasm-tokio

wRPC is built on top of a core "transport" abstraction, which provides bidirectional, multiplexed byte streams.
Best supported wRPC transport is currently based on https://nats.io/ (implementation available at https://github.com/wrpc/wrpc/blob/86d06a487f3ab3ad71c476c0038dbb759282388a/crates/transport-nats/src/lib.rs), but IPC transport is currently underway and the protocol is completely transport-agnostic.
You can see a basic "static" outgoing transport implemented in the test.

The end goal here is that developers should be able to choose the "polyfill strategy" having a way to e.g. polyfill every component import missing from the linker, or a particular instance import etc. For this first step, I've only added functionality to polyfill a single function at a time given a linker instance, instance name, function name and its' type.

I've decided to omit the resource support from this PR, which is currently being redesigned in wRPC bytecodealliance/wrpc#101.

Note, wRPC contains a subtree-merged wit-bindgen adaptation with support for Go and Rust, but they are currently out-of-sync with latest CM value encoding spec (mostly around flag and char encoding).

wRPC fully supports async proposal as it currently stands, i.e. native future and stream types. An example WIT: https://github.com/wrpc/http/blob/5b9a324c74d72b27d8559c54b419a54609d01c68/examples/go/http-outgoing-client/wit/deps/wrpc-http/types.wit#L16-L38 and example usage of generated bindings in Go https://github.com/wrpc/http/blob/5b9a324c74d72b27d8559c54b419a54609d01c68/examples/go/http-outgoing-client/cmd/http-outgoing-client-nats/main.go#L38-L68

Signed-off-by: Roman Volosatovs <rvolosatovs@riseup.net>
@rvolosatovs rvolosatovs changed the title feat(rpc): implement wasmtime_rpc::link_function feat(rpc): add wasmtime_rpc crate Jun 4, 2024
Signed-off-by: Roman Volosatovs <rvolosatovs@riseup.net>
Signed-off-by: Roman Volosatovs <rvolosatovs@riseup.net>
Signed-off-by: Roman Volosatovs <rvolosatovs@riseup.net>
Copy link
Member

@alexcrichton alexcrichton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again for your patience while I found time to get to review this. At a high level this all looks good modulo minor comments here or there but I wanted to drill in in some more of the details before we land this.

First, to clarify, the main intention of landing this in-repo is to serve as both an example for other users and to integrate this with the wasmtime CLI, right? In that case one part of examples in theory should be a "hello world" of how to set it up, for example given a component that imports something create/compile a client that serves it in addition to the CLI flags necessary to run the component at hand. One difficult part about this is going to be that the client source will live in a separate repository (and/or have many of its dependencies there), but if the end goal is to have CLI support for this I think we'll want to plan for an example too (ideally one run in CI).

Next I'm also sort of coming at this from the perspective of if APIs in wasmtime need to change or if APIs need to be updated. For example the usage of Val here feels unnecessarily inefficient. I've long thought the representation of Val is pretty inefficient (e.g. heap allocation for nested values and things like strings-for-flags right nwo). It's also pretty unfortunate that types need to be passed around manually here instead since especially with resources that gets tricky and requires shenanigans like substituted_component_type which is pretty non-obvious. Long-term what I think we'd ideally have is something along the lines of func_new_unchecked but without the unsafety. What I'm envisioning is that host functions could be defined as fn(StoreContextMut<'_, T>, args: ComponentArgs<'_>, ret: ComponentRet<'_>) -> Result<()>. The ComponentArgs structure would serve as a deserializer of sorts and the ComponentRet structure would act as a serializer of sorts. That way you could plug those directly into this protocol and avoid an intermediate copy through the host (e.g. the creation of a Val). That would also enable args.serialize() -> Vec<u8> and ret.deserialize(&[u8]) -> Result<()> where the component encoding format could be implemented directly in those two.

I'm writing down this idea with the purpose of explicitly saying that this implementation as-is is debt that we're accruing and don't have a plan to pay it down. Debt I mean in th sense of leaning on a known-explicitly-inefficient abstraction Val without a concrete plan on improving it. It's not necessarily required that this is paid down immediately but I do think it's important to consider this at the same time.

Similarly though I'm at least personally surprised by the complexity here. Namely link_function has an 11-line where clause along with two extra generics in the arguments themselves. Much of the complexity seems to be unused too, for example I couldn't actually figure out where deferred came into play, is it perhaps resources? It also seemed a little complex to have async work at all parts of the stack here, would it be reasonable to require that a single component value is required to be entirely in-memory during serialization or deserialization?

We try to be judicious about picking up new dependencies in Wasmtime so ideally this could be trimmed down to an AsyncWrite plus AsyncRead pair or, better yet, something like async fn(&Context, &[u8]) -> Result<Vec<u8>> where this crate would define just a single trait and the trait could be implemented by downstream consumers.

Overall my current feeling is that I'm at least personally not understanding the long-term of this. I think it makes sense to add to wasmtime-the-CLI but it feels like an overly complex implementation for that use case. As-is it feels like this could suitably be an external crate to guide some API changes in Wasmtime itself, but I also realize you probably have more long-term goals for this being in-tree so I also think it would be good to get those written down to.

@rvolosatovs
Copy link
Member Author

rvolosatovs commented Jun 13, 2024

To make sure we're on the same page: this PR is a work-in-progress PoC at best, really just opened to align on the direction here, I would not expect it to be ready-for-review for another week at least. I've just recently reworked the transport abstraction, and a few changes, I think, are still coming. It is my absolute intention to add a simple to follow end-to-end example to this PR (most probably using QUIC transport I've just finished yesterday bytecodealliance/wrpc#127) before marking it as ready-for-review. I'm just now starting with updating the bindgen to a few changes made in value definition encoding and the new transport - that's the biggest blocker for now.

Note, that I will also add a more formal Wasmtime RFC for this

First, to clarify, the main intention of landing this in-repo is to serve as both an example for other users and to integrate this with the wasmtime CLI, right? In that case one part of examples in theory should be a "hello world" of how to set it up, for example given a component that imports something create/compile a client that serves it in addition to the CLI flags necessary to run the component at hand. One difficult part about this is going to be that the client source will live in a separate repository (and/or have many of its dependencies there), but if the end goal is to have CLI support for this I think we'll want to plan for an example too (ideally one run in CI).

The intention here is to, given an off-the-shelf wrpc_transport::Invoke implementation, likely contained within https://github.com/wrpc/wrpc (BA transfer pending), provide developers the tools to satisfy component imports of their choosing at runtime. Eventually I'd like to also add functionality to serve exports given a wrpc_transport::Serve implementation. The key benefit here is the standardized, generic traits (the Invoke and Serve) and the associated collection of fully-specified transports. I was a bit hesitant adding CLI support in this PR since I did not want to prematurely flesh out the UX of that, but since that feature would be feature gated and opt-in anyway (at least to start with), I guess I will just go ahead and add a simple CLI support for this in this PR as well. This PR will not be considered ready until there's both a "low-level", library use example and CLI usage example available (or documented).

Next I'm also sort of coming at this from the perspective of if APIs in wasmtime need to change or if APIs need to be updated. For example the usage of Val here feels unnecessarily inefficient. I've long thought the representation of Val is pretty inefficient (e.g. heap allocation for nested values and things like strings-for-flags right nwo). It's also pretty unfortunate that types need to be passed around manually here instead since especially with resources that gets tricky and requires shenanigans like substituted_component_type which is pretty non-obvious. Long-term what I think we'd ideally have is something along the lines of func_new_unchecked but without the unsafety. What I'm envisioning is that host functions could be defined as fn(StoreContextMut<'_, T>, args: ComponentArgs<'_>, ret: ComponentRet<'_>) -> Result<()>. The ComponentArgs structure would serve as a deserializer of sorts and the ComponentRet structure would act as a serializer of sorts. That way you could plug those directly into this protocol and avoid an intermediate copy through the host (e.g. the creation of a Val). That would also enable args.serialize() -> Vec<u8> and ret.deserialize(&[u8]) -> Result<()> where the component encoding format could be implemented directly in those two.

Absolutely agree, in fact I was always hoping that at one point we could make the runtime be "wRPC-aware" and directly convert to/from canon ABI values to "value definition" values. I believe, for some of these, the conversion is just an identity function, e.g. a bool encoding is exactly the same for both.
Regarding the traits, my (rough) suggestion for wRPC value trait analogues for this can be found here bytecodealliance/wrpc#101 (comment)

Similarly though I'm at least personally surprised by the complexity here. Namely link_function has an 11-line where clause along with two extra generics in the arguments themselves. Much of the complexity seems to be unused too, for example I couldn't actually figure out where deferred came into play, is it perhaps resources? It also seemed a little complex to have async work at all parts of the stack here, would it be reasonable to require that a single component value is required to be entirely in-memory during serialization or deserialization?

We try to be judicious about picking up new dependencies in Wasmtime so ideally this could be trimmed down to an AsyncWrite plus AsyncRead pair or, better yet, something like async fn(&Context, &[u8]) -> Result<Vec<u8>> where this crate would define just a single trait and the trait could be implemented by downstream consumers.

Like mentioned above, this is still a PoC and the interface will change slightly, that said, these are the two biggest complexity sources I've identified:

  • wasmtime::Error usage (i.e. just a anyhow::Error), which does not implement std::error::Error. I'd love to bind the associated error types to just implement std::error::Error, which would eliminate most of these awkward trait bounds, but unfortunately that would prevent anyone from using anyhow::Error as the implementation for these. anyhow::Error implements AsRef<dyn Error> https://docs.rs/anyhow/latest/anyhow/struct.Error.html#impl-AsRef%3Cdyn+Error%3E-for-Error, but it does not look like an "idiomatic" approach, e.g. it does not look like a blanket implementation would exist for std::error::Error, similarly, std::io::Error also does not implement such.
    • Another way to handle this could potentially be using associated type defaults, but that's not stable in Rust yet
  • async values (stream and future). As briefly mentioned in OP, wRPC has native support for async types. I fully acknowledge that Wasmtime today does not support async, however, I believe that's coming soon(ish?). Regardless, as we discussed offline, the implementation will be "smart" about host WASI resources and transfer outgoing-stream resource as stream<u8>. That's what the deferred field and the path indexes are for. For example, consider:
interface example {
   // reads bytes from rx and writes them to each stream in tx
   fn tee(rx: input-stream, tx: list<output-stream>) -> result<u64>
}

tee contract requires tx to be received before full contents of rx are available. The rx field would be encoded as option<list<u8>> in the first, "sync" pass - if full contents of rx are not available at encoding time, it would be sent as option::none, while a "deferred" writer of the underlying stream<u8> would be registered with index 0 (first parameter). tx would be fully encoded "synchronously". After the "sync" parameters are sent, the implementation would concurrently be sending more bytes of rx and await the result<u64>.

There is no resource support in PR currently, so deferred is not used indeed, I am happy to add that in this PR - was hoping to keep the scope small, but I don't mind adding more things here if that's desired.

So to answer your question, neither "just" AsyncRead/AsyncWrite, nor async fn(&Context, &[u8]) -> Result<Vec<u8>> could actually satisfy the requirements here without breaking the contracts in general case, for two reasons:

  1. values may not fit in memory and may even not even be "finite" in some sense (e.g. if the input-stream is an HTTP body stream or a TCP stream), in fact, the input async value could depend on the output async value, so blocking here is not acceptable
  2. values may be arbitrarily nested (e.g. a record { foo: stream<stream<stream<vec<stream<future<u8>>>>>> }). Wasmtime would need to implement it's own framing of some kind to differentiate different chunks of the async values, which I believe is not desired, because:

All that said, I'm happy to add a fully-synchronous Invoke trait to this crate with blanket impl for wrpc_transport::Invoke, which would also bind the error types to be wasmtime::Error and use that in bounds removing the async support from this PR for now. That would mean that this trait would require breaking changes later on, but perhaps it would be easier to align on the async design in a follow-up PR?

@alexcrichton
Copy link
Member

Sorry for the delay in getting back to this, but this all sounds reasonable enough to me. I think it'd be best to go an RFC route on this as it sounds like you're already intending to do where there can be a better understanding of the high-level goals and directions of this.

For example I would not want to take on all the complexity of planning for stream<T> and future<T> at this time. I'd rather defer that to later. I would, however, find it useful to see what the plan for resources will be. Furthermore I think the basic performance profile of this will guide the implementation (e.g. Val-or-not, all-in-memory or not, etc) pretty significantly so I think it would be best to settle these sorts of high-level details through that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants