Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rust PROBE frontend #20

Merged
merged 37 commits into from
Jul 25, 2024
Merged

Rust PROBE frontend #20

merged 37 commits into from
Jul 25, 2024

Conversation

Ex-32
Copy link
Collaborator

@Ex-32 Ex-32 commented Jun 24, 2024

Goals

A performant and safe (some unsafe code is required, but I've tried to keep it to a minimum) libprobe frontend with minimal runtime dependencies that doesn't require loading path injection (such as the PYTHONPATH injection the current PROBE script does) in order to be as environment agnostic as possible.

Current coverage

Currently, only the record and dump commands from the python implementation have been implemented.

Benefits

Currently, this rust frontend has the following benefits over the python implementation:

  1. Improved Serialization Format
    Rather than retaining the raw C-structs emitted by libprobe as a serialization format, this frontend marshals the raw arena data into rust data structures and from there serializes them into a more cross-platform format: each probe_log is a gzip compressed tar archive containing a folder structure much like a libprobe arena directory (<PID>/<EXEC_EPOCH>/<TID>), but instead of being a directory, each <TID> is a jsonlines file, where each line is an Op serialized as json. In addition, there is a file named _metadata at the root of the archive containing a json object that includes the entry PID (that is, the PID of the process directly launched by the frontend), as well as copious OS and hardware details.

  2. Dependencies
    While this frontend does use several external dependencies, due to rust's build system and language architecture they are statically linked into the binary at build-time. According to ldd the only runtime dependency (excluding ld.so, compiler runtime(s), and vDSO), is libc.

  3. Multithreading
    While single commands or short scripts don't benefit much from this due to the overhead of creating threads, the data-marshaling and serialization is done in a stream-based manner using functional patterns that can be trivially parallelized, currently it parallelizes each PID automatically, although ideally I'd like to eventually replace this with some form of heuristic.

  4. Fast
    While I haven't done extensive testing, testing a moderately complex command (bash -c "nvim +q; lsd -la /etc") with hyperfine, the rust frontend performed 139% faster (2.39x the speed) than the python frontend. According to the rust frontend, running that command produced 2478 Ops, while the python frontend crashed when trying to dump its log.

Known Issues

  1. hacky build process
    Currently the prov_ops.h header file has to be coppied from libprobe/include to probe_frontend/include manually or with the build.sh script before it can be built with either nix or cargo. This is rather hacky and I'd like a better solution, suggestions appreciated. :)

  2. Library path
    I have yet to come up with a truly elegant way for finding the location of libprobe.so currently there is a flag --lib-path as well as an environment variable __PROBE_LIB that can be set to the directory containing libprobe.so and libprobe-dbg.so. For the purposes of development, the .envrc file in probe_frontend automatically sets __PROBE_LIB to the canonicalized version of ../libprobe/build.

  3. rusage
    I'm using a manually defined version of the rusage struct because the glibc version defines each field as a union over two integral types for kernel/userland compatibility reasons, this turns the (de)serialization traits for rusage from something trivial that can be autogenerated by rust-bindgen to a massive headache that needs lots of unsafe code, so I just defined my own version without the unions in build.rs.

Before Merging

The following things need to be done before this PR is ready to be merged:

  • Documentation
    While much of the code is somewhat repetitive boilerplate, there are parts (such as anywhere there's unsafe code) that should definitely have more documentation.
  • Logging Facility
    This should be pretty simple, but it needs better logging, as well as adding a logging facility to actually display the logs created by the log crate.
  • Tests
    While the rust compiler can guarantee a lot, the addition of some actual unit tests, especially around the unsafe code, are necessary before I'd consider this production ready.
    Large scale unit testing abandoned in favor of end-to-end testing.

In the Future

These are some features I'd like to bring to the rust frontend in the future:

  • ssh support
    Especially with PR ssh wrapper for PROBE #19 in the works, I'd like to add support for remote probe-ing over ssh.

  • Better nix packages
    I'd like to make a wrapper nix package around the base crate package and libprobe itself that automatically sets the library path.

  • Library
    I'd like to break out some of the core functionality of the crate into a library so that other people could use it as a starting point for something like a more specialized frontend.

  • Background Serialization
    Ideally once a process exits it's arena directory could be serialized in the background to reduce the significant delay when the last process exits and serveral tens of thousands of Ops need to be processed.

Requested Feedback

Some of the things I'd specifically like feedback on are:

  • Overall ergonomics
  • The aforementioned build system issue
  • Missing critical features (if any)

@Ex-32 Ex-32 marked this pull request as ready for review June 24, 2024 20:38
probe_src/probe_frontend/build.rs Outdated Show resolved Hide resolved
probe_src/probe_frontend/build.rs Outdated Show resolved Hide resolved
probe_src/probe_frontend/src/arena.rs Outdated Show resolved Hide resolved
probe_src/probe_frontend/src/arena.rs Outdated Show resolved Hide resolved
probe_src/probe_frontend/src/arena.rs Outdated Show resolved Hide resolved
probe_src/probe_frontend/src/ffi.rs Outdated Show resolved Hide resolved
probe_src/probe_frontend/src/main.rs Outdated Show resolved Hide resolved
probe_src/probe_frontend/src/metadata.rs Outdated Show resolved Hide resolved
probe_src/probe_frontend/src/ops.rs Outdated Show resolved Hide resolved
probe_src/probe_frontend/src/ops.rs Outdated Show resolved Hide resolved
@charmoniumQ
Copy link
Owner

charmoniumQ commented Jun 25, 2024

All in all, great work! I have a few minor comments.

Improved serialization format: completely agree. Gzipped tar of jsonlines is great and cross-platform. Do you want to take over the task of parsing that in Python or should I give it to one of the undergrads?

Dependencies: Yes, static dependencies can be used liberally. We should think about Cargo machete or eventually just for build times.

Multithreading and fast: thanks Rust. We should eventually include a hyperfine microbenchmarks in the test suite. I have a note tracking that task.

Hacky build process: that's a tricky problem. Other than making ops.h its own Nix package that gets included by both, I think the best thing we can do is put the real copy in

Library path: See my note on the code for avoiding a path lookup in favor of hardcoding a location (I think that's actually better in this case), since it is more predictable, and we can strictly control it during package management.

rusage: Fine by me.

Tests, we're working on some "end-to-end tests" (from ./PROBE record <cmd> correctly answer "does data flow from file A to file B?") for libprobe. I think those are sufficient to detect regressions in the wrapper as well as other parts of PROBE. As for localizing the error, I think liberal use of assertions may be easier than writing unittests. Do whichever you think is easier though.

@Ex-32
Copy link
Collaborator Author

Ex-32 commented Jun 25, 2024

Do you want to take over the task of parsing that in Python or should I give it to one of the undergrads?

I can do that, as mentioned above, It might be best to just break off the data-manipulation parts of this as a library and then make python bindings, both for speed and so there's a single source of truth on the json schema for any given version. Thoughts?

@Ex-32
Copy link
Collaborator Author

Ex-32 commented Jun 30, 2024

Version 0.2.0!

  • Now with coconuts! 🥥 (because horses weren't in budget)
  • Anything that can be python eventually will be python trying to be lisp.
  • quasi-quotation in python

The author of this PR thinks all three of the above jokes are funny and doesn't need your validation.

Added features 🦀 🤝 🐍

  • library 📚
    The core transcription part of the codebase has been broken out into an external library that can be consumed independently.

  • python support 🐍
    The exact ergonomics for library consumption still need to be worked out, but this PR now includes python code that can parse probe_logs generated by the probe cli into typed python dataclasses.

    • Currently, this is split into two files, python/ops.py is an auto-generated file that contains typed dataclass definitions generated rust at compilation time.
    • On the other hand, python/probe.py is a short file containing a function for loading probe log into a map-of-maps-of-maps structure, this function won't work if the probe cli isn't in your $PATH, since it uses probe dump --json to extract the ops from the probe_log and dump them as lines of json
  • README 📄
    Added a readme containing a high level overview, the goal being to give people who don't know rust a place to start for building and using the CLI as well as cluing anyone other than me into what the code is actually trying to do. At present it's missing build instructions (I mean atm it's nix develop and then cargo build, but still).

  • proc-macros 🤖
    Most of the structs and their conversion implementations are now being automatically generated by a procedural macro, reducing repetitive code. A proc-macro also generates the dataclass definitions for the python code, meaning almostall struct definitions are generated directly or indirectly from prov_ops.h.

Known Issues 💀

  • proc-macros 🤖
    Due to my inexperience with proc-macros, currently the MakeRustOp and MakePyDataclass derive macros simply panic if they encounter an error, and generally have bad macro hygiene, in future it would be better to actually return compiler errors and rely on less implicit dependencies between the code generated by MakeRustOp and the code manually defined in lib/src/ops.rs.

  • mediocre error types
    If the CLI encounters an error then the errors should be pretty intelligible, but from a library consumer perspective the errors aren't the most ergonomic; especially if/when we have a consumer for the rust library (as apposed to the python bindings) besides this CLI then I'll work on improving these.

  • fragility 💔
    Both proc macros utilize an uncomfortable amount of blind string replacement.

    • rust
      For the rust structs a malformed struct conversion should be caught at compile time either by the proc macro panicking or a compilation failure: for example, if a pointer type isn't converted properly, then the compiler will throw an error when trying to derive the FfiFrom trait because the FfiFrom trait is only implemented by other generated structs and the base cases of *i8 to CString and T: Copy to T
    • python
      Unfortunately, catching a malformed python conversion isn't quite so easy because python doesn't know a conversion is invalid until it tries it. I think adding a CI check that the generated python module loads would be a good idea, as this would catch invalid type conversions. done Even better would be testing that it can successfully parse some known json containing all the ops.
    • Malicious codegen
      Simply don't feed it untrusted code 🙄, it's almost definitely possible to craft a malicious version of prov_ops.h that produces malicious generated code, since this only ever should be fed a trusted file at compile time, I consider this an acceptable risk.

And now for something completely different.

On most modern computers, the time it takes for the light to travel from your display to your eyes is longer than a clock cycle (often several times longer). 🤔

@Ex-32 Ex-32 requested a review from charmoniumQ June 30, 2024 23:08
probe_src/probe_frontend/build.rs Outdated Show resolved Hide resolved
probe_src/probe_frontend/README.md Outdated Show resolved Hide resolved
probe_src/probe_frontend/README.md Show resolved Hide resolved
probe_src/probe_frontend/README.md Show resolved Hide resolved
probe_src/probe_frontend/cli/src/record.rs Outdated Show resolved Hide resolved
probe_src/probe_frontend/python/ops.py Outdated Show resolved Hide resolved
probe_src/probe_frontend/python/probe.py Outdated Show resolved Hide resolved
probe_src/probe_frontend/python/ops.py Outdated Show resolved Hide resolved
probe_src/probe_frontend/python/ops.py Outdated Show resolved Hide resolved
probe_src/probe_frontend/python/probe.py Outdated Show resolved Hide resolved
@Ex-32
Copy link
Collaborator Author

Ex-32 commented Jul 3, 2024

Task Tracker

  • added _type annotation
  • Improved pygen process
  • Switch to type alias based enums
  • Strip field prefixes
  • arbitrary property additions
  • improved argument parsing
  • better documentation/README

@Ex-32 Ex-32 requested a review from charmoniumQ July 4, 2024 16:13
Ex-32 and others added 20 commits July 24, 2024 15:12
Update the .envrc for probe_frontend to automatically run make on
libprobe, as well as export the __PROBE_LIB environment variable so that
the frontend can find the libprobe.so
Co-authored-by: Sam Grayson <sam@samgrayson.me>
Removes a lot of details that are usually abstracted by libraries when
printing for humans. In contrast with some of the code review comments I
left mtime and errno field because I feel those are useful for human
readers; even if all a human is likley to do with errno is "was it
zero?"
This commit addresses serveral points raised durring the last
code-review (and a few that weren't):

- Restructured arena.rs to read top-down.
- Refactored child process code to use std::process instead of
subprocess crate.
- Cleaned up arena pointer resolition to use less unsafe code.
- [INFO] Better log messages.
Co-authored-by: Sam Grayson <sam@samgrayson.me>
the structs now get converted to camel case at the rust level instead of
the python level meaning python doesn't have to do any type name
translation. Also strips out redundant field prefixes from the rust
structs (which also solves the issue for python).
@Ex-32
Copy link
Collaborator Author

Ex-32 commented Jul 24, 2024

Okay, that wasn't elegant, next time I need to figure out the more proper way to rebase main.

@charmoniumQ charmoniumQ merged commit b35cf3f into charmoniumQ:main Jul 25, 2024
1 check passed
@Ex-32 Ex-32 deleted the rust-frontend branch July 26, 2024 13:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants