Restyle exporter CLI and container image export (#66)
charmoniumQ authored Oct 17, 2024
1 parent f06f5b9 commit a7140a2
Showing 20 changed files with 721 additions and 388 deletions.
25 changes: 25 additions & 0 deletions Makefile
@@ -0,0 +1,25 @@
all:
mkdir -p experiments

# Process 14194128: bash -c cat flake.nix > test0; head test0 >tmp ; wc -l <tmp
mkdir -p process_14194128
# Run command for process 14194128
(bash -c cat flake.nix > test0; head test0 >tmp ; wc -l <tmp) > process_14194128/output.log 2>&1
# Process 14194128: cat flake.nix
mkdir -p process_14194128
# Copy input files for process 14194128
cp flake.nix_v0 process_14194128/
# Run command for process 14194128
(cd process_14194128 && cat flake.nix)
# Process 14194128: head test0
mkdir -p process_14194128
# Copy input files for process 14194128
cp test0_v0 process_14194128/
# Run command for process 14194128
(cd process_14194128 && head test0)
# Process 14194128: wc -l
mkdir -p process_14194128
# Copy input files for process 14194128
cp tmp_v0 process_14194128/
# Run command for process 14194128
(wc -l) > process_14194128/output.log 2>&1
9 changes: 5 additions & 4 deletions README.md
@@ -66,7 +66,9 @@ If you need these you can either write a shell script and invoke `probe record`
probe record bash -c '<SHELL_CODE>'
```

(any flag after the first positional argument is treated as an argument to the command, not `probe`).
Any flag after the first positional argument is treated as an argument to the command, not `probe`.

This creates a file called `probe_log`. If you already have that file from a previous recording, give `probe record -f` to overwrite.

If you get tired of typing `probe record ...` in front of every command you wish to record, consider recording your entire shell session:

@@ -84,12 +86,11 @@ $ probe dump

That's a huge [work in progress](https://github.com/charmoniumQ/PROBE/pulls).
We're starting out with just "analysis" of the provenance. Does this input file influence that output file in the PROBEd process? Run
Try exporting to different formats.
``` bash
nix shell nixpkgs#graphviz github:charmoniumQ/PROBE#probe-py-manual \
--command sh -c 'python -m probe_py.manual.cli process-graph | tee /dev/stderr | dot -Tpng -ooutput.png /dev/stdin'
probe export --help
```
## Developing PROBE
2 changes: 2 additions & 0 deletions docs/.gitignore
@@ -1 +1,3 @@
.auctex-auto/
photo-1.webp
photo-2.jpg
Binary file added docs/dataflow-graph.png
Binary file added docs/dataflow-graph.svg
41 changes: 22 additions & 19 deletions docs/us-rse.html
@@ -129,7 +129,7 @@ <h1 style="font-weight: normal; font-size: 10vh;">PROBE4RSE: Provenance Replay/O
style="height: 100%: width: auto; margin-top: auto; margin-bottom: auto; vertical-align: middle;"
/>
<span>
Sandia National Laboratories
Sandia National Labs
</span>
</section>
<section>
@@ -211,7 +211,7 @@ <h2>Prior works in record/replay</h2>
Sciunit (<a href="https://doi.org/10.1109/eScience.2017.51">Ton That et al. 2017</a>),
RR (<a href="http://arxiv.org/abs/1705.05937">O'Callahan et al. 2017</a>),
CARE (<a href="https://dl.acm.org/doi/10.1145/2618137.2618138">Yves et al. 2014</a>),
ReproZip (<a href="https://dl.acm.org/doi/10.1145/2882903.2899401">Chirigati et al. 2016</a>),
ReproZip (<a href="https://dl.acm.org/doi/10.1145/2882903.2899401">Chirigati et al. 2016</a>)
</li>
<li>Speed</li>
<li>Robustness of reproducibility</li>
@@ -236,13 +236,29 @@ <h2>Understand dataflow in your pile of scripts</h2>
<figure class="contributor">
<img
alt="Shofiya Bootwala"
src="https://media.discordapp.net/attachments/1251250007546531910/1294579613975969852/headshot.jpeg?ex=6710ccb8&is=670f7b38&hm=daaecbf0e84e301296da60673f04003c4522e794681ceae2527309d4663cd402&=&format=webp&width=543&height=816"
src="./photo-1.webp"
/>
<figcaption>Contributed by Shofiya Bootwala (new grad applying for PhD)</figcaption>
</figure>
</section>
<section>
<h2>Create Makefile automatically</h2>
<img alt="example dataflow graph" src="dataflow-graph.png" />
</section>
<section>
<h2>Create container automatically</h2>
<pre style="font-size: 5vh;"><code><span class="fragment">$ probe record ./plot.sh</span>
<span class="fragment">$ probe export docker-image experiment:1.0.1</span>
<span class="fragment">$ docker run experiment:1.0.1</span></code></pre>
<figure class="contributor">
<img
alt="Asif Zubayer Palak"
src="https://scholar.googleusercontent.com/citations?view_op=medium_photo&user=lwLSWCgAAAAJ&citpid=1"
/>
<figcaption>Contributed by Asif Zubayer Palak (new grad applying for PhD)</figcaption>
</figure>
</section>
<section>
<h2>Create Makefile automatically (planned feature)</h2>
<pre style="font-size: 5vh;"><code><span class="fragment">$ probe record ./plot.sh 42</span>
<span class="fragment">$ probe export makefile</span>
<span class="fragment">$ cat Makefile
@@ -259,19 +275,6 @@ <h2>Create Makefile automatically</h2>
<figcaption>Contributed by Kyrillos Ishak (new grad applying for PhD)</figcaption>
</figure>
</section>
<section>
<h2>Create container automatically (Planned feature)</h2>
<pre style="font-size: 5vh;"><code><span class="fragment">$ probe record ./plot.sh</span>
<span class="fragment">$ probe export docker-image experiment:1.0.1</span>
<span class="fragment">$ docker run experiment:1.0.1</span></code></pre>
<figure class="contributor">
<img
alt="Asif Zubayer Palak"
src="https://scholar.googleusercontent.com/citations?view_op=medium_photo&user=lwLSWCgAAAAJ&citpid=1"
/>
<figcaption>Contributed by Asif Zubayer Palak (new grad applying for PhD)</figcaption>
</figure>
</section>
<section>
<h2>What libraries does cmd use? (Planned feature)</h2>
<pre style="font-size: 5vh;"><code><span class="fragment">$ probe record ./plot.sh</span>
@@ -291,12 +294,12 @@ <h2>Performance and portability</h2>
<!-- <li>No standard benchmarks for system-level prov before <a href="https://dl.acm.org/doi/abs/10.1145/3641525.3663627">Grayson et al. 2023 ACM REP</a></li> -->
<li class="fragment">Rust record (statically-linked) + Python extras</li>
<!-- <li class="fragment">Less than 2x overhead on provenance benchmark suite (<a href="https://dl.acm.org/doi/abs/10.1145/3589806.3600037">Grayson et al. 2023 ACM REP</a>)</li> -->
<li class="fragment">Preliminary results show <code>LD_PRELOAD</code> beats <code>ptrace</code> by 2x to 1.1x</li>
<li class="fragment">Preliminary results show <code>LD_PRELOAD</code> (1.1x) faster than <code>ptrace</code> (2x)</li>
</ul>
<figure class="contributor">
<img
src="./photo-2.jpg"
alt="Jenna Fligor"
src="https://dl.boxcloud.com/api/2.0/internal_files/1674528466908/versions/1842473744508/representations/jpg_paged_2048x2048/content/1.jpg?access_token=1!Zv5ziYYpwnrMJ6K36f-iNXE96dMCXuwyxiVAoEDq-rgjHKsptWAD9yX2uRjddf26A4MYK881SS1iExSMyom8XsLxZdmCQQUoSb0TKEA39P-zHgsYeO7QxBO3IoFch6HaWfl3N3dky84buK2dTCQkSNdqluhCAPzdacTDHAhe7AZ-PWaoiM1Lez0ePVUawGJX9bYAsJB9dug8EJn1PpTd2xXCQ06XbsH7-VAyiyhavH84IK_InKcyUj1ID-0LGAtfsIhtnS11Wo68nV7aalz2Kgld_he1BTcrzJRIEWUNytsKdPn-cMB84xF--t6HNVolDyG7VNmi1EDDFQCJBeKvGtJ_vvB-2cnEbZ8G93LsS6tX3Y-kLLifrUa76eKDRL1_o4ngZj4lVf55u8I8KAWVOU2t1Q5ycEd21KtcKjHVWCtGq9Yqd-bvVEdrRsTN-ZwgAb2AwkrlPAH3RTf7jrPxAYg3cc3__-SlDFuthU5w3LRhfepPXH8WKAA51jFaFWhfB2f2CfNsfpaazu2EOIp3uXsKHmueqe5SNvN9P2O_ToYNdJqNz4yxkZr-Fes1l1cB52s9w5wdQqia9deQFrdGj5c8qcgUGIo0xURXTar2S0hVe4wHSEP_9HboJWXg0IvNvLOjiMTUHRoG__7nteF6tngwb5oCdjrzhyXjsY13getSZPQQNhp84QT6Dw2zgxUr41cLl-JuxDQZBNXPX4ZFTOK0N-M.&amp;box_client_name=box-content-preview&amp;box_client_version=2.110.0"
/>
<figcaption>Python → Rust by Jenna Fligor (ugrad applying for internships)</figcaption>
</figure>
16 changes: 12 additions & 4 deletions flake.nix
@@ -95,7 +95,8 @@
${frontend.packages.probe-cli}/bin/probe \
$out/bin/probe \
--set __PROBE_LIB ${libprobe}/lib \
--prefix PATH : ${probe-py}/bin
--prefix PATH : ${probe-py}/bin \
--prefix PATH : ${pkgs.buildah}/bin
'';
};
probe-py-generated = frontend.packages.probe-py-generated;
@@ -149,15 +150,21 @@
src = ./.;
doCheck = true;
nativeBuildInputs = [pkgs.alejandra];
buildPhase = "touch $out";
installPhase = "mkdir $out";
buildPhase = "true";
checkPhase = ''
alejandra --check .
'';
};
probe-integration-tests = pkgs.stdenv.mkDerivation {
name = "probe-integration-tests";
src = ./probe_src/tests;
nativeBuildInputs = [packages.probe-bundled packages.probe-py];
nativeBuildInputs = [
packages.probe-bundled
packages.probe-py
pkgs.podman
pkgs.docker
];
buildPhase = "touch $out";
checkPhase = ''
pytest .
@@ -182,7 +189,6 @@
pkgs.cargo-expand
pkgs.cargo-flamegraph
pkgs.cargo-watch
pkgs.gdb
pkgs.rust-analyzer

(python.withPackages (pypkgs: [
@@ -205,6 +211,7 @@

# (export-and-rename python312-debug [["bin/python" "bin/python-dbg"]])

pkgs.buildah
pkgs.which
pkgs.gnumake
pkgs.gcc
@@ -216,6 +223,7 @@
pkgs.ruff
pkgs.cachix
pkgs.jq # to make cachix work
pkgs.podman
]
# gdb broken on i686
++ pkgs.lib.lists.optional (system != "i686-linux") pkgs.nextflow
7 changes: 7 additions & 0 deletions lightweight_env.sh
@@ -0,0 +1,7 @@
#!/usr/bin/env bash

# nix develop brings in a ton of stuff to the env
# which complicates testing probe
# To simplify, use this script.

env - __PROBE_LIB=$__PROBE_LIB PATH=$PATH PYTHONPATH=$PYTHONPATH $@
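
Reviewer note: the effect of `lightweight_env.sh` (run a command with everything stripped from the environment except `__PROBE_LIB`, `PATH`, and `PYTHONPATH`) can be sketched in Python for use from tests. This is a minimal sketch under stated assumptions, not code from this commit: `minimal_env`, `run_clean`, `UNRELATED_DEMO_VAR`, and the `/hypothetical/lib` value are names invented for illustration. Unlike the shell script, the sketch omits a variable that is unset instead of passing it as an empty string.

```python
import json
import os
import subprocess
import sys

# The three variables the shell script forwards to the child process.
KEEP = ("__PROBE_LIB", "PATH", "PYTHONPATH")


def minimal_env(environ, keep=KEEP):
    """Return a copy of environ restricted to the names in keep (hypothetical helper)."""
    return {name: environ[name] for name in keep if name in environ}


def run_clean(argv, environ=None):
    """Run argv with only the kept variables visible to the child (hypothetical helper)."""
    source = dict(os.environ if environ is None else environ)
    return subprocess.run(
        argv,
        env=minimal_env(source),
        check=False,
        capture_output=True,
        text=True,
    )


# Demo: the child reports its own environment as JSON.
parent = dict(os.environ, UNRELATED_DEMO_VAR="1", __PROBE_LIB="/hypothetical/lib")
child_code = "import json, os; print(json.dumps(dict(os.environ)))"
result = run_clean([sys.executable, "-c", child_code], parent)
child_env = json.loads(result.stdout)
print("UNRELATED_DEMO_VAR" in child_env)
```

Passing `argv` as a list also sidesteps the word splitting that an unquoted `$@` applies to arguments containing spaces. The child may still see variables that the interpreter itself adds at startup, so the sketch only checks that unrelated parent variables are absent.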
2 changes: 2 additions & 0 deletions probe_src/frontend/cli/src/dump.rs
@@ -14,6 +14,7 @@ use serde::{Deserialize, Serialize};
///
/// This hides some of the data and so is not suitable for machine consumption use
/// [`to_stdout_json()`] instead.
#[allow(dead_code)]
pub fn to_stdout<P: AsRef<Path>>(tar_path: P) -> Result<()> {
dump_internal(tar_path, |(pid, epoch, tid), ops| {
let mut stdout = std::io::stdout().lock();
@@ -33,6 +34,7 @@ pub fn to_stdout<P: AsRef<Path>>(tar_path: P) -> Result<()> {
/// ```
///
/// (without whitespace)
#[allow(dead_code)]
pub fn to_stdout_json<P: AsRef<Path>>(tar_path: P) -> Result<()> {
dump_internal(tar_path, |(pid, epoch, tid), ops| {
let mut stdout = std::io::stdout().lock();
27 changes: 3 additions & 24 deletions probe_src/frontend/cli/src/main.rs
@@ -67,19 +67,9 @@ fn main() -> Result<()> {
.value_parser(value_parser!(OsString)),
])
.about("Convert PROBE records to PROBE logs."),
// Command::new("dump")
// .args([
// arg!(--json "Output JSON.")
// .required(false)
// .value_parser(value_parser!(bool)),
// arg!(-i --input <PATH> "Path to load PROBE log from.")
// .required(false)
// .default_value("probe_log")
// .value_parser(value_parser!(OsString)),
// ])
// .about("Write the data from probe log data in a human-readable manner"),
// TODO: Dump is temporarily broken by https://github.com/charmoniumQ/PROBE/pull/60.
// For now, we can just use tar xvf or analysis.generated.parse_prov_log(...) instead
/* No more probe dump in Rust.
* See `probe export debug-text` in Python.
* */
Command::new("__gdb-exec-shim").hide(true).arg(
arg!(<CMD> ... "Command to run")
.required(true)
@@ -127,17 +117,6 @@ fn main() -> Result<()> {
.and_then(|mut tar| transcribe::transcribe(input, &mut tar))
.wrap_err("Transcribe command failed")
}
Some(("dump", sub)) => {
let json = sub.get_flag("json");
let input = sub.get_one::<OsString>("input").unwrap().clone();

if json {
dump::to_stdout_json(input)
} else {
dump::to_stdout(input)
}
.wrap_err("Dump command failed")
}
Some(("__gdb-exec-shim", sub)) => {
let cmd = sub
.get_many::<OsString>("CMD")
113 changes: 57 additions & 56 deletions probe_src/frontend/python/probe_py/generated/parser.py
@@ -1,8 +1,12 @@
from __future__ import annotations
import os
import contextlib
import tempfile
import pathlib
import typing
import json
import tarfile
from dataclasses import dataclass
from dataclasses import dataclass, replace
from . import ops

@dataclass(frozen=True)
@@ -31,68 +35,65 @@ class InodeVersionLog:
tv_nsec: int
size: int

@staticmethod
def from_path(path: pathlib.Path) -> InodeVersionLog:
s = path.stat()
return InodeVersionLog(
os.major(s.st_dev),
os.minor(s.st_dev),
s.st_ino,
s.st_mtime_ns // int(1e9),
s.st_mtime_ns % int(1e9),
s.st_size,
)


@dataclass(frozen=True)
class ProvLog:
processes: typing.Mapping[int, ProcessProvLog]
inodes: typing.Mapping[InodeVersionLog, str]
inodes: typing.Mapping[InodeVersionLog, pathlib.Path]
has_inodes: bool

def parse_probe_log(probe_log: pathlib.Path) -> ProvLog:
op_map = dict[int, dict[int, dict[int, ThreadProvLog]]]()
inodes = dict[InodeVersionLog, str]()
has_inodes = False

tar = tarfile.open(probe_log, mode='r')

for item in tar:
# items with size zero are directories in the tarball
if item.size == 0:
continue

# extract and name the hierarchy components
parts = item.name.split("/")
if parts[0] == "info":
if parts[1] == "copy_files":
has_inodes = True
elif parts[0] == "inodes":
if len(parts) != 2:
raise RuntimeError("Invalid probe_log")
inodes[InodeVersionLog(*[
@contextlib.contextmanager
def parse_probe_log_ctx(
probe_log: pathlib.Path,
) -> typing.Iterator[ProvLog]:
"""Parse probe log; return provenance data and inode contents"""
with tempfile.TemporaryDirectory() as _tmpdir:
tmpdir = pathlib.Path(_tmpdir)
with tarfile.open(probe_log, mode="r") as tar:
tar.extractall(tmpdir, filter="data")
has_inodes = (tmpdir / "info" / "copy_files").exists()
inodes = {
InodeVersionLog(*[
int(segment, 16)
for segment in parts[1].split("-")
])] = item.name
elif parts[0] == "pids":
if len(parts) != 4:
raise RuntimeError("Invalid probe_log")
pid: int = int(parts[1])
epoch: int = int(parts[2])
tid: int = int(parts[3])

# extract file contents as byte buffer
file = tar.extractfile(item)
if file is None:
raise IOError("Unable to read jsonlines from probe log")

# read, split, comprehend, deserialize, extend
jsonlines = file.read().strip().split(b"\n")
ops = ThreadProvLog(tid, [json.loads(x, object_hook=op_hook) for x in jsonlines])
op_map.setdefault(pid, {}).setdefault(epoch, {})[tid] = ops

return ProvLog(
processes={
pid: ProcessProvLog(
pid,
{
epoch: ExecEpochProvLog(epoch, threads)
for epoch, threads in epochs.items()
},
)
for pid, epochs in op_map.items()
},
inodes=inodes,
has_inodes=has_inodes,
)
for segment in file.name.split("-")
]): file
for file in (tmpdir / "inodes").iterdir()
} if (tmpdir / "inodes").exists() else {}

processes = {}
for pid_dir in (tmpdir / "pids").iterdir():
pid = int(pid_dir.name)
epochs = {}
for epoch_dir in pid_dir.iterdir():
epoch = int(epoch_dir.name)
tids = {}
for tid_file in epoch_dir.iterdir():
tid = int(tid_file.name)
# read, split, comprehend, deserialize, extend
jsonlines = tid_file.read_text().strip().split("\n")
tids[tid] = ThreadProvLog(tid, [json.loads(x, object_hook=op_hook) for x in jsonlines])
epochs[epoch] = ExecEpochProvLog(epoch, tids)
processes[pid] = ProcessProvLog(pid, epochs)
yield ProvLog(processes, inodes, has_inodes)

def parse_probe_log(
probe_log: pathlib.Path,
) -> ProvLog:
"""Parse probe log; return provenance data, but throw away inode contents"""
with parse_probe_log_ctx(probe_log) as prov_log:
return replace(prov_log, has_inodes=False, inodes={})

def op_hook(json_map: typing.Dict[str, typing.Any]) -> typing.Any:
ty: str = json_map["_type"]
