diff --git a/README.md b/README.md
index 845da59..7085bb0 100644
--- a/README.md
+++ b/README.md
@@ -138,19 +138,38 @@ from the project's root.
 
 ## Rough Time Measurements
 We compare the Rormula to the well-established and way more mature package [Formulaic](https://github.com/matthewwardrop/formulaic).
-The [tests](test/test_wilkinson.py) create a formula in Wilkinson notation and sample 100 random data points. The output on my machine is
-```
-Rormula took 0.0040s
-Formulaic took 0.7854s
-```
-We have separated categorical and numerical data beforehand. If we let rormula do the separation and pass a Pandas dataframe, we obtain
-```
-Rormula took 0.0487s
-Formulaic took 0.7699s
+The [tests](rormula/test/test_wilkinson.py) create a formula in Wilkinson notation and sample 100 random data points. The output on my machine is
+```
+- test just numerical
+Rormula took 0.0020s
+Rormula asdf took 0.0247s
+Formulaic took 0.2037s
+- test numerical and categorical
+Rormula took 0.0045s
+Rormula asdf took 0.0300s
+Formulaic took 0.3403s
+```
+For the first and fourth lines that start with `Rormula took`, we have separated categorical and numerical data beforehand.
+For the result in the second and fifth lines that start with `Rormula asdf took`, we pass and receive pandas dataframes.
+The time is measured for 100 applications of the formula. We used a small data set with 100 rows. For more rows, e.g., 10k+, formulaic becomes competitive and better.
+
+## Profiling
+We use [Counts](https://github.com/nnethercote/counts/) for profiling Rust code.
+
+To run profiling one can use
 ```
-Rormula returns a list of column names and the data as Numpy array. If we want a Pandas dataframe as result we obtain
+maturin develop --release --features print_timings
+python test/test_wilkinson.py 2> counts.txt
+counts -i -e counts.txt
 ```
-Rormula took 0.0744s
-Formulaic took 0.7639s
+To profile other specific parts of the Rust code add
+```rust
+#[cfg(feature = "print_timings")]
+let now = std::time::Instant::now();
+
+// code snippet to be profiled
+
+#[cfg(feature = "print_timings")]
+eprintln!("name of code snippet {}", now.elapsed().as_nanos());
 ```
-The time is measured for 100 applications of the formula. We used a small data set with 100 rows. For more rows, e.g., 10k+, formulaic becomes competitive and better.
+Note that running in profiling mode makes the whole program slower and the time measurements of the section above will not hold anymore.
diff --git a/rormula/Cargo.toml b/rormula/Cargo.toml
index 4d24f6e..209d546 100644
--- a/rormula/Cargo.toml
+++ b/rormula/Cargo.toml
@@ -18,6 +18,7 @@ rormula-rs = { path = "../rormula-rs" }
 [features]
 extension-module = ["pyo3/extension-module"]
 default = ["extension-module"]
+print_timings = ["rormula-rs/print_timings"]
 
 [dev-dependencies]
 criterion = { version = "0.5.1", features = ["html_reports"] }
diff --git a/rormula/test/test_wilkinson.py b/rormula/test/test_wilkinson.py
index da4a422..cef784f 100644
--- a/rormula/test/test_wilkinson.py
+++ b/rormula/test/test_wilkinson.py
@@ -189,5 +189,7 @@ def test_separated():
 
 
 if __name__ == "__main__":
+    print("- test just numerical")
     test_numerical()
+    print("- test numerical and categorical")
     test_num_cat()