diff --git a/README.md b/README.md
index 845da59..7085bb0 100644
--- a/README.md
+++ b/README.md
@@ -138,19 +138,38 @@ from the project's root.
 
 ## Rough Time Measurements
 We compare the Rormula to the well-established and way more mature package [Formulaic](https://github.com/matthewwardrop/formulaic).
-The [tests](test/test_wilkinson.py) create a formula in Wilkinson notation and sample 100 random data points. The output on my machine is 
-```
-Rormula took 0.0040s
-Formulaic took 0.7854s
-```
-We have separated categorical and numerical data beforehand. If we let rormula do the separation and pass a Pandas dataframe, we obtain
-```
-Rormula took 0.0487s
-Formulaic took 0.7699s
+The [tests](rormula/test/test_wilkinson.py) create a formula in Wilkinson notation and sample 100 random data points. The output on my machine is:
+```
+- test just numerical
+Rormula took 0.0020s
+Rormula asdf took 0.0247s
+Formulaic took 0.2037s
+- test numerical and categorical
+Rormula took 0.0045s
+Rormula asdf took 0.0300s
+Formulaic took 0.3403s
+```
+For the lines that start with `Rormula took`, we have separated categorical and numerical data beforehand.
+For the lines that start with `Rormula asdf took`, we pass and receive Pandas dataframes.
+The time is measured for 100 applications of the formula. We used a small data set with 100 rows. For more rows, e.g., 10k+, Formulaic becomes competitive and even faster.
+
+## Profiling
+We use [Counts](https://github.com/nnethercote/counts/) for profiling Rust code.
+
+To profile a run, build with the `print_timings` feature and pass the output on stderr to Counts:
 ```
-Rormula returns a list of column names and the data as Numpy array. If we want a Pandas dataframe as result we obtain
+maturin develop --release --features print_timings
+python test/test_wilkinson.py 2> counts.txt
+counts -i -e counts.txt
 ```
-Rormula took 0.0744s
-Formulaic took 0.7639s
+To profile other specific parts of the Rust code, wrap them as follows:
+```rust
+#[cfg(feature = "print_timings")]
+let now = std::time::Instant::now();
+
+// code snippet to be profiled
+
+#[cfg(feature = "print_timings")]
+eprintln!("name of code snippet {}", now.elapsed().as_nanos());
 ```
-The time is measured for 100 applications of the formula. We used a small data set with 100 rows. For more rows, e.g., 10k+, formulaic becomes competitive and better.
+Note that running in profiling mode slows down the whole program, so the time measurements in the section above no longer hold.
diff --git a/rormula/Cargo.toml b/rormula/Cargo.toml
index 4d24f6e..209d546 100644
--- a/rormula/Cargo.toml
+++ b/rormula/Cargo.toml
@@ -18,6 +18,7 @@ rormula-rs = { path = "../rormula-rs" }
 [features]
 extension-module = ["pyo3/extension-module"]
 default = ["extension-module"]
+print_timings = ["rormula-rs/print_timings"]
 
 [dev-dependencies]
 criterion = { version = "0.5.1", features = ["html_reports"] }
diff --git a/rormula/test/test_wilkinson.py b/rormula/test/test_wilkinson.py
index da4a422..cef784f 100644
--- a/rormula/test/test_wilkinson.py
+++ b/rormula/test/test_wilkinson.py
@@ -189,5 +189,7 @@ def test_separated():
 
 
 if __name__ == "__main__":
+    print("- test just numerical")
     test_numerical()
+    print("- test numerical and categorical")
     test_num_cat()