diff --git a/README.md b/README.md
index fa48e3b..6b4b991 100644
--- a/README.md
+++ b/README.md
@@ -13,23 +13,24 @@
 [![Run Tests](https://github.com/dlidstrom/NeuralNetworkInAllLangs/actions/workflows/ci.yaml/badge.svg)](https://github.com/dlidstrom/NeuralNetworkInAllLangs/actions/workflows/ci.yaml)

 - [1. Introduction](#1-introduction)
-- [2. Training](#2-training)
-  - [2.1. Logical Functions](#21-logical-functions)
-    - [2.1.1. Lithmus Test](#211-lithmus-test)
-  - [2.2. Hand Written Digits](#22-hand-written-digits)
-- [3. Learning](#3-learning)
-- [4. Implementation Goals](#4-implementation-goals)
-  - [4.1. Simple Random Number Generator](#41-simple-random-number-generator)
-  - [4.2. License](#42-license)
-  - [4.3. Implementations](#43-implementations)
-    - [4.3.1. Sample Output](#431-sample-output)
-- [5. Reference Implementation](#5-reference-implementation)
-  - [5.1. Inputs and Randomized Starting Weights](#51-inputs-and-randomized-starting-weights)
-  - [5.2. Forward Propagation](#52-forward-propagation)
-  - [5.3. Backpropagation](#53-backpropagation)
-  - [5.4. Weight Updates](#54-weight-updates)
-- [6. Using this in your own solution](#6-using-this-in-your-own-solution)
-- [7. References](#7-references)
+- [2. Usage](#2-usage)
+- [3. Training](#3-training)
+  - [3.1. Logical Functions](#31-logical-functions)
+    - [3.1.1. Lithmus Test](#311-lithmus-test)
+  - [3.2. Hand Written Digits](#32-hand-written-digits)
+- [4. Learning](#4-learning)
+- [5. Implementation Goals](#5-implementation-goals)
+  - [5.1. Simple Random Number Generator](#51-simple-random-number-generator)
+  - [5.2. License](#52-license)
+  - [5.3. Implementations](#53-implementations)
+    - [5.3.1. Sample Output](#531-sample-output)
+- [6. Reference Implementation](#6-reference-implementation)
+  - [6.1. Inputs and Randomized Starting Weights](#61-inputs-and-randomized-starting-weights)
+  - [6.2. Forward Propagation](#62-forward-propagation)
+  - [6.3. Backpropagation](#63-backpropagation)
+  - [6.4. Weight Updates](#64-weight-updates)
+- [7. Using this in your own solution](#7-using-this-in-your-own-solution)
+- [8. References](#8-references)

 ## 1. Introduction

@@ -52,11 +53,61 @@ prefer to focus on the code itself and will happily copy a solution from one
 programming language to another without worrying about the theoretical
 background.

-## 2. Training
+## 2. Usage
+
+These usage examples are taken directly from our test implementations. The
+general flow is to prepare a dataset, create a trainer which contains an empty
+neural network, and then train the network until a desired prediction accuracy
+is achieved. All of these examples output the final predictions to the console.
+For any larger dataset you will need to compute the prediction accuracy. One way
+to do this is to compute the percentage of correct predictions and the average
+"confidence" of the predictions.
+
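+Before following the language-specific links below, here is a rough outline of
+that flow in C#. This is an illustrative sketch only: the `ITrainer` interface
+and every name in it are placeholders invented for this example, not the
+repository's actual API, and the dataset is the logical-functions truth table
+described under "Training" (column order as in the reference implementation:
+xor, xnor, and, nand, or, nor).
+
+```csharp
+using System;
+using System.Linq;
+
+// Placeholder shape for illustration only - the real per-language APIs are in
+// the files linked below.
+public interface ITrainer
+{
+    void TrainOneEpoch(double[][] inputs, double[][] expected); // one pass, stochastic weight updates
+    double[] Predict(double[] input);                           // forward pass only
+}
+
+public static class UsageOutline
+{
+    public static void Run(ITrainer trainer)
+    {
+        // 1. Prepare a dataset: the four input combinations and the expected
+        //    outputs for xor, xnor, and, nand, or, nor (one column each).
+        double[][] inputs =
+        {
+            new double[] { 0, 0 },
+            new double[] { 0, 1 },
+            new double[] { 1, 0 },
+            new double[] { 1, 1 },
+        };
+        double[][] expected =
+        {
+            new double[] { 0, 1, 0, 1, 0, 1 },
+            new double[] { 1, 0, 0, 1, 1, 0 },
+            new double[] { 1, 0, 0, 1, 1, 0 },
+            new double[] { 0, 1, 1, 0, 1, 0 },
+        };
+
+        // 2. Train the (initially empty) network; stop after a fixed epoch
+        //    budget or once the predictions are accurate enough.
+        for (int epoch = 0; epoch < 1000; epoch++)
+        {
+            trainer.TrainOneEpoch(inputs, expected);
+        }
+
+        // 3. Output the final predictions to the console.
+        foreach (double[] input in inputs)
+        {
+            string prediction = string.Join(
+                " ",
+                trainer.Predict(input).Select(v => v.ToString("0.000")));
+            Console.WriteLine($"{input[0]} {input[1]} -> {prediction}");
+        }
+    }
+}
+```
+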
+**Computing prediction score and confidences**
+
+https://github.com/dlidstrom/NeuralNetworkInAllLangs/blob/4c9c8176a9936320af3e777a2159f931a7dca8c9/CSharp/Program.cs#L92-L104
+
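+If you just want the idea behind the snippet linked above, the following is a
+hedged, self-contained sketch (not the code from `Program.cs`): it assumes the
+predicted class is the output neuron with the highest activation and defines
+"confidence" as that neuron's output value.
+
+```csharp
+public static class Scoring
+{
+    // Returns the fraction of correct predictions and the average output value
+    // of the winning neuron ("confidence") over a whole dataset.
+    public static (double Accuracy, double AverageConfidence) Score(
+        double[][] predictions,
+        int[] expectedClasses)
+    {
+        int correct = 0;
+        double confidenceSum = 0;
+        for (int i = 0; i < predictions.Length; i++)
+        {
+            double[] output = predictions[i];
+
+            // Index of the strongest output neuron = the predicted class.
+            int best = 0;
+            for (int j = 1; j < output.Length; j++)
+            {
+                if (output[j] > output[best]) best = j;
+            }
+
+            if (best == expectedClasses[i]) correct++;
+            confidenceSum += output[best];
+        }
+
+        return (
+            (double)correct / predictions.Length,
+            confidenceSum / predictions.Length);
+    }
+}
+```
+
+Reported as percentages, these two numbers give a quick feel for how well a
+trained network performs on a larger dataset such as the handwritten digits.
+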
+**Rust**
+
+https://github.com/dlidstrom/NeuralNetworkInAllLangs/blob/4c9c8176a9936320af3e777a2159f931a7dca8c9/Rust/src/main.rs#L32-L73
+
+**F#**
+
+https://github.com/dlidstrom/NeuralNetworkInAllLangs/blob/4c9c8176a9936320af3e777a2159f931a7dca8c9/FSharp/Program.fs#L38-L66
+
+**C#**
+
+https://github.com/dlidstrom/NeuralNetworkInAllLangs/blob/4c9c8176a9936320af3e777a2159f931a7dca8c9/CSharp/Program.cs#L28-L58
+
+**C++**
+
+https://github.com/dlidstrom/NeuralNetworkInAllLangs/blob/4c9c8176a9936320af3e777a2159f931a7dca8c9/Cpp/main.cpp#L49-L101
+
+**C**
+
+https://github.com/dlidstrom/NeuralNetworkInAllLangs/blob/4c9c8176a9936320af3e777a2159f931a7dca8c9/C/main.c#L46-L87
+
+**Kotlin**
+
+https://github.com/dlidstrom/NeuralNetworkInAllLangs/blob/4c9c8176a9936320af3e777a2159f931a7dca8c9/Kotlin/src/Main.kt#L21-L60
+
+**Go**
+
+https://github.com/dlidstrom/NeuralNetworkInAllLangs/blob/4c9c8176a9936320af3e777a2159f931a7dca8c9/Go/main.go#L67-L110
+ +## 3. Training For training and verifying our implementations we will use two datasets. -### 2.1. Logical Functions +### 3.1. Logical Functions The first is simple and will be these logical functions: xor, xnor, or, nor, and, and nand. This truth table represents the values that the network will @@ -83,7 +134,7 @@ neurons. Such a network consists of a total of 24 weights: > 💯 We expect each implementation to learn exactly the same network weights! -#### 2.1.1. Lithmus Test +#### 3.1.1. Lithmus Test The logical functions example can be used as a "lithmus test" of neural network implementations. A proper implementation will be able to learn the 6 functions @@ -93,7 +144,7 @@ nodes to learn successfully (if at all). A larger network means more mathematical operations so keep this in mind when you evaluate other implementations. You don't want to waste cpu cycles unnecessarily. -### 2.2. Hand Written Digits +### 3.2. Hand Written Digits The second dataset consists of thousands of hand written digits. This is actually also a "toy" dataset but training a network to recognize all digits @@ -116,7 +167,7 @@ the handwritten digit: Parsing this dataset needs to be implemented for each language. -## 3. Learning +## 4. Learning Our code will perform backpropagation to learn the weights. We update the weights after each input. This is called stochastic learning, as @@ -124,7 +175,7 @@ opposed to batch learning where multiple inputs are presented before updating weights. Stochastic learning is generally preferred [2]. Note that inputs need to be shuffled for effective learning. -## 4. Implementation Goals +## 5. Implementation Goals One of our goals is to have as few or no dependencies. These implementations should be easy to integrate and that requires dependency-free code. Another goal @@ -146,7 +197,7 @@ We strive for: - simple tests that verify our implementations and secure them for the future - having fun exploring neural networks! -### 4.1. Simple Random Number Generator +### 5.1. Simple Random Number Generator Now, a note about random number generation. Training a neural network requires that the initial weights are randomly assigned. We will specify a simple random @@ -192,7 +243,7 @@ The first few random numbers are: > The code samples all contain an extension point where you can plug in your own > implementation, should you wish to do so (or just hardcode your choice!). -### 4.2. License +### 5.2. License All code *in this repository* is licensed under MIT license. This is a **permissive** license and you can use this code in your @@ -206,7 +257,7 @@ then you must also license your implementation with MIT license. > All code in this repo must be licensed under the permissive MIT license. > Please add license header to every source file. No GPL allowed! -### 4.3. Implementations +### 5.3. Implementations This is the current status of the implementations available. We follow a maturity model based on these criteria: @@ -217,7 +268,7 @@ This is the current status of the implementations available. 
We follow a maturit - Level 4: implement a unit test to verify level 3 and make the code future safe | Language | Level 0 | Level 1 | Level 2 | Level 3 | Level 4 | Contributor | -|-|-|-|-|-|-|-| +|---|:---:|:---:|:---:|:---:|:---:|---| | C# | ⭐️ | ⭐️ | ⭐️ | ⭐️ | ⭐️ | [@dlidstrom](https://github.com/dlidstrom) | | Rust | ⭐️ | ⭐️ | ⭐️ | | | [@dlidstrom](https://github.com/dlidstrom) | | F# | ⭐️ | ⭐️ | ⭐️ | | | [@dlidstrom](https://github.com/dlidstrom) | @@ -230,7 +281,7 @@ This is the current status of the implementations available. We follow a maturit > Note! The Python implementation is only here as a reference. If you are using Python you already > have access to all ai tools and libraries you need. -#### 4.3.1. Sample Output +#### 5.3.1. Sample Output Digit recognition is done using only 14 hidden neurons, 10 learning epochs (an epoch is a run through the entire dataset), and a learning rate of 0.5. Using @@ -286,7 +337,7 @@ Prediction (output from network for the above input): Looks good, doesn't it? -## 5. Reference Implementation +## 6. Reference Implementation For reference we have [a Python implementation](./Python/Xor.py) which uses NumPy, and should be fairly easy to understand. Why Python? Because Python @@ -302,7 +353,7 @@ values to verify your own calculations. The example is the logical functions shown earlier with the inputs being both `1`, i.e. `1 1`. We will use 3 hidden neurons and 6 outputs (xor, xnor, and, nand, or, nor). -### 5.1. Inputs and Randomized Starting Weights +### 6.1. Inputs and Randomized Starting Weights These are the initial values for the input layer and the hidden layer. $w$ is the weights, $b$ is the biases. Note that we are showing randomized biases here @@ -335,7 +386,7 @@ b_{output} & = & \end{bmatrix} \\ \end{array}$$ -### 5.2. Forward Propagation +### 6.2. Forward Propagation First we show forward propagation for the hidden layer. @@ -396,7 +447,7 @@ y_{output} & = & \begin{bmatrix} \end{bmatrix} \\ \end{array}$$ -### 5.3. Backpropagation +### 6.3. Backpropagation Now we have calculated output. These are off according to the expected output and the purpose of the next step, backpropagation, is to correct the weights for @@ -466,7 +517,7 @@ $$\begin{array}{rcl} \end{bmatrix} \end{array}$$ -### 5.4. Weight Updates +### 6.4. Weight Updates Finally we can apply weight updates. $\alpha$ is the learning rate which here will be $1$. First update weights and biases for the output layer. @@ -501,7 +552,7 @@ b_{hidden} & = & \end{bmatrix} \end{array}$$ -## 6. Using this in your own solution +## 7. Using this in your own solution If you do use any of these implementations in your own solution, then here are some things to keep in mind for good results: @@ -513,7 +564,7 @@ are some things to keep in mind for good results: - you may try "annealing" the learning rate, meaning start high (0.5) and slowly decrease over the epochs -## 7. References +## 8. References [1]
[2]