diff --git a/README.md b/README.md
index fa48e3b..6b4b991 100644
--- a/README.md
+++ b/README.md
@@ -13,23 +13,24 @@
 [![Run Tests](https://github.com/dlidstrom/NeuralNetworkInAllLangs/actions/workflows/ci.yaml/badge.svg)](https://github.com/dlidstrom/NeuralNetworkInAllLangs/actions/workflows/ci.yaml)

 - [1. Introduction](#1-introduction)
-- [2. Training](#2-training)
-  - [2.1. Logical Functions](#21-logical-functions)
-    - [2.1.1. Lithmus Test](#211-lithmus-test)
-  - [2.2. Hand Written Digits](#22-hand-written-digits)
-- [3. Learning](#3-learning)
-- [4. Implementation Goals](#4-implementation-goals)
-  - [4.1. Simple Random Number Generator](#41-simple-random-number-generator)
-  - [4.2. License](#42-license)
-  - [4.3. Implementations](#43-implementations)
-    - [4.3.1. Sample Output](#431-sample-output)
-- [5. Reference Implementation](#5-reference-implementation)
-  - [5.1. Inputs and Randomized Starting Weights](#51-inputs-and-randomized-starting-weights)
-  - [5.2. Forward Propagation](#52-forward-propagation)
-  - [5.3. Backpropagation](#53-backpropagation)
-  - [5.4. Weight Updates](#54-weight-updates)
-- [6. Using this in your own solution](#6-using-this-in-your-own-solution)
-- [7. References](#7-references)
+- [2. Usage](#2-usage)
+- [3. Training](#3-training)
+  - [3.1. Logical Functions](#31-logical-functions)
+    - [3.1.1. Lithmus Test](#311-lithmus-test)
+  - [3.2. Hand Written Digits](#32-hand-written-digits)
+- [4. Learning](#4-learning)
+- [5. Implementation Goals](#5-implementation-goals)
+  - [5.1. Simple Random Number Generator](#51-simple-random-number-generator)
+  - [5.2. License](#52-license)
+  - [5.3. Implementations](#53-implementations)
+    - [5.3.1. Sample Output](#531-sample-output)
+- [6. Reference Implementation](#6-reference-implementation)
+  - [6.1. Inputs and Randomized Starting Weights](#61-inputs-and-randomized-starting-weights)
+  - [6.2. Forward Propagation](#62-forward-propagation)
+  - [6.3. Backpropagation](#63-backpropagation)
+  - [6.4. Weight Updates](#64-weight-updates)
+- [7. Using this in your own solution](#7-using-this-in-your-own-solution)
+- [8. References](#8-references)

 ## 1. Introduction

@@ -52,11 +53,61 @@ prefer to focus on the code itself and will happily copy a solution from one
 programming language to another without worrying about the theoretical
 background.

-## 2. Training
+## 2. Usage
+
+These usage examples are taken directly from our test implementations. The
+general flow is to prepare a dataset, create a trainer which contains an empty
+neural network, and then train the network until a desired prediction accuracy
+is achieved. All of these examples output the final predictions to the console.
+For any larger dataset you will need to compute the prediction accuracy. One way
+to do this is to compute the percentage of correct predictions and the average
+"confidence" of the predictions.
+
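+Before following the language-specific links below, here is a rough outline of
+that flow in C#. This is an illustrative sketch only: the `ITrainer` interface
+and every name in it are placeholders invented for this example, not the
+repository's actual API, and the dataset is the logical-functions truth table
+described under "Training" (column order as in the reference implementation:
+xor, xnor, and, nand, or, nor).
+
+```csharp
+using System;
+using System.Linq;
+
+// Placeholder shape for illustration only - the real per-language APIs are in
+// the files linked below.
+public interface ITrainer
+{
+    void TrainOneEpoch(double[][] inputs, double[][] expected); // one pass, stochastic weight updates
+    double[] Predict(double[] input);                           // forward pass only
+}
+
+public static class UsageOutline
+{
+    public static void Run(ITrainer trainer)
+    {
+        // 1. Prepare a dataset: the four input combinations and the expected
+        //    outputs for xor, xnor, and, nand, or, nor (one column each).
+        double[][] inputs =
+        {
+            new double[] { 0, 0 },
+            new double[] { 0, 1 },
+            new double[] { 1, 0 },
+            new double[] { 1, 1 },
+        };
+        double[][] expected =
+        {
+            new double[] { 0, 1, 0, 1, 0, 1 },
+            new double[] { 1, 0, 0, 1, 1, 0 },
+            new double[] { 1, 0, 0, 1, 1, 0 },
+            new double[] { 0, 1, 1, 0, 1, 0 },
+        };
+
+        // 2. Train the (initially empty) network; stop after a fixed epoch
+        //    budget or once the predictions are accurate enough.
+        for (int epoch = 0; epoch < 1000; epoch++)
+        {
+            trainer.TrainOneEpoch(inputs, expected);
+        }
+
+        // 3. Output the final predictions to the console.
+        foreach (double[] input in inputs)
+        {
+            string prediction = string.Join(
+                " ",
+                trainer.Predict(input).Select(v => v.ToString("0.000")));
+            Console.WriteLine($"{input[0]} {input[1]} -> {prediction}");
+        }
+    }
+}
+```
+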
+**Computing prediction score and confidences**
+
+https://github.com/dlidstrom/NeuralNetworkInAllLangs/blob/4c9c8176a9936320af3e777a2159f931a7dca8c9/CSharp/Program.cs#L92-L104
+
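+If you just want the idea behind the snippet linked above, the following is a
+hedged, self-contained sketch (not the code from `Program.cs`): it assumes the
+predicted class is the output neuron with the highest activation and defines
+"confidence" as that neuron's output value.
+
+```csharp
+public static class Scoring
+{
+    // Returns the fraction of correct predictions and the average output value
+    // of the winning neuron ("confidence") over a whole dataset.
+    public static (double Accuracy, double AverageConfidence) Score(
+        double[][] predictions,
+        int[] expectedClasses)
+    {
+        int correct = 0;
+        double confidenceSum = 0;
+        for (int i = 0; i < predictions.Length; i++)
+        {
+            double[] output = predictions[i];
+
+            // Index of the strongest output neuron = the predicted class.
+            int best = 0;
+            for (int j = 1; j < output.Length; j++)
+            {
+                if (output[j] > output[best]) best = j;
+            }
+
+            if (best == expectedClasses[i]) correct++;
+            confidenceSum += output[best];
+        }
+
+        return (
+            (double)correct / predictions.Length,
+            confidenceSum / predictions.Length);
+    }
+}
+```
+
+Reported as percentages, these two numbers give a quick feel for how well a
+trained network performs on a larger dataset such as the handwritten digits.
+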
+**Rust**
+
+https://github.com/dlidstrom/NeuralNetworkInAllLangs/blob/4c9c8176a9936320af3e777a2159f931a7dca8c9/Rust/src/main.rs#L32-L73
+
+**F#**
+
+https://github.com/dlidstrom/NeuralNetworkInAllLangs/blob/4c9c8176a9936320af3e777a2159f931a7dca8c9/FSharp/Program.fs#L38-L66
+
+**C#**
+
+https://github.com/dlidstrom/NeuralNetworkInAllLangs/blob/4c9c8176a9936320af3e777a2159f931a7dca8c9/CSharp/Program.cs#L28-L58
+
+**C++**
+
+https://github.com/dlidstrom/NeuralNetworkInAllLangs/blob/4c9c8176a9936320af3e777a2159f931a7dca8c9/Cpp/main.cpp#L49-L101
+
+**C**
+
+https://github.com/dlidstrom/NeuralNetworkInAllLangs/blob/4c9c8176a9936320af3e777a2159f931a7dca8c9/C/main.c#L46-L87
+
+**Kotlin**
+
+https://github.com/dlidstrom/NeuralNetworkInAllLangs/blob/4c9c8176a9936320af3e777a2159f931a7dca8c9/Kotlin/src/Main.kt#L21-L60
+
+**Go**
+
+https://github.com/dlidstrom/NeuralNetworkInAllLangs/blob/4c9c8176a9936320af3e777a2159f931a7dca8c9/Go/main.go#L67-L110
+ +## 3. Training For training and verifying our implementations we will use two datasets. -### 2.1. Logical Functions +### 3.1. Logical Functions The first is simple and will be these logical functions: xor, xnor, or, nor, and, and nand. This truth table represents the values that the network will @@ -83,7 +134,7 @@ neurons. Such a network consists of a total of 24 weights: > 💯 We expect each implementation to learn exactly the same network weights! -#### 2.1.1. Lithmus Test +#### 3.1.1. Lithmus Test The logical functions example can be used as a "lithmus test" of neural network implementations. A proper implementation will be able to learn the 6 functions @@ -93,7 +144,7 @@ nodes to learn successfully (if at all). A larger network means more mathematical operations so keep this in mind when you evaluate other implementations. You don't want to waste cpu cycles unnecessarily. -### 2.2. Hand Written Digits +### 3.2. Hand Written Digits The second dataset consists of thousands of hand written digits. This is actually also a "toy" dataset but training a network to recognize all digits @@ -116,7 +167,7 @@ the handwritten digit: Parsing this dataset needs to be implemented for each language. -## 3. Learning +## 4. Learning Our code will perform backpropagation to learn the weights. We update the weights after each input. This is called stochastic learning, as @@ -124,7 +175,7 @@ opposed to batch learning where multiple inputs are presented before updating weights. Stochastic learning is generally preferred [2]. Note that inputs need to be shuffled for effective learning. -## 4. Implementation Goals +## 5. Implementation Goals One of our goals is to have as few or no dependencies. These implementations should be easy to integrate and that requires dependency-free code. Another goal @@ -146,7 +197,7 @@ We strive for: - simple tests that verify our implementations and secure them for the future - having fun exploring neural networks! -### 4.1. Simple Random Number Generator +### 5.1. Simple Random Number Generator Now, a note about random number generation. Training a neural network requires that the initial weights are randomly assigned. We will specify a simple random @@ -192,7 +243,7 @@ The first few random numbers are: > The code samples all contain an extension point where you can plug in your own > implementation, should you wish to do so (or just hardcode your choice!). -### 4.2. License +### 5.2. License All code *in this repository* is licensed under MIT license. This is a **permissive** license and you can use this code in your @@ -206,7 +257,7 @@ then you must also license your implementation with MIT license. > All code in this repo must be licensed under the permissive MIT license. > Please add license header to every source file. No GPL allowed! -### 4.3. Implementations +### 5.3. Implementations This is the current status of the implementations available. We follow a maturity model based on these criteria: @@ -217,7 +268,7 @@ This is the current status of the implementations available. 
We follow a maturit - Level 4: implement a unit test to verify level 3 and make the code future safe | Language | Level 0 | Level 1 | Level 2 | Level 3 | Level 4 | Contributor | -|-|-|-|-|-|-|-| +|---|:---:|:---:|:---:|:---:|:---:|---| | C# | ⭐️ | ⭐️ | ⭐️ | ⭐️ | ⭐️ | [@dlidstrom](https://github.com/dlidstrom) | | Rust | ⭐️ | ⭐️ | ⭐️ | | | [@dlidstrom](https://github.com/dlidstrom) | | F# | ⭐️ | ⭐️ | ⭐️ | | | [@dlidstrom](https://github.com/dlidstrom) | @@ -230,7 +281,7 @@ This is the current status of the implementations available. We follow a maturit > Note! The Python implementation is only here as a reference. If you are using Python you already > have access to all ai tools and libraries you need. -#### 4.3.1. Sample Output +#### 5.3.1. Sample Output Digit recognition is done using only 14 hidden neurons, 10 learning epochs (an epoch is a run through the entire dataset), and a learning rate of 0.5. Using @@ -286,7 +337,7 @@ Prediction (output from network for the above input): Looks good, doesn't it? -## 5. Reference Implementation +## 6. Reference Implementation For reference we have [a Python implementation](./Python/Xor.py) which uses NumPy, and should be fairly easy to understand. Why Python? Because Python @@ -302,7 +353,7 @@ values to verify your own calculations. The example is the logical functions shown earlier with the inputs being both `1`, i.e. `1 1`. We will use 3 hidden neurons and 6 outputs (xor, xnor, and, nand, or, nor). -### 5.1. Inputs and Randomized Starting Weights +### 6.1. Inputs and Randomized Starting Weights These are the initial values for the input layer and the hidden layer. $w$ is the weights, $b$ is the biases. Note that we are showing randomized biases here @@ -335,7 +386,7 @@ b_{output} & = & \end{bmatrix} \\ \end{array}$$ -### 5.2. Forward Propagation +### 6.2. Forward Propagation First we show forward propagation for the hidden layer. @@ -396,7 +447,7 @@ y_{output} & = & \begin{bmatrix} \end{bmatrix} \\ \end{array}$$ -### 5.3. Backpropagation +### 6.3. Backpropagation Now we have calculated output. These are off according to the expected output and the purpose of the next step, backpropagation, is to correct the weights for @@ -466,7 +517,7 @@ $$\begin{array}{rcl} \end{bmatrix} \end{array}$$ -### 5.4. Weight Updates +### 6.4. Weight Updates Finally we can apply weight updates. $\alpha$ is the learning rate which here will be $1$. First update weights and biases for the output layer. @@ -501,7 +552,7 @@ b_{hidden} & = & \end{bmatrix} \end{array}$$ -## 6. Using this in your own solution +## 7. Using this in your own solution If you do use any of these implementations in your own solution, then here are some things to keep in mind for good results: @@ -513,7 +564,7 @@ are some things to keep in mind for good results: - you may try "annealing" the learning rate, meaning start high (0.5) and slowly decrease over the epochs -## 7. References +## 8. References [1]
[2]