usage #14 (merged, 5 commits, Dec 13, 2023)

121 changes: 86 additions & 35 deletions README.md

[![Run Tests](https://github.com/dlidstrom/NeuralNetworkInAllLangs/actions/workflows/ci.yaml/badge.svg)](https://github.com/dlidstrom/NeuralNetworkInAllLangs/actions/workflows/ci.yaml)

- [1. Introduction](#1-introduction)
- [2. Usage](#2-usage)
- [3. Training](#3-training)
- [3.1. Logical Functions](#31-logical-functions)
- [3.1.1. Litmus Test](#311-litmus-test)
- [3.2. Handwritten Digits](#32-handwritten-digits)
- [4. Learning](#4-learning)
- [5. Implementation Goals](#5-implementation-goals)
- [5.1. Simple Random Number Generator](#51-simple-random-number-generator)
- [5.2. License](#52-license)
- [5.3. Implementations](#53-implementations)
- [5.3.1. Sample Output](#531-sample-output)
- [6. Reference Implementation](#6-reference-implementation)
- [6.1. Inputs and Randomized Starting Weights](#61-inputs-and-randomized-starting-weights)
- [6.2. Forward Propagation](#62-forward-propagation)
- [6.3. Backpropagation](#63-backpropagation)
- [6.4. Weight Updates](#64-weight-updates)
- [7. Using this in your own solution](#7-using-this-in-your-own-solution)
- [8. References](#8-references)

## 1. Introduction

prefer to focus on the code itself and will happily copy a solution from one
programming language to another without worrying about the theoretical
background.

## 2. Usage

These usage examples are taken directly from our test implementations. The
general flow is to prepare a dataset, create a trainer which contains an empty
neural network, and then train the network until a desired prediction accuracy
is achieved. All of these examples output the final predictions to the console.
For any larger dataset you will need to compute the prediction accuracy. One way
to do this is to report the percentage of correct predictions and the average
"confidence" of the predictions.

<details>
<summary>Computing prediction score and confidences</summary>
https://github.com/dlidstrom/NeuralNetworkInAllLangs/blob/4c9c8176a9936320af3e777a2159f931a7dca8c9/CSharp/Program.cs#L92-L104
</details>

<details>
<summary>Rust</summary>
https://github.com/dlidstrom/NeuralNetworkInAllLangs/blob/4c9c8176a9936320af3e777a2159f931a7dca8c9/Rust/src/main.rs#L32-L73
</details>

<details>
<summary>F#</summary>
https://github.com/dlidstrom/NeuralNetworkInAllLangs/blob/4c9c8176a9936320af3e777a2159f931a7dca8c9/FSharp/Program.fs#L38-L66
</details>

<details>
<summary>C#</summary>
https://github.com/dlidstrom/NeuralNetworkInAllLangs/blob/4c9c8176a9936320af3e777a2159f931a7dca8c9/CSharp/Program.cs#L28-L58
</details>

<details>
<summary>C++</summary>
https://github.com/dlidstrom/NeuralNetworkInAllLangs/blob/4c9c8176a9936320af3e777a2159f931a7dca8c9/Cpp/main.cpp#L49-L101
</details>

<details>
<summary>C</summary>
https://github.com/dlidstrom/NeuralNetworkInAllLangs/blob/4c9c8176a9936320af3e777a2159f931a7dca8c9/C/main.c#L46-L87
</details>

<details>
<summary>Kotlin</summary>
https://github.com/dlidstrom/NeuralNetworkInAllLangs/blob/4c9c8176a9936320af3e777a2159f931a7dca8c9/Kotlin/src/Main.kt#L21-L60
</details>

<details>
<summary>Go</summary>
https://github.com/dlidstrom/NeuralNetworkInAllLangs/blob/4c9c8176a9936320af3e777a2159f931a7dca8c9/Go/main.go#L67-L110
</details>
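
The snippets linked above are the implementations actually used in the tests. Purely as an
illustration of the scoring idea described at the start of this section, and not code from this
repository, a sketch along these lines computes both the accuracy and the average confidence;
it assumes `outputs[i]` holds the network's output vector for sample `i` and `labels[i]` the
correct class index:

```csharp
// Illustrative only: score predictions by picking the strongest output per sample.
// "Confidence" here is simply the value of the winning output neuron.
static (double Accuracy, double AverageConfidence) Score(double[][] outputs, int[] labels)
{
    int correct = 0;
    double confidenceSum = 0.0;
    for (int i = 0; i < outputs.Length; i++)
    {
        int predicted = 0;
        for (int k = 1; k < outputs[i].Length; k++)
        {
            if (outputs[i][k] > outputs[i][predicted]) predicted = k;
        }

        if (predicted == labels[i]) correct++;
        confidenceSum += outputs[i][predicted];
    }

    return ((double)correct / outputs.Length, confidenceSum / outputs.Length);
}
```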

## 3. Training

For training and verifying our implementations we will use two datasets.

### 3.1. Logical Functions

The first is simple and will be these logical functions: xor, xnor, or, nor,
and, and nand. This truth table represents the values that the network will
neurons. Such a network consists of a total of 24 weights:

> 💯 We expect each implementation to learn exactly the same network weights!

#### 3.1.1. Litmus Test

The logical functions example can be used as a "litmus test" of neural network
implementations. A proper implementation will be able to learn the 6 functions
with the minimal network described above; a flawed implementation may need more hidden
nodes to learn successfully (if at all). A larger network means more
mathematical operations, so keep this in mind when you evaluate other
implementations. You don't want to waste CPU cycles unnecessarily.

### 3.2. Handwritten Digits

The second dataset consists of thousands of handwritten digits. This is
actually also a "toy" dataset but training a network to recognize all digits

Parsing this dataset needs to be implemented for each language.

## 4. Learning

Our code will perform backpropagation to learn the weights. We update
the weights after each input. This is called stochastic learning, as
opposed to batch learning where multiple inputs are presented before
updating weights. Stochastic learning is generally preferred [2]. Note
that inputs need to be shuffled for effective learning.
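A sketch of what such a per-sample loop can look like follows (C#; `INetwork` and `Train` are
hypothetical stand-ins for whichever implementation you pick, not types from this repository):

```csharp
using System;

// Hypothetical interface standing in for any of the implementations in this repository.
interface INetwork
{
    void Train(double[] input, double[] target);
}

static class StochasticTraining
{
    // Per-sample (stochastic) learning: shuffle, then update weights after every sample.
    static void Run(INetwork network, (double[] Input, double[] Target)[] samples, int epochs)
    {
        var random = new Random(12345); // seeded so runs stay reproducible
        for (int epoch = 0; epoch < epochs; epoch++)
        {
            // Fisher-Yates shuffle: each epoch presents the samples in a new order.
            for (int i = samples.Length - 1; i > 0; i--)
            {
                int j = random.Next(i + 1);
                (samples[i], samples[j]) = (samples[j], samples[i]);
            }

            foreach (var (input, target) in samples)
            {
                network.Train(input, target); // weights change after every single sample
            }
        }
    }
}
```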

## 5. Implementation Goals

One of our goals is to have as few dependencies as possible, preferably none. These implementations
should be easy to integrate and that requires dependency-free code. Another goal
We strive for:
- simple tests that verify our implementations and secure them for the future
- having fun exploring neural networks!

### 5.1. Simple Random Number Generator

Now, a note about random number generation. Training a neural network requires
that the initial weights are randomly assigned. We will specify a simple random
number generator so that every implementation starts from the same initial weights.
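Purely as an illustration of the kind of tiny, dependency-free generator meant here, and not
necessarily the algorithm or constants this repository specifies, a Park-Miller style sketch
looks like this:

```csharp
// Illustrative only: a minimal multiplicative congruential generator.
// The repository defines its own algorithm and constants; any seeded generator
// returning doubles in [0, 1) can be plugged in through the extension point below.
public sealed class SimpleRng
{
    const uint Modulus = 2147483647; // 2^31 - 1, assumed Park-Miller style constant
    const uint Multiplier = 16807;   // assumed constant, see note above
    uint state;

    public SimpleRng(uint seed)
    {
        state = seed % Modulus;
        if (state == 0) state = 1;   // the generator must not start at zero
    }

    public double NextDouble()
    {
        state = (uint)((ulong)state * Multiplier % Modulus);
        return state / (double)Modulus;
    }
}
```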
> The code samples all contain an extension point where you can plug in your own
> implementation, should you wish to do so (or just hardcode your choice!).

### 5.2. License

All code *in this repository* is licensed under MIT license.
This is a **permissive** license and you can use this code in your
then you must also license your implementation with MIT license.
> All code in this repo must be licensed under the permissive MIT license.
> Please add license header to every source file. No GPL allowed!

### 5.3. Implementations

This is the current status of the implementations available. We follow a maturity model based on these criteria:

- Level 4: implement a unit test to verify level 3 and make the code future safe

| Language | Level 0 | Level 1 | Level 2 | Level 3 | Level 4 | Contributor |
|---|:---:|:---:|:---:|:---:|:---:|---|
| C# | ⭐️ | ⭐️ | ⭐️ | ⭐️ | ⭐️ | [@dlidstrom](https://github.com/dlidstrom) |
| Rust | ⭐️ | ⭐️ | ⭐️ | | | [@dlidstrom](https://github.com/dlidstrom) |
| F# | ⭐️ | ⭐️ | ⭐️ | | | [@dlidstrom](https://github.com/dlidstrom) |
> Note! The Python implementation is only here as a reference. If you are using Python you already
> have access to all AI tools and libraries you need.

#### 5.3.1. Sample Output

Digit recognition is done using only 14 hidden neurons, 10 learning epochs (an
epoch is a run through the entire dataset), and a learning rate of 0.5. Using
Prediction (output from network for the above input):

Looks good, doesn't it?

## 6. Reference Implementation

For reference we have [a Python implementation](./Python/Xor.py) which uses NumPy,
and should be fairly easy to understand. Why Python? Because Python
values to verify your own calculations. The example is the logical functions
shown earlier with the inputs being both `1`, i.e. `1 1`. We will use 3 hidden
neurons and 6 outputs (xor, xnor, and, nand, or, nor).

### 6.1. Inputs and Randomized Starting Weights

These are the initial values for the input layer and the hidden layer. $w$ is
the weights, $b$ is the biases. Note that we are showing randomized biases here.
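In terms of shapes, with 2 inputs, 3 hidden neurons and 6 outputs (the row-per-neuron layout
below is an assumed convention, not a quote of the actual matrices):

$$\begin{array}{rcl}
w_{hidden} & \in & \mathbb{R}^{3 \times 2}, \qquad b_{hidden} \in \mathbb{R}^{3} \\
w_{output} & \in & \mathbb{R}^{6 \times 3}, \qquad b_{output} \in \mathbb{R}^{6}
\end{array}$$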

### 6.2. Forward Propagation

First we show forward propagation for the hidden layer.
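Schematically, and assuming the logistic sigmoid $\sigma(z) = 1 / (1 + e^{-z})$ as the
activation, both layers follow the same pattern:

$$\begin{array}{rcl}
y_{hidden} & = & \sigma\left(w_{hidden} \, x + b_{hidden}\right) \\
y_{output} & = & \sigma\left(w_{output} \, y_{hidden} + b_{output}\right)
\end{array}$$

where $x$ is the input vector.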


### 6.3. Backpropagation

Now we have calculated the output. These values differ from the expected output,
and the purpose of the next step, backpropagation, is to correct the weights so that this error shrinks.
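Assuming a squared-error loss and sigmoid activations (an assumption, consistent with the
forward pass above), the error terms take the usual form:

$$\begin{array}{rcl}
\delta_{output} & = & \left(y_{output} - t\right) \odot y_{output} \odot \left(1 - y_{output}\right) \\
\delta_{hidden} & = & \left(w_{output}^{\top} \, \delta_{output}\right) \odot y_{hidden} \odot \left(1 - y_{hidden}\right)
\end{array}$$

where $t$ is the expected output and $\odot$ denotes element-wise multiplication.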

### 6.4. Weight Updates

Finally we can apply weight updates. $\alpha$ is the learning rate which here
will be $1$. First update weights and biases for the output layer.
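Schematically, every parameter takes one gradient-descent step (same symbols and assumptions
as in the previous step; the walkthrough applies the output layer first, then the hidden layer):

$$\begin{array}{rcl}
w_{output} & \leftarrow & w_{output} - \alpha \, \delta_{output} \, y_{hidden}^{\top} \\
b_{output} & \leftarrow & b_{output} - \alpha \, \delta_{output} \\
w_{hidden} & \leftarrow & w_{hidden} - \alpha \, \delta_{hidden} \, x^{\top} \\
b_{hidden} & \leftarrow & b_{hidden} - \alpha \, \delta_{hidden}
\end{array}$$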

## 7. Using this in your own solution

If you do use any of these implementations in your own solution, then here
are some things to keep in mind for good results:
- you may try "annealing" the learning rate, meaning start high (0.5) and slowly
decrease over the epochs
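As a sketch of that last point (the numbers are illustrative, not repository defaults):

```csharp
using System;

// Illustrative annealing schedule: start at 0.5 and decay geometrically per epoch.
double initialLearningRate = 0.5;
double decay = 0.9; // assumed decay factor, tune for your data
for (int epoch = 0; epoch < 10; epoch++)
{
    double learningRate = initialLearningRate * Math.Pow(decay, epoch);
    Console.WriteLine($"epoch {epoch}: learning rate {learningRate:F3}");
    // trainer.RunEpoch(samples, learningRate); // hypothetical call into your implementation
}
```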

## 8. References

[1] <http://neuralnetworksanddeeplearning.com/> <br>
[2] <https://leon.bottou.org/publications/pdf/tricks-1998.pdf> <br>