Using Matplotlib and Seaborn to graph the probabilities of the faces of a die in a specified number of rolls.
Law of Large Numbers: In probability theory, the law of large numbers (LLN) is a theorem that describes the result of performing the same experiment a large number of times. According to the law, the average of the results obtained from a large number of trials should be close to the expected value and will tend to become closer to the expected value as more trials are performed.
We use a fair die as an example to demonstrate this law. All faces of the die are equally likely, but when the number of rolls is small, the observed frequencies are uneven and one face may come up noticeably more often than the others. When we roll the die a large number of times, the relative frequency of each face approaches its probability, 1/6 (≈ 0.1667, or 16.667%).
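To see this convergence before writing the full script, here is a minimal sketch that measures how far the relative frequencies drift from 1/6 as the number of rolls grows. The helper name `relative_frequencies`, the roll counts and the seed are assumptions chosen for illustration; they are not taken from the repository.

```python
import random


def relative_frequencies(num_rolls, seed=None):
    """Roll a fair die num_rolls times and return the share of each face."""
    rng = random.Random(seed)  # seeded generator so a run can be repeated
    counts = [0] * 6
    for _ in range(num_rolls):
        counts[rng.randrange(1, 7) - 1] += 1
    return [count / num_rolls for count in counts]


# Compare the largest gap from 1/6 for increasingly long runs.
for num_rolls in (60, 6_000, 600_000):
    shares = relative_frequencies(num_rolls, seed=42)
    largest_gap = max(abs(share - 1 / 6) for share in shares)
    print(f"{num_rolls:>7} rolls: largest gap from 1/6 = {largest_gap:.4f}")
```

The gap should generally shrink as the number of rolls grows, although any single run can deviate from that trend.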
We create a script that rolls a die (generates a random integer from 1 to 6) a specific number of times. For 1000 or 5000 rolls, the expected frequency of each face is 1000/6 or 5000/6 respectively.
Let's take 6,000,000 for simplicity. Then the expected frequency for each face is 1,000,000.
#!/usr/bin/env python3
# roll_die.py
"""Roll a six-sided die num_roll times and print the frequency of each face."""
import random

# number of die rolls (set to 6_000_000 for the large run)
num_roll = 100

# face frequency counters, one per face
frequency = [0] * 6

for _ in range(num_roll):
    # random value in range 1 to 6 (face of the die)
    face = random.randrange(1, 7)
    # increment appropriate face counter
    frequency[face - 1] += 1

# output: displaying frequency for each face
print(f'Face{"Frequency":>13}')
for face, count in enumerate(frequency, start=1):
    print(f'{face:>4}{count:>13}')
Running the above script would display an output like:
$ python3 roll_die.py
Face    Frequency
   1           15
   2           10
   3           16
   4           22
   5           13
   6           24
Here, the observed relative frequency is highest for face 6 (24%) and lowest for face 2 (10%). These values do not reflect the true probabilities, because all faces of a fair die are equally likely.
If we change the value of num_roll from 100 to 6,000,000 and run roll_die.py again, the output would look like:
$ python3 roll_die.py
Face    Frequency
   1      1000104
   2       999846
   3      1000645
   4       999434
   5      1001315
   6       998656
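To put a number on the difference between the two runs, the short sketch below takes the frequencies printed above and computes, for each run, the largest gap between a face's relative frequency and 1/6. The helper name `largest_gap` is an assumption made for this illustration.

```python
# Frequencies copied from the two runs shown above.
small_run = [15, 10, 16, 22, 13, 24]
large_run = [1000104, 999846, 1000645, 999434, 1001315, 998656]


def largest_gap(frequencies):
    """Largest absolute gap between a face's relative frequency and 1/6."""
    total = sum(frequencies)
    return max(abs(count / total - 1 / 6) for count in frequencies)


# Print the gap as a percentage for each run.
print(f"{sum(small_run):>9,} rolls: {largest_gap(small_run):.4%}")
print(f"{sum(large_run):>9,} rolls: {largest_gap(large_run):.4%}")
```

For the 100-roll run the gap is roughly 7.3 percentage points (face 6), while for the 6,000,000-roll run it is roughly 0.02 percentage points.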
We can see that the frequency for each face is close to 1,000,000, and therefore the relative frequency of each face is close to 16.67%. Depending on your system, this script may take a while to execute, since it generates 6 million random numbers. Now, we'll create a script to visualize the distribution using matplotlib and seaborn.
- Install venv for Python 3 by running
sudo apt-get install python3-venv
in your Linux terminal.
- Clone this repository locally using
git clone https://github.com/adviksinghania/diceroll-visualization.git
- Navigate into the directory using
cd diceroll-visualization
- Run
python3 -m venv ./env
to create a virtual environment in the current directory, then activate it with
source ./env/bin/activate
- Run
pip install -r requirements.txt
to install the dependencies, or just
pip install matplotlib seaborn
to install matplotlib and seaborn.
Now, run roll_die_plot.py in your Linux terminal, passing the number of rolls as an argument, to create a static graph of the frequency distribution.
Example 1:
$ python roll_die_plot.py 100
Face | Frequency | Probability
1 | 14 | 14.000%
2 | 19 | 19.000%
3 | 14 | 14.000%
4 | 25 | 25.000%
5 | 16 | 16.000%
6 | 12 | 12.000%
Example 2:
$ python roll_die_plot.py 6000000
Face | Frequency | Probability
1 | 999521 | 16.659%
2 | 999198 | 16.653%
3 | 998664 | 16.644%
4 | 1002321 | 16.705%
5 | 999494 | 16.658%
6 | 1000802 | 16.680%
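The source of roll_die_plot.py is not reproduced in this README, so the following is only a minimal sketch of how a script with this kind of output could be put together. The function names (`roll_frequencies`, `format_table`, `plot_frequencies`), the column widths and the axis labels are assumptions, not the repository's actual code. The plotting libraries are imported inside `plot_frequencies` so that the counting and table logic can be tried without them installed.

```python
import random
from collections import Counter


def roll_frequencies(num_rolls, seed=None):
    """Return six counts, one per face, after num_rolls die rolls."""
    rng = random.Random(seed)
    counts = Counter(rng.randrange(1, 7) for _ in range(num_rolls))
    return [counts[face] for face in range(1, 7)]


def format_table(frequencies):
    """Build a Face | Frequency | Probability table as a string."""
    total = sum(frequencies)
    lines = ["Face | Frequency | Probability"]
    for face, count in enumerate(frequencies, start=1):
        lines.append(f"{face:>4} | {count:>9} | {count / total:>10.3%}")
    return "\n".join(lines)


def plot_frequencies(frequencies):
    """Draw a static bar chart of the face frequencies."""
    # Imported here so the functions above work without these libraries.
    import matplotlib.pyplot as plt
    import seaborn as sns

    faces = list(range(1, 7))
    sns.set_style("whitegrid")
    axes = sns.barplot(x=faces, y=frequencies)
    axes.set_title(f"Die frequencies for {sum(frequencies):,} rolls")
    axes.set(xlabel="Die face", ylabel="Frequency")
    plt.show()


print(format_table(roll_frequencies(100)))
```

Calling `plot_frequencies(roll_frequencies(6_000_000))` would then open a window with six bars of nearly equal height.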
In the roll_die_dynamic.py script, we use the FuncAnimation function from matplotlib's animation module, which updates the bar plot dynamically.
We'll have to give two command line arguments to this script:
- number of frames: The number of animation frames to display. This value determines the total number of times that FuncAnimation updates the graph. For each animation frame, FuncAnimation calls a function that you define (in this example, update) to specify how to change the plot.
- rolls per frame: The number of times to roll the die in each animation frame. We'll use a loop to roll the die this number of times, summarize the results, then update the graph with bars and text representing the new frequencies.
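One way to read the two arguments described above is with the standard library's argparse module. This is a sketch under that assumption; the repository's script may read sys.argv directly, and the names `parse_arguments`, `number_of_frames` and `rolls_per_frame` are chosen here for illustration.

```python
import argparse


def parse_arguments(argv=None):
    """Parse the two positional arguments: frames and rolls per frame."""
    parser = argparse.ArgumentParser(
        description="Animate die-roll frequencies.")
    parser.add_argument("number_of_frames", type=int,
                        help="number of animation frames to display")
    parser.add_argument("rolls_per_frame", type=int,
                        help="die rolls performed in each frame")
    # argv=None makes argparse fall back to the real command line.
    return parser.parse_args(argv)


args = parse_arguments(["300", "20"])
total_rolls = args.number_of_frames * args.rolls_per_frame
print(f"{args.number_of_frames} frames x {args.rolls_per_frame} rolls "
      f"= {total_rolls} rolls")
```

Passing an explicit list, as done here, makes the parser easy to try out; in a real script `parse_arguments()` would be called with no argument.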
Example:
python roll_die_dynamic.py 300 20
This will run the script for 300 frames, doing 20 die rolls per frame, with 30 milliseconds per frame (about 33 FPS). The update function will be called once for each of the 300 frames, for a total of 6000 die rolls (20 rolls per frame * 300 frames).
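The animation loop described above can be sketched roughly as follows. This is not the repository's roll_die_dynamic.py: the helper names `roll_batch` and `animate` are assumptions, and for brevity the sketch draws the bars with plain matplotlib rather than seaborn. matplotlib is imported inside `animate` so the rolling logic can be tried on its own.

```python
import random


def roll_batch(frequencies, rolls_per_frame, rng=random):
    """Roll the die rolls_per_frame times, updating frequencies in place."""
    for _ in range(rolls_per_frame):
        frequencies[rng.randrange(1, 7) - 1] += 1
    return frequencies


def animate(number_of_frames, rolls_per_frame, interval_ms=30):
    """Show a bar chart that is redrawn after every batch of rolls."""
    import matplotlib.pyplot as plt
    from matplotlib.animation import FuncAnimation

    faces = [1, 2, 3, 4, 5, 6]
    frequencies = [0] * 6
    figure, axes = plt.subplots()

    def update(frame_number):
        # FuncAnimation calls this for each frame it draws.
        roll_batch(frequencies, rolls_per_frame)
        axes.clear()
        axes.bar(faces, frequencies)
        axes.set_title(f"Die frequencies for {sum(frequencies):,} rolls")
        axes.set(xlabel="Die face", ylabel="Frequency")

    # Keep a reference so the animation is not garbage-collected.
    animation = FuncAnimation(figure, update, frames=number_of_frames,
                              interval=interval_ms, repeat=False)
    plt.show()
    return animation
```

Calling `animate(300, 20)` would correspond to the command-line example above, with `interval_ms=30` matching the 30 milliseconds per frame.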