This project offers a Gymnasium environment that simulates a wildfire and an agent that can take various actions to put out the fire. See Why?!
- Create a virtual Python environment
conda create -n wildfires python=3.10
- Activate the environment
conda activate wildfires
- Install dependencies
pip install -r requirements.txt
- Play the game!! (the first run might take 5 seconds or so to start)
python3 src/main.py +action=play +MDP=MDP_basic
PS: Change the MDP configuration of the environment and see what happens 😉
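Under the hood, the `+MDP=...` flags are Hydra overrides that select a config YAML file. Purely as an illustration (the key names below are hypothetical, not the project's actual schema; see the files under configs/ for the real one), such a config might look like:

```yaml
# Hypothetical MDP config sketch -- key names are illustrative,
# not the project's actual schema.
grid_shape: [32, 32]   # X x Y cells
num_initial_fires: 3   # how many trees start on fire
p_fire_spread: 0.8     # chance a tree ignites per burning neighbor
eval_mode: false       # report proportion of trees saved when true
```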
Train a PPO agent from Stable-Baselines3:
python3 src/main.py +action=train +MDP=MDP_basic +train=train_PPO_single_run_basic
OR add a custom configuration under configs/train and call
python3 src/main.py +action=train +MDP=MDP_basic +train=custom_config
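A custom train config is just another YAML file that Hydra composes into the run. A minimal sketch with hypothetical field names (check the existing `train_PPO_single_run_basic` config for the real schema):

```yaml
# Hypothetical training config sketch -- field names are illustrative;
# the real schema is whatever the existing configs under configs/train use.
algorithm: PPO
total_timesteps: 1000000
learning_rate: 0.0003
n_envs: 8          # number of parallel environments
eval_freq: 10000   # evaluate the agent every N steps
```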
OR continue training from a trained agent (the same configuration has to be used). Check out the releases for pretrained agents.
python3 src/main.py +action=train +MDP=MDP_basic +train=train_PPO_single_run_basic trained_agent_path=path/to/best_model.zip
Don't forget to launch TensorBoard to see your logs ;)
tensorboard --logdir='./logs'
OR, when using VS Code, run the command "Python: Launch TensorBoard"
For a description of what the different graphs mean, check out the Stable-Baselines3 Logger docs.
To evaluate a trained agent
python3 src/main.py +action=eval +MDP=MDP_basic trained_agent_path=path/to/best_model.zip
OR to record a trained agent (videos will be saved in the './videos' folder)
python3 src/main.py +action=record +MDP=MDP_basic trained_agent_path=path/to/best_model.zip
When running in eval mode (by setting `eval_mode` to true in the MDP configuration), the reported reward is actually the proportion of trees left from the starting number of trees for each episode. This is more useful than the accumulated reward because it lets us compare different reward functions and gives a somewhat realistic estimate of our true goal: stopping the fire as fast as possible and saving the trees.
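In other words, the eval-mode metric is simply trees-at-end divided by trees-at-start. A tiny illustrative helper (not the project's actual code):

```python
def trees_saved_ratio(trees_at_end: int, trees_at_start: int) -> float:
    """Proportion of the initial trees still standing after an episode.

    1.0 means every tree survived; 0.0 means the map burned down.
    (Illustrative helper -- not the project's actual code.)
    """
    return trees_at_end / trees_at_start
```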
According to different resources [1] [2], millions of acres of green forest are lost every year due to wildfires, and this leads to a vicious cycle of more wildfires due to the carbon emissions from the fire, as shown in the following picture. This also leads to what are called "extreme wildfires" [3], which are beyond our human capacity and current technology to put out or contain. This is where Reinforcement Learning could help: what if an AI could put out wildfires more efficiently?
Our MDP will be built on the "Forest-fire model" [4], which is based on 4 simple rules:
- A burning cell turns into an empty cell
- A tree will burn if at least one neighbor is burning
- A tree ignites with probability $f$ even if no neighbor is burning
- An empty space fills with a tree with probability $p$
Some ideas in our MDP are also similar to this paper [7]. Because in our task we're more concerned about putting out the fires than about what happens after them (i.e., trees regrowing), rule number 4 will be dropped. This also has the nice benefit of making the MDP episodic; i.e., the game ends when all trees burn out or the fire is put out. Another point regarding rule number 3: the environment will start with a random set of trees on fire (how many is a hyperparameter) and no trees will self-ignite dynamically after the start. The agent in our case will act as the Fire Department / Government and has certain resources at its disposal which it can use to fight the fires.
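To make the modified dynamics concrete, here is a minimal NumPy sketch of one simulation tick under the rules above. The cell encodings and the 4-neighbor (von Neumann) neighborhood are assumptions, not the project's actual implementation:

```python
import numpy as np

EMPTY, TREE, FIRE = 0, 1, 2  # assumed cell encodings

def forest_fire_step(grid: np.ndarray, rng: np.random.Generator,
                     p_spread: float = 1.0) -> np.ndarray:
    """One tick of the modified Forest-fire model (sketch).

    Rule 1: a burning cell turns into an empty cell.
    Rule 2: a tree catches fire (with probability p_spread) if at
            least one of its 4 neighbors is burning.
    Rules 3 and 4 (self-ignition, regrowth) are dropped.
    """
    burning = grid == FIRE
    # Does any of the four von Neumann neighbors burn?
    neighbor_on_fire = np.zeros_like(burning)
    neighbor_on_fire[1:, :] |= burning[:-1, :]   # fire from the north
    neighbor_on_fire[:-1, :] |= burning[1:, :]   # fire from the south
    neighbor_on_fire[:, 1:] |= burning[:, :-1]   # fire from the west
    neighbor_on_fire[:, :-1] |= burning[:, 1:]   # fire from the east

    new_grid = grid.copy()
    new_grid[burning] = EMPTY                               # rule 1
    ignites = ((grid == TREE) & neighbor_on_fire
               & (rng.random(grid.shape) < p_spread))       # rule 2
    new_grid[ignites] = FIRE
    return new_grid
```

Iterating this step until no FIRE cells remain yields an episodic rollout, which is exactly why dropping rule 4 makes the MDP episodic.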
- Grid world with $X \times Y$ cells (this can also be a hyperparameter to test how different world shapes affect the agent's strategy)
- Cells have four possible states: Empty/Earth, Tree, Fire, Trench (dug by firefighters)
- Agent has $a$ firefighters
- Agent has $b$ firetrucks
- Agent has $c$ helicopters / planes
- Agent has a budget of size $d$ (money)
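As a hedged sketch of how these states and resources could map onto Gymnasium spaces (the real environment may encode them differently):

```python
import numpy as np
from gymnasium import spaces

# Assumed integer encoding of the four cell states.
EMPTY, TREE, FIRE, TRENCH = 0, 1, 2, 3

def make_observation_space(x: int, y: int) -> spaces.Dict:
    """Sketch of an observation space for an X-by-Y wildfire grid."""
    return spaces.Dict({
        # one discrete cell state per grid position
        "grid": spaces.Box(low=0, high=3, shape=(x, y), dtype=np.int8),
        # remaining resources: firefighters a, firetrucks b,
        # helicopters/planes c, and the budget d
        "resources": spaces.Box(low=0, high=np.inf, shape=(4,),
                                dtype=np.float32),
    })
```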
- Send firefighters to a specific location and perform one of the following actions [6]:
  - Control line: stops fire from spreading in a certain direction along a virtual wall (e.g., a trench) with probability $P_{a1}$ and costs $C_{a1}$
  - Burnout: removes trees along a one-dimensional line with a max length to stop fire from spreading; removing the trees works with 100% probability and costs $C_{a2}$
- Send a firetruck to a specific location to put out fire with probability $P_{a4}$ and costs $C_{a4}$
- Send helicopters / planes to a specific location to put out fire with probability $P_{a5}$ and costs $C_{a5}$
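One way such a parameterized action set could be encoded as a Gymnasium space, sketched under assumed names (the real environment may differ):

```python
from gymnasium import spaces

# Hypothetical flat encoding: an action type plus its parameters.
# 0: no-op, 1: control line, 2: burnout, 3: firetruck, 4: helicopter/plane
def make_action_space(x: int, y: int, max_line_len: int) -> spaces.Dict:
    return spaces.Dict({
        "type": spaces.Discrete(5),
        "target": spaces.MultiDiscrete([x, y]),   # grid cell to act on
        "direction": spaces.Discrete(4),          # for line-shaped actions
        "length": spaces.Discrete(max_line_len),  # burnout line length
    })
```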
- Fire will spread if no action is taken at the given location, as described by the Forest-fire model
- All actions that have a cost will reduce the agent's budget
- Action "control line" will replace the cells along a certain line with "trench" cells
- Action "burnout" will replace tree cells along a line with empty cells
- Action "firetruck" will put out fires that exist at the given location with probability $P_{a4}$
- Action "helicopter/plane" will put out fires at the given location with probability $P_{a5}$

Termination:

- Agent uses all the resources it has
- Fire is put out
- Fire consumes the whole map
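Putting the transition and termination rules together, an illustrative sketch (the helper names and the handling of $P_{a4}$ are assumptions, not the project's code):

```python
import numpy as np

EMPTY, TREE, FIRE, TRENCH = 0, 1, 2, 3  # assumed cell encodings

def apply_firetruck(grid: np.ndarray, cell: tuple,
                    rng: np.random.Generator, p_a4: float) -> None:
    """Put out a fire at `cell` with probability P_a4 (illustrative)."""
    if grid[cell] == FIRE and rng.random() < p_a4:
        grid[cell] = EMPTY

def is_terminal(grid: np.ndarray, budget: float, units_left: int) -> bool:
    """True when the fire is out, the map is consumed,
    or the agent has nothing left to act with."""
    fire_out = not (grid == FIRE).any()
    map_consumed = not (grid == TREE).any()
    out_of_resources = units_left <= 0 and budget <= 0
    return fire_out or map_consumed or out_of_resources
```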
Rewards:

- Fire stopped: reward is the number of trees still standing + the remaining budget
- Agent takes an action: negative reward, i.e., the cost associated with that action
- Fire consumes the entire map: complete and utter failure, -10000 reward
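A hedged sketch of these reward terms (the -10000 penalty comes from the spec above; the helper names are illustrative):

```python
def terminal_reward(trees_standing: int, budget: float,
                    map_consumed: bool) -> float:
    """Episode-end reward, per the spec above (illustrative helper)."""
    if map_consumed:
        return -10_000.0            # complete and utter failure
    return trees_standing + budget  # fire stopped: trees + leftover budget

def step_reward(action_cost: float) -> float:
    """Per-step reward: the negative cost of the action just taken."""
    return -action_cost
```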
Hyperparameters:

- All the transition probabilities, costs, and actions listed above
- Size and shape of the environment
- Probability of fire spreading; higher values allow us to simulate cases of "extreme wildfires"
- Simulate wind
- Simulate wildfires with different hotspots
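For illustration only, these knobs could be gathered into one object; in the project they live in the Hydra YAML configs, and the names below are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class WildfireHyperparams:
    """Hypothetical bundle of the knobs listed above -- in the project
    these live in the Hydra YAML configs, not in a dataclass."""
    grid_shape: tuple[int, int] = (32, 32)
    p_fire_spread: float = 0.8         # high values ~ "extreme wildfires"
    num_initial_fires: int = 3
    wind: tuple[float, float] = (0.0, 0.0)   # optional wind vector
    action_costs: dict[str, float] = field(default_factory=dict)  # C_a*
```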
References:

- [1] https://www.wri.org/insights/global-trends-forest-fires
- [2] https://sgp.fas.org/crs/misc/IF10244.pdf
- [3] https://www.sciencedirect.com/science/article/abs/pii/B9780128157213000011
- [4] https://www.wikiwand.com/en/Forest-fire_model
- [5] https://www.wikiwand.com/en/Aerial_firefighting
- [6] https://www.mentalfloss.com/article/57094/10-strategies-fighting-wildfires
- [7] https://doi.org/10.1109/IJCNN48605.2020.9207548