This project offers a Gymnasium environment that simulates a wildfire and an agent that can take various actions to put out the fire. See Why?!
- Create a virtual Python environment
conda create -n wildfires python=3.10
- Activate the environment
conda activate wildfires
- Install dependencies
pip install -r requirements.txt
- Play the game!! (the first run might take 5 seconds or so to start)
python3 src/main.py +action=play +MDP=MDP_basic
PS: Change the MDP configuration of the environment and see what happens 😉
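Under the hood, the `+MDP=...` flags are Hydra overrides that select a config YAML file. Purely as an illustration (the key names below are hypothetical, not the project's actual schema; see the files under configs/ for the real one), such a config might look like:

```yaml
# Hypothetical MDP config sketch -- key names are illustrative,
# not the project's actual schema.
grid_shape: [32, 32]   # X x Y cells
num_initial_fires: 3   # how many trees start on fire
p_fire_spread: 0.8     # chance a tree ignites per burning neighbor
eval_mode: false       # report proportion of trees saved when true
```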
Train a PPO agent from Stable-Baselines3:
python3 src/main.py +action=train +MDP=MDP_basic +train=train_PPO_single_run_basic
OR add a custom configuration under configs/train and call
python3 src/main.py +action=train +MDP=MDP_basic +train=custom_config
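A custom train config is just another YAML file that Hydra composes into the run. A minimal sketch with hypothetical field names (check the existing `train_PPO_single_run_basic` config for the real schema):

```yaml
# Hypothetical training config sketch -- field names are illustrative;
# the real schema is whatever the existing configs under configs/train use.
algorithm: PPO
total_timesteps: 1000000
learning_rate: 0.0003
n_envs: 8          # number of parallel environments
eval_freq: 10000   # evaluate the agent every N steps
```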
OR continue training from a trained agent (the same configuration has to be used). Check out the releases for pretrained agents.
python3 src/main.py +action=train +MDP=MDP_basic +train=train_PPO_single_run_basic trained_agent_path=path/to/best_model.zip
Don't forget to launch TensorBoard to see your logs ;)
tensorboard --logdir='./logs'
OR, when using VS Code, run the command "Python: Launch TensorBoard"
For a description of what the different graphs mean, check out the Stable-Baselines3 Logger docs.
To evaluate a trained agent
python3 src/main.py +action=eval +MDP=MDP_basic trained_agent_path=path/to/best_model.zip
OR to record a trained agent (videos will be saved in the './videos' folder)
python3 src/main.py +action=record +MDP=MDP_basic trained_agent_path=path/to/best_model.zip
When running in eval mode (by setting `eval_mode` to true in the MDP configuration), the reported reward is actually the proportion of trees left from the starting number of trees for each episode. This is more useful than the accumulated reward because it lets us compare different reward functions and gives a somewhat realistic estimate of our true goal: stopping the fire as fast as possible and saving the trees.
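In other words, the eval-mode metric is simply trees-at-end divided by trees-at-start. A tiny illustrative helper (not the project's actual code):

```python
def trees_saved_ratio(trees_at_end: int, trees_at_start: int) -> float:
    """Proportion of the initial trees still standing after an episode.

    1.0 means every tree survived; 0.0 means the map burned down.
    (Illustrative helper -- not the project's actual code.)
    """
    return trees_at_end / trees_at_start
```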
According to different resources [1] [2], millions of acres of green forest are lost every year due to wildfires, and this leads to a vicious cycle of more wildfires due to the carbon emissions from the fire, as shown in the following picture. This also leads to what are called "extreme wildfires" [3], which are beyond our human capacity and current technology to put out or contain. This is where Reinforcement Learning could help: what if an AI could put out wildfires more efficiently?
Our MDP will be built on the "Forest-fire model" [4], which is based on 4 simple rules:
- A burning cell turns into an empty cell
- A tree will burn if at least one neighbor is burning
- A tree ignites with probability $f$ even if no neighbor is burning
- An empty space fills with a tree with probability $p$
Some ideas in our MDP are also similar to this paper [7]. Because in our task we're more concerned about putting out the fires than about what happens after them (i.e., trees regrowing), rule number 4 will be dropped. This also has the nice benefit of making the MDP episodic; i.e., the game ends when all trees burn out or the fire is put out. Another point regarding rule number 3: the environment will start with a random set of trees on fire (how many is a hyperparameter) and no trees will self-ignite dynamically after the start. The agent in our case will act as the Fire Department / Government and has certain resources at its disposal which it can use to fight the fires.
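To make the modified dynamics concrete, here is a minimal NumPy sketch of one simulation tick under the rules above. The cell encodings and the 4-neighbor (von Neumann) neighborhood are assumptions, not the project's actual implementation:

```python
import numpy as np

EMPTY, TREE, FIRE = 0, 1, 2  # assumed cell encodings

def forest_fire_step(grid: np.ndarray, rng: np.random.Generator,
                     p_spread: float = 1.0) -> np.ndarray:
    """One tick of the modified Forest-fire model (sketch).

    Rule 1: a burning cell turns into an empty cell.
    Rule 2: a tree catches fire (with probability p_spread) if at
            least one of its 4 neighbors is burning.
    Rules 3 and 4 (self-ignition, regrowth) are dropped.
    """
    burning = grid == FIRE
    # Does any of the four von Neumann neighbors burn?
    neighbor_on_fire = np.zeros_like(burning)
    neighbor_on_fire[1:, :] |= burning[:-1, :]   # fire from the north
    neighbor_on_fire[:-1, :] |= burning[1:, :]   # fire from the south
    neighbor_on_fire[:, 1:] |= burning[:, :-1]   # fire from the west
    neighbor_on_fire[:, :-1] |= burning[:, 1:]   # fire from the east

    new_grid = grid.copy()
    new_grid[burning] = EMPTY                               # rule 1
    ignites = ((grid == TREE) & neighbor_on_fire
               & (rng.random(grid.shape) < p_spread))       # rule 2
    new_grid[ignites] = FIRE
    return new_grid
```

Iterating this step until no FIRE cells remain yields an episodic rollout, which is exactly why dropping rule 4 makes the MDP episodic.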
- Grid world with $X \times Y$ cells (this can also be a hyperparameter to test how different world shapes affect the agent's strategy)
- Cells have four possible states: Empty/Earth, Tree, Fire, Trench (dug by firefighters)
- Agent has $a$ firefighters
- Agent has $b$ firetrucks
- Agent has $c$ helicopters / planes
- Agent has a budget of size $d$ (money)
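As a hedged sketch of how these states and resources could map onto Gymnasium spaces (the real environment may encode them differently):

```python
import numpy as np
from gymnasium import spaces

# Assumed integer encoding of the four cell states.
EMPTY, TREE, FIRE, TRENCH = 0, 1, 2, 3

def make_observation_space(x: int, y: int) -> spaces.Dict:
    """Sketch of an observation space for an X-by-Y wildfire grid."""
    return spaces.Dict({
        # one discrete cell state per grid position
        "grid": spaces.Box(low=0, high=3, shape=(x, y), dtype=np.int8),
        # remaining resources: firefighters a, firetrucks b,
        # helicopters/planes c, and the budget d
        "resources": spaces.Box(low=0, high=np.inf, shape=(4,),
                                dtype=np.float32),
    })
```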
- Send firefighters to a specific location and perform one of the following actions [6]:
  - Control line: stops fire from spreading in a certain direction along a virtual wall (e.g., a trench) with probability $P_{a1}$ and costs $C_{a1}$
  - Burnout: removes trees along a one-dimensional line with a max length to stop fire from spreading; removing the trees works with 100% probability and costs $C_{a2}$
- Send a firetruck to a specific location to put out fire with probability $P_{a4}$ and costs $C_{a4}$
- Send helicopters / planes to a specific location to put out fire with probability $P_{a5}$ and costs $C_{a5}$
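One way such a parameterized action set could be encoded as a Gymnasium space, sketched under assumed names (the real environment may differ):

```python
from gymnasium import spaces

# Hypothetical flat encoding: an action type plus its parameters.
# 0: no-op, 1: control line, 2: burnout, 3: firetruck, 4: helicopter/plane
def make_action_space(x: int, y: int, max_line_len: int) -> spaces.Dict:
    return spaces.Dict({
        "type": spaces.Discrete(5),
        "target": spaces.MultiDiscrete([x, y]),   # grid cell to act on
        "direction": spaces.Discrete(4),          # for line-shaped actions
        "length": spaces.Discrete(max_line_len),  # burnout line length
    })
```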
- Fire will spread if no action is taken at the given location, as described by the Forest-fire model
- All actions that have a cost will reduce the agent's budget
- Action "control line" will replace the cells along a certain line with "trench" cells
- Action "burnout" will replace tree cells along a line with empty cells
- Action "firetruck" will put out fires that exist at the given location with probability $P_{a4}$
- Action "helicopter/plane" will put out fires at the given location with probability $P_{a5}$

Termination:

- Agent uses all the resources it has
- Fire is put out
- Fire consumes the whole map
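Putting the transition and termination rules together, an illustrative sketch (the helper names and the handling of $P_{a4}$ are assumptions, not the project's code):

```python
import numpy as np

EMPTY, TREE, FIRE, TRENCH = 0, 1, 2, 3  # assumed cell encodings

def apply_firetruck(grid: np.ndarray, cell: tuple,
                    rng: np.random.Generator, p_a4: float) -> None:
    """Put out a fire at `cell` with probability P_a4 (illustrative)."""
    if grid[cell] == FIRE and rng.random() < p_a4:
        grid[cell] = EMPTY

def is_terminal(grid: np.ndarray, budget: float, units_left: int) -> bool:
    """True when the fire is out, the map is consumed,
    or the agent has nothing left to act with."""
    fire_out = not (grid == FIRE).any()
    map_consumed = not (grid == TREE).any()
    out_of_resources = units_left <= 0 and budget <= 0
    return fire_out or map_consumed or out_of_resources
```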
Rewards:

- Fire stopped: reward is the number of trees still standing + the remaining budget
- Agent takes an action: negative reward, i.e., the cost associated with that action
- Fire consumes the entire map: complete and utter failure, -10000 reward
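A hedged sketch of these reward terms (the -10000 penalty comes from the spec above; the helper names are illustrative):

```python
def terminal_reward(trees_standing: int, budget: float,
                    map_consumed: bool) -> float:
    """Episode-end reward, per the spec above (illustrative helper)."""
    if map_consumed:
        return -10_000.0            # complete and utter failure
    return trees_standing + budget  # fire stopped: trees + leftover budget

def step_reward(action_cost: float) -> float:
    """Per-step reward: the negative cost of the action just taken."""
    return -action_cost
```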
Hyperparameters:

- All the transition probabilities, costs, and actions listed above
- Size and shape of the environment
- Probability of fire spreading; higher values allow us to simulate cases of "extreme wildfires"
- Simulate wind
- Simulate wildfires with different hotspots
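For illustration only, these knobs could be gathered into one object; in the project they live in the Hydra YAML configs, and the names below are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class WildfireHyperparams:
    """Hypothetical bundle of the knobs listed above -- in the project
    these live in the Hydra YAML configs, not in a dataclass."""
    grid_shape: tuple[int, int] = (32, 32)
    p_fire_spread: float = 0.8         # high values ~ "extreme wildfires"
    num_initial_fires: int = 3
    wind: tuple[float, float] = (0.0, 0.0)   # optional wind vector
    action_costs: dict[str, float] = field(default_factory=dict)  # C_a*
```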
References:

- [1] https://www.wri.org/insights/global-trends-forest-fires
- [2] https://sgp.fas.org/crs/misc/IF10244.pdf
- [3] https://www.sciencedirect.com/science/article/abs/pii/B9780128157213000011
- [4] https://www.wikiwand.com/en/Forest-fire_model
- [5] https://www.wikiwand.com/en/Aerial_firefighting
- [6] https://www.mentalfloss.com/article/57094/10-strategies-fighting-wildfires
- [7] https://doi.org/10.1109/IJCNN48605.2020.9207548