This repository contains a notebook with an analysis of the open dataset US Wildfires. I'm going to show some preliminary exploratory data analysis and an ML model to answer the following questions:
- Have wildfires become more or less frequent over time?
- What counties are the most and least fire-prone?
- Given the size, location and date, can you predict the cause of a wildfire?
The code was tested with Python 3.9, but older versions may work as well. I recommend using conda to create a separate environment to make sure it works.
- Download and extract the repository
- Sign in to Kaggle, download the dataset from the link above, extract the sqlite file and put it in the files folder of the repo
- Download st99_d00.dbf/.shp/.shx from here and put them in the files folder
- Download the tl_2019_us_county.zip file from here, extract it and put the .dbf/.shp/.shx in the files folder
- (Optional) Create a conda environment with Python 3.9
- Make sure you have the latest version of the requirements or install them anew
You can create a new conda environment by running:
conda create -n env_name python=3.9
To install the requirements you need to run:
pip install -r requirements.txt
Just run every cell one by one. If a file is missing or corrupted, an error message will be raised. For every point there are interactive widgets you can use to customise the output.
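If you want to confirm that the data files are in place before opening the notebook, a quick check along these lines may help. This is only a minimal sketch, not the notebook's own validation. It assumes the data folder is called `files` and sits next to where you run the script, that the shapefile components keep the names listed in the steps above, and that the Kaggle database ends in `.sqlite` (its exact name is not assumed). The function name `find_missing_files` is hypothetical.

```python
from pathlib import Path

# Shapefile components named in the setup steps; each shapefile needs all three parts.
REQUIRED_SHAPEFILE_PARTS = [
    f"{stem}{ext}"
    for stem in ("st99_d00", "tl_2019_us_county")
    for ext in (".dbf", ".shp", ".shx")
]


def find_missing_files(data_dir="files"):
    """Return a list describing the expected data files that are absent from data_dir."""
    folder = Path(data_dir)
    missing = [
        name for name in REQUIRED_SHAPEFILE_PARTS if not (folder / name).is_file()
    ]
    # Assumption: the Kaggle database has a .sqlite extension; its name is not checked.
    if not any(folder.glob("*.sqlite")):
        missing.append("*.sqlite (Kaggle wildfire database)")
    return missing


if __name__ == "__main__":
    missing = find_missing_files()
    if missing:
        print("Missing from the files folder:", ", ".join(missing))
    else:
        print("All expected data files were found.")
```

The check only looks at whether the files exist. It does not detect a corrupted file, so the notebook's own error messages still apply.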