21 Questions with the Weather: A Decision Tree Approach to Predicting Day-Ahead Solar Energy

Scaling up solar energy production quickly, over the next 20-50 years, is key to ending our current reliance on climate-damaging fossil fuels. However, solar energy is variable due to clouds and weather.

Electric utility companies need accurate forecasts of solar energy availability in order to plan the correct mix of fossil fuels and renewable energy to be used on any given day.

Errors in forecasting solar energy availability could lead to large expenses in extra fossil fuel consumption or emergency purchases of electricity from neighboring utilities.

Machine learning has the potential to find complex, statistically significant patterns that link numerical weather forecasts to solar energy generation. These trained models can then be used to make accurate predictions for solar power generation at a given solar energy generation site.

The Data

For this project, I used 18 years (1994-2012) of numerical weather forecasts at 3-hour intervals (500,000+ examples, 7920 raw features) from the NOAA/ESRL Global Ensemble Forecast System, obtained from the Kaggle AMS 2013-2014 Solar Energy Prediction Contest.

These forecasts are from 11 different global weather models for 12, 15, 18, 21 and 24 hours ahead at 144 different latitude/longitude locations across Oklahoma.
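The 7920 raw features quoted above are consistent with one value per (model, lead time, grid point) combination, since 11 x 5 x 144 = 7920. That reading is an inference from the arithmetic, not something stated here, and the flat column layout below is purely hypothetical. A minimal sketch:

```python
from itertools import product

N_MODELS = 11                      # global weather models (from the text)
LEAD_HOURS = (12, 15, 18, 21, 24)  # forecast lead times in hours (from the text)
N_GRID_POINTS = 144                # latitude/longitude locations (from the text)

# Hypothetical flat layout: one column per (model, lead hour, grid point).
feature_index = {
    key: column
    for column, key in enumerate(
        product(range(N_MODELS), LEAD_HOURS, range(N_GRID_POINTS))
    )
}

print(len(feature_index))  # 7920
```

A lookup such as feature_index[(0, 12, 0)] then gives the column for model 0, 12 hours ahead, at grid point 0.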

I used these forecasts to predict the total integrated solar energy from sunrise to sunset as measured by 98 Oklahoma Mesonet sites spaced across Oklahoma. The actual total solar energy was measured directly by pyranometers at each site with a 5-minute cadence from 1994-2012.
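To make "total integrated solar energy" concrete, here is a minimal sketch of turning 5-minute pyranometer readings into a daily total. It assumes the readings are irradiance in W/m^2 and uses a trapezoidal rule; how the Mesonet data were actually integrated is not described here, and the function name and sample values are made up for illustration.

```python
def integrate_daily_energy(irradiance_w_m2, cadence_s=300.0):
    """Trapezoidal integral of evenly spaced irradiance samples.

    irradiance_w_m2: readings in W/m^2 (assumed unit), sunrise to sunset.
    cadence_s: seconds between readings (300 s = 5 minutes).
    Returns energy in J/m^2.
    """
    total = 0.0
    # Each adjacent pair of samples contributes one trapezoid.
    for left, right in zip(irradiance_w_m2, irradiance_w_m2[1:]):
        total += 0.5 * (left + right) * cadence_s
    return total


# Made-up readings for a very short "day" of five samples.
samples = [0.0, 100.0, 200.0, 100.0, 0.0]
energy = integrate_daily_energy(samples)
print(energy)  # 120000.0 J/m^2
```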

The Solution

I trained several hundred gradient-boosted decision tree models with XGBoost, using weather forecasts spatially averaged over 100 x 100 km areas, to predict the actual integrated total solar energy.

I ran XGBoost to find numerical patterns that link weather forecasts to the actual integrated solar energy measured at each Oklahoma Mesonet site. I was able to predict the actual solar energy available 24 hours ahead to within 5.8% for the vast majority of days, evaluated on a portion of the data not used in model training.
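The exact error metric behind the percentage above is not spelled out. One common choice, shown here only as an assumption, is the mean absolute error on held-out days expressed as a percentage of the mean measured energy. The function name and the numbers are illustrative.

```python
def percent_mean_absolute_error(actual, predicted):
    """Mean absolute error as a percentage of the mean actual value.

    This is one possible definition, not necessarily the one used
    for the figure quoted in this README.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
    mean_actual = sum(actual) / len(actual)
    return 100.0 * mae / mean_actual


# Made-up held-out measurements and predictions (arbitrary energy units).
held_out_actual = [10.0, 20.0, 30.0]
held_out_predicted = [11.0, 19.0, 33.0]
error_pct = percent_mean_absolute_error(held_out_actual, held_out_predicted)
print(round(error_pct, 2))  # 8.33
```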

Limitations

Numerical weather forecasts with grid spacings of 10+ kilometers lack the resolution necessary to predict the locations of clouds directly. Clouds are the main source of uncertainty in solar energy generation. Thus, more precise satellite data would be necessary to increase predictive power.

The Future

An XGBoost decision-tree model trained to predict solar energy using numerical weather predictions can be easily implemented in real time for a given solar energy generation site. The model could be periodically updated with a longer time baseline and site-specific weather information.

Solar energy prediction models can also be useful for rooftop solar in conjunction with batteries. A virtual power plant controlling a network of batteries linked to multiple rooftop solar sites would likely find solar energy prediction useful for maximizing revenue from the sale of electricity to the grid.

Installation

Download the weather forecast files gefs_test.tar.gz (or gefs_test.zip) and gefs_train.tar.gz (or gefs_train.zip) from the Kaggle competition data webpage.
Move the files to the Data/ directory.

If you downloaded the .tar.gz files:

mv gefs_train.tar.gz Data/
mv gefs_test.tar.gz Data/

or if you downloaded the .zip files:

mv gefs_train.zip Data/
mv gefs_test.zip Data/

Move into the Data directory: cd Data/
Extract the .tar.gz or .zip files to make the Data/train/ and Data/test/ directories.

If you downloaded the .tar.gz files, on OSX you can run:

tar -xzvf gefs_train.tar.gz
tar -xzvf gefs_test.tar.gz

Or if you downloaded the .zip files:

unzip gefs_train.zip
unzip gefs_test.zip

There should now be Data/train/ and Data/test/ directories with multiple *.nc files such as test/apcp_sfc_latlon_subset_20080101_20121130.nc, test/dlwrf_sfc_latlon_subset_20080101_20121130.nc etc. Each file gives weather features described on the Kaggle data webpage.

Install the netCDF4 module.
Switch back to the PredictingSolarEnergy/ directory: cd ..

Usage

First, run train_solar_predict.py to assemble the raw features, engineer and normalize them, and train XGBoost models for one of the 11 different global weather forecast models. Second, run ensemble.py to run XGBoost a second time, combining multiple XGBoost models generated by train_solar_predict.py.

train_solar_predict.py is designed for experimentation: it lets you vary the amount of spatial averaging, the feature engineering of the weather forecast grid points, and the XGBoost hyperparameters. From the PredictingSolarEnergy/ directory, run the code as:

python Code/train_solar_predict.py --outdir OUTDIR --modelnum MODELNUM --numclosegrid NUM --debug DEBUG --method METH --numrandstate NUMRAND --tag TAG

OUTDIR is the name of the directory for the output files
MODELNUM is the global weather forecast model to use (an integer 0-10)
NUM is the number of grid points over which to spatially average a global weather forecast model. Set to 7 for best results.
METH is a string specifying the type of spatial averaging to perform:
1. avg for a straightforward spatial average of the forecast models.
2. use4 for no averaging. XGBoost will determine how best to use the different weather forecasts from different latitudes and longitudes.
3. wavg for using a spatial average weighted by the distance from each weather model grid point to each Mesonet weather station in Oklahoma.
NUMRAND is the integer number of times to run XGBoost at different random states.
TAG is a string to tag the output files with.
DEBUG toggles debug mode: set to 0 for normal runs, or 1 to debug.
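The three METH options can be illustrated with a small, self-contained sketch. This is not the code in train_solar_predict.py: the function names are hypothetical, "wavg" is assumed to mean inverse-distance weighting, and "use4" is assumed to simply return the nearest grid-point values unaveraged.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def combine_grid_points(station, grid, num_close, method):
    """Combine forecast values from the num_close grid points nearest a station.

    station: (lat, lon); grid: list of (lat, lon, value) tuples.
    Returns a list of feature values (hypothetical interface).
    """
    lat0, lon0 = station
    nearest = sorted(
        ((haversine_km(lat0, lon0, lat, lon), value) for lat, lon, value in grid),
        key=lambda pair: pair[0],
    )[:num_close]
    values = [value for _, value in nearest]
    if method == "use4":   # assumed: no averaging, keep each value as a feature
        return values
    if method == "avg":    # plain spatial mean
        return [sum(values) / len(values)]
    if method == "wavg":   # assumed: inverse-distance weighted mean
        weights = [1.0 / max(dist, 1e-6) for dist, _ in nearest]
        weighted_sum = sum(w * v for w, v in zip(weights, values))
        return [weighted_sum / sum(weights)]
    raise ValueError(f"unknown method: {method}")


# Made-up station and grid values for illustration.
station = (35.0, -97.0)
grid = [(35.0, -97.5, 10.0), (35.0, -96.0, 20.0), (37.0, -97.0, 90.0)]
print(combine_grid_points(station, grid, 2, "avg"))
print(combine_grid_points(station, grid, 2, "wavg"))
print(combine_grid_points(station, grid, 2, "use4"))
```

With these made-up values, the weighted average sits closer to the value at the nearer grid point than the plain average does, which is the intended effect of distance weighting.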

ensemble.py is designed to aggregate the models from train_solar_predict.py and fit an XGBoost model to make the final predictions to be submitted to Kaggle. From the PredictingSolarEnergy/ directory, run the code as:

python Code/ensemble.py --indirtag INDIRTAG --outdir OUTDIR --tag TAG

INDIRTAG is the name of the output directories from train_solar_predict.py with XGBoost models that fit the raw features.
OUTDIR is the name of the directory to place the output files.
TAG is the string to tag the output files with.
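The two-step workflow (train on each forecast model, then ensemble) can be scripted. The sketch below only builds and prints the command lines using the flags documented above; it does not run them. The directory name, tag, and default values are placeholders. How INDIRTAG is matched against the training output directories is not specified here, so reusing the same name is an assumption, as is giving each training run its own tag.

```python
import shlex


def build_pipeline_commands(outdir, tag, num_close_grid=7, method="wavg",
                            num_rand_state=1):
    """Return argument lists for 11 training runs followed by one ensemble run."""
    commands = []
    for model_num in range(11):  # MODELNUM is an integer 0-10
        commands.append([
            "python", "Code/train_solar_predict.py",
            "--outdir", outdir,
            "--modelnum", str(model_num),
            "--numclosegrid", str(num_close_grid),
            "--debug", "0",
            "--method", method,
            "--numrandstate", str(num_rand_state),
            # Assumption: a per-model tag keeps output files from colliding.
            "--tag", f"{tag}_model{model_num}",
        ])
    commands.append([
        "python", "Code/ensemble.py",
        "--indirtag", outdir,  # assumption: same name as the training OUTDIR
        "--outdir", outdir,
        "--tag", tag,
    ])
    return commands


commands = build_pipeline_commands("Output", "run1")
for command in commands:
    print(shlex.join(command))
```

Each printed line can be pasted into a shell from the PredictingSolarEnergy/ directory, or each list can be passed to subprocess.run once the arguments have been checked against the scripts.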
