The Copernicus Emergency Management Service (CEMS) Wildfire dataset spans from June 2017 to April 2023. It contains Sentinel-2 images of wildfire-affected areas together with their severity (grading) and delineation masks. The dataset is further enriched with cloud and land cover masks, which provide additional information for training semantic segmentation models. It comprises more than 500 high-quality images and is available on Hugging Face.
The Copernicus Emergency Management Service (CEMS) is a component of the European Union's Copernicus Programme. It provides rapid geospatial information during emergencies, as well as damage assessment, for events such as floods, wildfires, and earthquakes. In particular, Copernicus Rapid Mapping provides on-demand mapping services for a range of natural disasters, offering detailed, up-to-date geospatial information that supports disaster management and risk assessment.
The satellite imagery comes from Sentinel-2 Level-2A (L2A) products, which contain 12 spectral bands. Their native resolution is 10, 20, or 60 meters depending on the band; in this dataset all bands are provided at 10 meters. Level-2A data are atmospherically corrected, so pixel values represent surface (bottom-of-atmosphere) reflectance rather than values distorted by the atmosphere (aerosols, water vapour). The images are downloaded using the SentinelHub APIs.
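To illustrate how the 12 bands relate to an RGB preview such as the S2L2A.png files, the sketch below builds an 8-bit true-color image from a 12-band array. It assumes the bands are stacked in the order B01 to B12 with B10 omitted (B10 is not part of L2A products) and that values are reflectances roughly in [0, 1]. The `BANDS` list, the `rgb_preview` helper, and the `gain` of 2.5 are hypothetical choices, not necessarily how the dataset's PNGs were produced; reading the GeoTIFF itself is left out.

```python
import numpy as np

# Assumed band order of the 12-band L2A stack (B10 is absent from L2A products).
BANDS = ["B01", "B02", "B03", "B04", "B05", "B06", "B07", "B08", "B8A", "B09", "B11", "B12"]


def rgb_preview(stack: np.ndarray, gain: float = 2.5) -> np.ndarray:
    """Build an 8-bit true-color image from an (H, W, 12) reflectance array."""
    if stack.shape[-1] != len(BANDS):
        raise ValueError(f"expected {len(BANDS)} bands, got {stack.shape[-1]}")
    # Red, green, blue correspond to B04, B03, B02.
    idx = [BANDS.index(b) for b in ("B04", "B03", "B02")]
    # Surface reflectance is usually dark, so brighten it before clipping to [0, 1].
    rgb = stack[..., idx].astype(np.float64) * gain
    return (np.clip(rgb, 0.0, 1.0) * 255).round().astype(np.uint8)
```

The gain only affects the visual preview; any model training would use the raw reflectance values from the georeferenced file.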
The structure of the dataset is the following:
dataset/
├── dataOptimal/
│ ├── EMSRXXX/
│ │ ├── AOIYY/
│ │ │ ├── EMSRXXX_AOIYY_01/
│ │ │ │ ├── EMSRXXX_AOIYY_01_Annual9_LC.png # Landcover data Lulc (9 classes)
│ │ │ │ ├── EMSRXXX_AOIYY_01_Annual9_LC.tif # Landcover data Lulc (9 classes) in georeferenced format
│ │ │ │ ├── EMSRXXX_AOIYY_01_CM.png # Cloud mask generated from cloudSen12
│ │ │ │ ├── EMSRXXX_AOIYY_01_CM.tif # Cloud mask from cloudSen12 in georeferenced format
│ │ │ │ ├── EMSRXXX_AOIYY_01_DEL.png # Delineation mask
│ │ │ │ ├── EMSRXXX_AOIYY_01_DEL.tif # Delineation mask in georeferenced format
│ │ │ │ ├── EMSRXXX_AOIYY_01_ESA_LC.png # Landcover data ESA WorldCover 2020
│ │ │ │ ├── EMSRXXX_AOIYY_01_ESA_LC.tif # Landcover data ESA WorldCover 2020 in georeferenced format
│ │ │ │ ├── EMSRXXX_AOIYY_01_Esri10_LC.png # Landcover data 2020 Global 10 Class (LULC)
│ │ │ │ ├── EMSRXXX_AOIYY_01_Esri10_LC.tif # Landcover data 2020 Global 10 Class (LULC) in georeferenced format
│ │ │ │ ├── EMSRXXX_AOIYY_01_GRA.png # Grading mask
│ │ │ │ ├── EMSRXXX_AOIYY_01_GRA.tif # Grading mask in georeferenced format
│ │ │ │ ├── EMSRXXX_AOIYY_01_S2L2A.json # Image additional metadata from SentinelHub
│ │ │ │ ├── EMSRXXX_AOIYY_01_S2L2A.png # Sentinel2 image
│ │ │ │ └── EMSRXXX_AOIYY_01_S2L2A.tiff # Sentinel2 image in georeferenced format
│ │ │ └── EMSRXXX_AOIYY_01_merged.png # Merge between several tiles from sentinelHub
│ │ │
│ │ ├── EMSRXXX_AOIYY_02/
│ │ │ └── ...
│ │
├── dataSuboptimal/
│ └── ...
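Given the layout above, the sample folders and their georeferenced files can be collected with pathlib. This is a minimal sketch, assuming every sample folder is named EMSRXXX_AOIYY_NN and that its files reuse that name as a prefix; the `SUFFIXES` mapping and the `find_samples` helper are hypothetical and cover only some of the files shown in the tree.

```python
from pathlib import Path

# File suffixes taken from the tree above (note the image uses .tiff, the masks .tif).
SUFFIXES = {
    "image": "_S2L2A.tiff",
    "delineation": "_DEL.tif",
    "grading": "_GRA.tif",
    "cloud": "_CM.tif",
}


def find_samples(root):
    """Map each sample folder name (e.g. EMSR382_AOI01_01) to the files it contains."""
    samples = {}
    # The pattern also matches files such as *_merged.png, so keep directories only.
    for folder in sorted(p for p in Path(root).rglob("EMSR*_AOI*_*") if p.is_dir()):
        files = {}
        for key, suffix in SUFFIXES.items():
            candidate = folder / f"{folder.name}{suffix}"
            if candidate.exists():
                files[key] = candidate
        samples[folder.name] = files
    return samples
```

Missing products simply do not appear in a sample's dictionary, so samples without a grading mask, for example, can be filtered out afterwards.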
A sample from the dataset is made available to give you a representative overview of the data structure and accompanying metadata.
All information about the dataset is available in the csv_files/ folder:
- dataset_Preconfigured.csv: contains all the activations in the dataset, including the activation date of each event and the date interval used for the SentinelHub API.
- satelliteData.csv: stores all the information about each image.
- log.txt: general log of errors and messages.
First, install the requirements listed in assets/requirements.txt and the cloudSen12 library:
pip install -r assets/requirements.txt
pip install cloudsen12
If you get an error with the cartopy library, install the GEOS development package:
sudo apt -y install libgeos-dev
# NOTE: all variables and parameters for customization (e.g. path/to/folder) are defined in src.utils_variables
import warnings

from src.downloadDataset import DownloadCEMSDataset as CemsDataset

warnings.filterwarnings('ignore')

download_cems = CemsDataset()

# Uncomment this line to download and create the whole dataset
# download_cems.download_EMSR_Manager(allActivation = True)

# Uncomment this line to download a specific activation; use the format EMSRXXX and AOIXX
# download_cems.download_EMSR_Manager(Emsr = "EMSR382", Aoi = "AOI01", grading = True, delineation = True, estimation = True)

# Uncomment this line to create (or recreate) the cloud masks for the whole dataset
# download_cems.downloadCloudCover()

# Uncomment this line to download (or re-download) the land cover for the whole dataset
# download_cems.downloadLandCover()

# Uncomment this line to save (or re-save) the satellite data CSV with all information about the dataset
# download_cems.saveSatelliteDataCSV()

# Uncomment this line if you have changed the folder arrangement, so that the dataset arrangement can be replicated
# NOTE: USE THIS FUNCTION ONLY FOR DEBUGGING
# download_cems.createPreconfiguredCSV()

# Uncomment this line to copy all .png images into a separate folder
# download_cems.copyAllImageToRGBFolder()
For each Copernicus Rapid Mapping activation tagged as Wildfire, several post-fire products are available:
- FEP (First Estimation): a first estimation of the burned area.
- DEL (Delineation): a delineation of the area affected by the wildfire.
- GRA (Grading): a detailed description of the severity of the burned area.
Each product includes metadata and associated JSON files which contain geographical details about the affected areas.
NOTE: Since I do not have a license to distribute these files, they must be retrieved directly from the Copernicus website. Only the folder structure is left in the copernicusData folder.
The Copernicus website provides several georeferenced files in GeoJSON format. For this dataset, only the following files of each activation are considered:
- EMSRXXX_AOIYY_TYPE_PRODUCT_areaOfInterestA.json, where TYPE can be GRA, DEL or FEP, defines the area of interest YY of activation XXX where the event happened.
- EMSRXXX_AOIYY_TYPE_PRODUCT_observedEventA.json, where TYPE can be DEL or FEP, defines the multipolygon geometry of the wildfire delineation for AOI YY of activation XXX.
- EMSRXXX_AOIYY_GRA_PRODUCT_naturalLandUseA.json defines the multipolygon geometries of the grading damage levels for AOI YY of activation XXX.
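The naming rules above can be expressed as a regular expression, which is convenient when scanning the copernicusData folder for the relevant GeoJSON files. This sketch assumes XXX and YY are purely numeric and that file names follow the pattern exactly (no extra version suffixes); `PATTERN`, `ALLOWED`, and `parse_product_name` are hypothetical names. Files that do not match, or whose TYPE is not listed for that layer, are ignored.

```python
import re

# Pattern for the GeoJSON file names described above.
PATTERN = re.compile(
    r"^(?P<activation>EMSR\d+)_(?P<aoi>AOI\d+)_(?P<type>GRA|DEL|FEP)_PRODUCT_"
    r"(?P<layer>areaOfInterestA|observedEventA|naturalLandUseA)\.json$"
)

# Product types that carry each layer, following the list above.
ALLOWED = {
    "areaOfInterestA": {"GRA", "DEL", "FEP"},
    "observedEventA": {"DEL", "FEP"},
    "naturalLandUseA": {"GRA"},
}


def parse_product_name(name):
    """Return the parts of a relevant GeoJSON file name, or None if it should be ignored."""
    match = PATTERN.match(name)
    if match is None or match["type"] not in ALLOWED[match["layer"]]:
        return None
    return match.groupdict()
```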
These are the grading levels of damage used in the Copernicus products:
In order: the DEL map, the GRA map, and the corresponding Sentinel-2 image for activation EMSR382.
Creating cloud masks before making inferences on Sentinel-2 images is important because clouds can obscure or distort the land cover or land use information that is the focus of the analysis, leading to inaccurate or incomplete results. Sentinel-2 images are often used in remote sensing applications such as monitoring vegetation health, mapping land cover and land use, and detecting changes over time. Clouds interfere with these applications by blocking or reflecting the light captured by the satellite, which results in missing or distorted data. By default, all images are retrieved from SentinelHub with a maximum cloud coverage of 10 percent; however, some images still have significant cloud coverage.
For each image, this dataset provides a cloud mask, so that the areas affected by clouds can be identified and excluded from future analyses. This helps ensure that inferences made from the Sentinel-2 data are based on accurate and reliable information. The masks were generated using the CloudSen12 model.
The output prediction of CloudSen12 has 4 different layers for cloud coverage:
Cloud masks produced by the CloudSen12 model for activation EMSR382
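Once a cloud mask is available, excluding cloudy areas is a simple masking step. The sketch below assumes the CM raster has been read into an integer array aligned with the image, and that it uses the CloudSen12 label encoding 0 = clear, 1 = thick cloud, 2 = thin cloud, 3 = cloud shadow; this encoding is an assumption to verify against the actual CM files. The helper names are hypothetical.

```python
import numpy as np

# Assumed CloudSen12 class codes; verify against the dataset's CM files before use.
CLEAR, THICK_CLOUD, THIN_CLOUD, CLOUD_SHADOW = 0, 1, 2, 3


def cloud_fraction(cloud_mask: np.ndarray) -> float:
    """Fraction of pixels that are not labelled as clear."""
    return float(np.mean(cloud_mask != CLEAR))


def mask_cloudy_pixels(image: np.ndarray, cloud_mask: np.ndarray, fill=np.nan) -> np.ndarray:
    """Return a float copy of an (H, W, bands) image with every non-clear pixel set to `fill`."""
    out = image.astype(np.float64, copy=True)
    # A boolean (H, W) index selects whole pixel vectors across all bands.
    out[cloud_mask != CLEAR] = fill
    return out
```

`cloud_fraction` can also be compared against the 10 percent threshold mentioned above to flag images that still have significant cloud coverage.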
One challenge of using a U-Net for image segmentation is obtaining smooth predictions, especially when the receptive field of the network covers only a small number of pixels. In the context of the U-Net architecture, blending image patches can be used to generate smooth predictions by reducing discontinuities at patch boundaries. This approach involves dividing the input image into overlapping patches, running the U-Net on each patch individually, and then blending the resulting predictions together into a single output image.
By blending the predictions from multiple overlapping patches, the resulting output is typically smoother and more continuous than if the patch predictions were simply stitched together. This helps reduce artifacts and improves the overall quality of the segmentation results. In this work, the source code of cloudSen12 has been customized so that its predictions are smoothly blended. The source code for smooth-blend is available here
Cloud mask of activation EMSR638. The left mask shows artifacts caused by border effects in the CloudSen12 model; the right mask shows the result with smooth blending.
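The patch-blending idea described above can be sketched in a few lines of NumPy. This is a simplified illustration, not the customized cloudSen12 code or the linked smooth-blend implementation (which may use a different window and additional steps): it weights each patch prediction with a triangular window that peaks at the patch centre, uses a hypothetical `predict` callable in place of the U-Net, and assumes the image is at least one patch wide and tall.

```python
import numpy as np


def blend_window(size: int) -> np.ndarray:
    """2D triangular weights, peaking at the patch centre and falling towards the borders."""
    ramp = 1.0 - np.abs(np.linspace(-1.0, 1.0, size))
    # Keep border pixels slightly weighted so no pixel ends up with zero total weight.
    ramp = np.clip(ramp, 1e-3, None)
    return np.outer(ramp, ramp)


def _starts(length: int, patch: int, step: int) -> list:
    """Patch start offsets along one axis; the last patch is aligned with the image edge."""
    starts = list(range(0, length - patch + 1, step))
    if starts[-1] != length - patch:
        starts.append(length - patch)
    return starts


def predict_smooth(image, predict, patch=64, overlap=32):
    """Run `predict` on overlapping patches and blend the outputs with window weights.

    `predict` maps a (patch, patch, bands) array to a (patch, patch, n_classes) array.
    """
    h, w = image.shape[:2]
    if not 0 <= overlap < patch:
        raise ValueError("overlap must be in [0, patch)")
    if h < patch or w < patch:
        raise ValueError("image must be at least one patch in each dimension")
    step = patch - overlap
    window = blend_window(patch)[..., None]  # broadcast over the class axis
    acc = None
    weights = np.zeros((h, w, 1))
    for y in _starts(h, patch, step):
        for x in _starts(w, patch, step):
            pred = predict(image[y:y + patch, x:x + patch])
            if acc is None:
                acc = np.zeros((h, w, pred.shape[-1]))
            acc[y:y + patch, x:x + patch] += pred * window
            weights[y:y + patch, x:x + patch] += window
    # Weighted average of all overlapping predictions at each pixel.
    return acc / weights
```

With an identity `predict` the output reproduces the input, which is a quick check that every pixel is covered and the weights are normalized. Pixels near a patch border get most of their value from neighbouring patches where they sit closer to the centre, which is what suppresses the seams.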
In addition to the wildfire delineation, severity, and cloud masks, land cover is also provided for each image. In particular, the following products are considered:
- ESRI 10m Annual Land Use Land Cover (2017-2021);
- ESRI 2020 Global Land Use Land Cover;
- ESA WorldCover 10 m 2020.
All land cover data are downloaded from Microsoft Planetary Computer.
All land cover products are based on Sentinel-2 data at 10 m resolution.
The ESRI 10m Annual Land Use Land Cover product has 9 classes:
ESRI 9-class annual land use/land cover, activation EMSR382_AOI01
The ESRI 2020 Global Land Use Land Cover product has 10 classes:
ESRI 10-class 2020 land use/land cover, activation EMSR382_AOI01
These are 10 classes:
More information is available in the ESA WorldCover manual.
ESA WorldCover 2020 land cover, activation EMSR382_AOI01
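To relate land cover to the burned area, one can compute the share of each land cover class inside the delineation mask. The sketch below assumes the land cover and DEL rasters have already been read into integer NumPy arrays on the same 10 m grid (for example from the georeferenced .tif files), and that non-zero DEL pixels mark burned area. The class codes are whatever the chosen product uses; `class_fractions` is a hypothetical helper.

```python
import numpy as np


def class_fractions(landcover, region=None):
    """Fraction of pixels per land cover class, optionally restricted to a boolean region."""
    values = landcover if region is None else landcover[region.astype(bool)]
    if values.size == 0:
        return {}
    classes, counts = np.unique(values, return_counts=True)
    return {int(c): float(n / values.size) for c, n in zip(classes, counts)}
```

Calling it once on the whole tile and once with the DEL mask as `region` shows which classes are over-represented in the burned area compared with the surrounding landscape.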