-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Aerosol optical properties/MACv2-SP data upload #124
Comments
Comments on the first received file: There are a number of global attributes that should be added to the file (format: - <attribute_name> = "<suggested_value>"). These are important for tracking the file and providing users with required metadata (as well as ensuring that data providers get appropriate credit!).
In terms of the data itself:
|
@znichollscr just catching up, this data is now live at |
Copying out next thoughts from email thread: I’ve had another look at this thread now and rethought things a bit. Simpler questions The first is the source ID. Would it make sense to have Geomar as the institution for the historical data and UHD as the institution for the scenario data? Or do we want both the historical data and scenario data to reference both (which is the current proposal I think). The second is the contact email. Which email address should we use? The third is the variable naming. Should “sig_lat_W” be “sig_lat_N” and “sig_lat_E” be “sig_lat_S”? I assume that the latitudinal extent is north-south rather than east-west? The fourth is the units of theta. These are in radians, which is different to most of the other angle variables which are in degrees. I assume this is on purpose, but just wanted to check. The more complex bit Those easy questions out of the way, I believe this dataset is unique in our collection for two reasons.
The consequence of the second point is that we can’t really alter the format in any way without causing a bunch of headaches. I think creating those headaches wouldn’t make much sense. Having said that, the current format is not at all in line with any of the other datasets/standards we’re trying to follow which makes publishing it on the ESGF, as-is, not ideal. So, the question is what to do about this. One suggestion is below, please point out where I haven’t thought this through properly: Stephanie puts the original dataset (as she sent it to us) somewhere where people can get it. That somewhere is not the ESGF (because the current format doesn’t fit and I don’t think it makes sense to break our database for this dataset, given that the dataset is easy enough to copy). If working out where to put it is a headache, I’m super happy to help set up a Zenodo entry which means the dataset would have a DOI, is automatically versioned and is super easy to download (honestly, if given the blessing, I can just put it up there myself). Re-writing code If it’s of any interest, the code to re-write the original dataset in the input4MIPs format is below. If you want to run it, you’ll need input4mips-validation 0.14.0 installed. Python code for re-write"""
Add input4MIPs information to the raw MAC-SPv2 file
"""
from __future__ import annotations
from functools import partial
from pathlib import Path
import cftime
import iris
import ncdata
import numpy as np
import xarray as xr
from loguru import logger
from input4mips_validation.cvs import load_cvs
from input4mips_validation.cvs.cvs import Input4MIPsCVs
from input4mips_validation.dataset import (
Input4MIPsDataset,
Input4MIPsDatasetMetadata,
)
from input4mips_validation.dataset.dataset import (
add_bounds,
)
from input4mips_validation.validation.datasets_to_write_to_disk import (
get_ds_to_write_to_disk_validation_result,
)
from input4mips_validation.validation.file import get_validate_file_result
from input4mips_validation.xarray_helpers import add_time_bounds
def main( # noqa: PLR0913
raw_file: Path,
output_dir: Path,
cvs: Input4MIPsCVs,
input4mips_metadata: Input4MIPsDatasetMetadata,
non_input4mips_metadata: dict[str, str],
years_to_keep: list[int],
) -> None:
"""
Re-write the file
"""
ds_raw = xr.load_dataset(raw_file)
# Update non-input4MIPs metadata
non_input4mips_metadata = {
"references": "https://doi.org/10.5194/gmd-10-433-2017",
}
for key in ["History", "Author", "Reference"]:
non_input4mips_metadata[key.lower()] = ds_raw.attrs[key]
non_input4mips_metadata["history"] = (
f"{non_input4mips_metadata['history']}. "
"Written into ESGF format with `add-input4mips-info-to-mac.py`."
)
# Only write out years which are relevant
ds_out = ds_raw.sel(years=years_to_keep).copy()
# Fix up mislabelling of features so that they are in line with
# the indexing in the paper: Stevens et al., GMD 2017 (doi:10.5194/gmd-10-433-2017)
ds_out["plume_feature"] = range(1, 1 + ds_out["plume_feature"].size)
# Rename year fraction
ds_out = ds_out.rename({"year_fr": "year_fraction"})
# Convert years to a standard time axis
ds_out = ds_out.rename({"years": "time"})
ds_out["time"] = [cftime.DatetimeNoLeap(y, 7, 2) for y in ds_out["time"]]
# Create a region variable for holding the region names
regions = np.array(
[
ds_raw.attrs[f"plume{i}_region"]
.split("(")[0]
.lower()
.strip()
.replace(" ", "_")
.encode()
for i in range(1, 10)
],
dtype=bytes,
)
ds_out["plume_region"] = xr.DataArray(
data=regions,
coords=(ds_out["plume_number"],),
dims=("plume_number",),
)
# Set the time encoding
ds_out["time"].encoding = {
"calendar": "noleap",
"units": "days since 1850-01-01 00:00:00",
# Time has to be encoded as float
# to ensure that half-days etc. are handled.
"dtype": np.dtypes.Float32DType,
}
ds_out.attrs = {}
# Fix up the attributes of the co-ordinates
coord_attributes = {
"plume_number": {"long_name": "plume number", "units": "1"},
"plume_feature": {"long_name": "plume feature", "units": "1"},
"time": {"standard_name": "time"},
"year_fraction": {
"long_name": "year fraction",
"units": "years",
"comment": "Required for calculating the seasonal cycle",
},
}
for coord in ds_out.coords:
ds_out.coords[coord].attrs = coord_attributes[coord]
# I assume this is a mislabelling
# [TODO: confirm and check which way around the relabelling should go]
ds_out = ds_out.rename({"sig_lat_W": "sig_lat_N", "sig_lat_E": "sig_lat_S"})
# Fix up the attributes of the variables
variable_attributes = {
"is_biomass": {
"long_name": "Is the plume biomass dominated or not (1: yes, 0: no)",
"units": "1",
},
"plume_lat": {
"long_name": "Latitude of the plume's centre",
"units": "degrees_north",
},
"sig_lat_N": {
"long_name": (
"Latitudinal extension of the feature north of its centre (unrotated)"
),
"units": "degrees",
},
"sig_lat_S": {
"long_name": (
"Latitudinal extension of the feature south of its centre (unrotated)"
),
"units": "degrees",
},
"plume_lon": {
"long_name": "longitude of the plume's centre",
"units": "degrees_east",
},
"sig_lon_E": {
"long_name": (
"Longitudinal extension of the feature east of its centre (unrotated)"
),
"units": "degrees",
},
"sig_lon_W": {
"long_name": (
"Longitudinal extension of the feature west of its centre (unrotated)"
),
"units": "degrees",
},
"theta": {
"long_name": "Clockwise rotation of the feature's central axis",
# [TODO Odd that this is in radians but everything else is in degrees?]
"units": "radians",
},
"ftr_weight": {
"long_name": "Feature weight",
"units": "1",
},
"beta_a": {
"long_name": (
"Value of parameter `a` of the beta function "
"which defines the vertical distribution"
),
"units": "1",
},
"beta_b": {
"long_name": (
"Value of parameter `b` of the beta function "
"which defines the vertical distribution"
),
"units": "1",
},
"aod_fmbg": {
"long_name": (
"Background fine mode aerosol optical depth (AOD) at 550nm at source"
),
"units": "1",
},
"aod_spmx": {
"long_name": "Aerosol optical depth (AOD) at the source",
"units": "1",
},
"ssa550": {
"long_name": "Single scattering albedo at 550nm",
"units": "1",
},
"asy550": {
"long_name": "Asymmetry Factor at 550nm",
"units": "1",
},
"angstrom": {
"long_name": "Angstrom exponent",
"units": "1",
},
"ann_cycle": {
"long_name": "Annual cycle for each plume's features",
"units": "1",
},
"year_weight": {
"long_name": "Annual Scaling factor of plume amplitude",
"units": "1",
},
"plume_region": {
"long_name": "Plume's region",
},
}
for var in ds_out.data_vars:
ds_out[var].attrs = variable_attributes[var]
ds_out = add_bounds(
ds_out,
add_time_bounds=partial(
add_time_bounds, yearly_time_bounds=True, monthly_time_bounds=False
),
)
# The line below adds the wrong axis information for year_fraction so don't use
# ds_out = ds_out.cf.guess_coord_axis()
ds_out = ds_out.cf.add_canonical_attributes()
# Ensure time comes first
ds_out = ds_out.transpose("time", ...)
ds_input4mips = Input4MIPsDataset(
data=ds_out,
cvs=cvs,
metadata=input4mips_metadata,
non_input4mips_metadata=non_input4mips_metadata,
)
out_path, ds_disk_ready = ds_input4mips.get_out_path_and_disk_ready_dataset(
root_data_dir=output_dir,
)
# Validate
validation_result = get_ds_to_write_to_disk_validation_result(
ds=ds_disk_ready, out_path=out_path, cvs=cvs
)
validation_result.raise_if_errors()
# Convert to cubes with ncdata
cubes = ncdata.iris_xarray.cubes_from_xarray(ds_disk_ready)
# Fix up some bugs from iris' conversion
# x-ref: https://github.com/SciTools/iris/issues/6216
for cube in cubes:
va = variable_attributes[cube.var_name]
try:
units_specified = va["units"]
except KeyError:
# no unit info
continue
if str(cube.units) != units_specified:
print(
f"Iris mangled the units of {cube.var_name=}. "
f"{units_specified=}. {cube.units=}."
"Fixing now."
)
cube.units = units_specified
# Having validated and converted to cubes, make the target directory.
out_path.parent.mkdir(parents=True, exist_ok=True)
# Write the file to disk
iris.save(
cubes,
out_path,
unlimited_dimensions=("time",),
)
print(f"{out_path=}")
get_validate_file_result(
out_path,
cvs=cvs,
).raise_if_errors()
ds_written = xr.open_dataset(out_path)
assert ds_written["plume_lat"].attrs["units"] == "degrees_north"
assert ds_written["plume_lon"].attrs["units"] == "degrees_east"
if __name__ == "__main__":
logger.enable("input4mips_validation")
RAW_FILE = Path("MACv2.0-SP_v2.nc")
OUTPUT_DIR = Path(".") / "macv2-rewrite"
ORIGINAL_DATA_LINK = "url.to.somewhere"
cvs = load_cvs(Path(__file__).parents[1] / ".." / "input4MIPs_CVs" / "CVs")
input4mips_metadata = Input4MIPsDatasetMetadata(
activity_id="input4MIPs",
contact="sfiedler@geomar.de",
dataset_category="aerosolProperties",
frequency="yr",
further_info_url="http://www.tbd.invalid",
grid_label="gn",
license=(
"The input4MIPs data linked to this entry is licensed "
"under a Creative Commons Attribution 4.0 International "
"(https://creativecommons.org/licenses/by/4.0/). "
"Consult https://pcmdi.llnl.gov/CMIP6/TermsOfUse "
"for terms of use governing CMIP6Plus output, "
"including citation requirements and proper acknowledgment. "
"The data producers and data providers make no warranty, either express "
"or implied, including, but not limited to, warranties of merchantability "
"and fitness for a particular purpose. "
"All liabilities arising from the supply of the information "
"(including any liability arising in negligence) "
"are excluded to the fullest extent permitted by law."
),
mip_era="CMIP6Plus",
nominal_resolution="250 km",
realm="atmos",
# TODO: ask Stephanie whether switching or consortium or what
institution_id="G-UHD",
institution=None,
source_id="G-UHD-SPv2-1-0",
source_version="1.0",
# source=None,
target_mip="CMIP",
variable_id="multiple-macv2sp",
comment=(
"This data has been re-written in the input4MIPs format. "
"The data as required by the MACv2-SPv2 Fortran module "
f"can be retrieved from {ORIGINAL_DATA_LINK}"
),
doi=None,
license_id="CC BY 4.0",
# product=None,
# region=None,
)
non_input4mips_metadata = {
"references": "https://doi.org/10.5194/gmd-10-433-2017",
}
main(
raw_file=RAW_FILE,
output_dir=OUTPUT_DIR,
cvs=cvs,
input4mips_metadata=input4mips_metadata,
non_input4mips_metadata=non_input4mips_metadata,
# Only writing data out to the end of history right now
years_to_keep=range(1850, 2020 + 1),
) ncdump of original file$ ncdump -h MACv2.0-SP_v2.nc
netcdf MACv2.0-SP_v2 {
dimensions:
plume_number = 9 ;
plume_feature = 2 ;
years = 251 ;
year_fr = 52 ;
variables:
int plume_number(plume_number) ;
plume_number:long_name = "plume number" ;
int plume_feature(plume_feature) ;
plume_feature:long_name = "plume feature" ;
int years(years) ;
years:unit = "Gregorian Year" ;
years:long_name = "years of scale factor data" ;
float year_fr(year_fr) ;
year_fr:unit = "Year" ;
year_fr:long_name = "fractional year (for seasonal cycle)" ;
int is_biomass(plume_number) ;
is_biomass:units = "Binary (1-yes; 0-no)" ;
is_biomass:long_name = "Biomass Dominated" ;
float plume_lat(plume_number) ;
plume_lat:units = "degrees North" ;
plume_lat:long_name = "latitude of plume center" ;
plume_lat:_FillValue = 9.96921e+36f ;
float plume_lon(plume_number) ;
plume_lon:units = "degrees East" ;
plume_lon:long_name = "longitude of plume center" ;
plume_lon:_FillValue = 9.96921e+36f ;
float sig_lat_W(plume_number, plume_feature) ;
sig_lat_W:units = "degrees North" ;
sig_lat_W:long_name = "Latitude extension of feature West of centre (unrotated)" ;
sig_lat_W:_FillValue = 9.96921e+36f ;
float sig_lat_E(plume_number, plume_feature) ;
sig_lat_E:units = "degrees South" ;
sig_lat_E:long_name = "Latitude extension of feature East of centre (unrotated)" ;
sig_lat_E:_FillValue = 9.96921e+36f ;
float sig_lon_E(plume_number, plume_feature) ;
sig_lon_E:units = "degrees West" ;
sig_lon_E:long_name = "Longitude extension of feature East of centre (unrotated)" ;
sig_lon_E:_FillValue = 9.96921e+36f ;
float sig_lon_W(plume_number, plume_feature) ;
sig_lon_W:units = "degrees East" ;
sig_lon_W:long_name = "Longitude extension of feature West of centre (unrotated)" ;
sig_lon_W:_FillValue = 9.96921e+36f ;
float theta(plume_number, plume_feature) ;
theta:units = "radians" ;
theta:long_name = "Clockwise rotation of feature central axis" ;
theta:_FillValue = 9.96921e+36f ;
float ftr_weight(plume_number, plume_feature) ;
ftr_weight:units = " " ;
ftr_weight:long_name = "Feature Weight" ;
ftr_weight:_FillValue = 9.96921e+36f ;
float beta_a(plume_number) ;
beta_a:units = " " ;
beta_a:long_name = "Parameter a of beta function for vertical distribution" ;
beta_a:_FillValue = 9.96921e+36f ;
float beta_b(plume_number) ;
beta_b:units = " " ;
beta_b:long_name = "Parameter b of beta function for vertical distribution" ;
beta_b:_FillValue = 9.96921e+36f ;
float aod_spmx(plume_number) ;
aod_spmx:units = " " ;
aod_spmx:long_name = "Optical depth (AOD) at source" ;
aod_spmx:_FillValue = 9.96921e+36f ;
float aod_fmbg(plume_number) ;
aod_fmbg:units = " " ;
aod_fmbg:long_name = "Background fine mode optical depth (AOD) at 550 nm at source" ;
aod_fmbg:_FillValue = 9.96921e+36f ;
float ssa550(plume_number) ;
ssa550:units = " " ;
ssa550:long_name = "Single Scattering Albedo at 550 nm" ;
ssa550:_FillValue = 9.96921e+36f ;
float asy550(plume_number) ;
asy550:units = " " ;
asy550:long_name = "Asymmetry Factor at 550 nm" ;
asy550:_FillValue = 9.96921e+36f ;
float angstrom(plume_number) ;
angstrom:units = " " ;
angstrom:long_name = "Angstrom Exponent" ;
angstrom:_FillValue = 9.96921e+36f ;
float ann_cycle(plume_number, year_fr, plume_feature) ;
ann_cycle:units = " " ;
ann_cycle:long_name = "Annual cycle for each of a plume\'s features" ;
ann_cycle:_FillValue = 9.96921e+36f ;
float year_weight(plume_number, years) ;
year_weight:unit = "Amplitude relative to 2005, scaled by CEDS emissions" ;
year_weight:long_name = "Annual Scaling factor for plume amplitude" ;
year_weight:_FillValue = 9.96921e+36f ;
// global attributes:
:History = "File created by sp_make-data_v1.ncl on Fri Jul 12 13:48:56 CEST 2024" ;
:Author = "Bjorn Stevens" ;
:Reference = "Stevens et al., 2016: Simple Plumes ... GMD" ;
:plume1_region = "Europe" ;
:plume2_region = "North America" ;
:plume3_region = "East Asia" ;
:plume4_region = "South Asia" ;
:plume5_region = "North Africa (biomass)" ;
:plume6_region = "South America (biomass)" ;
:plume7_region = "Maritime Continent (biomass)" ;
:plume8_region = "South Central Africa (biomass)" ;
:plume9_region = "Australia" ;
} ncdump of re-written file$ ncdump -h macv2-rewrite/input4MIPs/CMIP6Plus/CMIP/G-UHD/G-UHD-SPv2-1-0/atmos/yr/multiple-macv2sp/gn/v20241105/multiple-macv2sp_input4MIPs_aerosolProperties_CMIP_G-UHD-SPv2-1-0_gn_1850-2020.nc
netcdf multiple-macv2sp_input4MIPs_aerosolProperties_CMIP_G-UHD-SPv2-1-0_gn_1850-2020 {
dimensions:
plume_number = 9 ;
bnds = 2 ;
plume_feature = 2 ;
dim1 = 20 ;
time = UNLIMITED ; // (171 currently)
year_fraction = 52 ;
variables:
float ssa550(plume_number) ;
ssa550:long_name = "Single scattering albedo at 550nm" ;
ssa550:units = "1" ;
int plume_number(plume_number) ;
plume_number:bounds = "plume_number_bnds" ;
plume_number:units = "1" ;
plume_number:long_name = "plume number" ;
double plume_number_bnds(plume_number, bnds) ;
float sig_lat_N(plume_number, plume_feature) ;
sig_lat_N:long_name = "Latitudinal extension of the feature north of its centre (unrotated)" ;
sig_lat_N:units = "degrees" ;
int64 plume_feature(plume_feature) ;
plume_feature:bounds = "plume_feature_bnds" ;
plume_feature:units = "1" ;
plume_feature:long_name = "plume feature" ;
double plume_feature_bnds(plume_feature, bnds) ;
float angstrom(plume_number) ;
angstrom:long_name = "Angstrom exponent" ;
angstrom:units = "1" ;
float ftr_weight(plume_number, plume_feature) ;
ftr_weight:long_name = "Feature weight" ;
ftr_weight:units = "1" ;
char plume_region(plume_number, dim1) ;
plume_region:long_name = "Plume\'s region" ;
float asy550(plume_number) ;
asy550:long_name = "Asymmetry Factor at 550nm" ;
asy550:units = "1" ;
float year_weight(time, plume_number) ;
year_weight:long_name = "Annual Scaling factor of plume amplitude" ;
year_weight:units = "1" ;
double time(time) ;
time:axis = "T" ;
time:bounds = "time_bnds" ;
time:units = "days since 1850-01-01" ;
time:standard_name = "time" ;
time:calendar = "365_day" ;
time:amip = "time" ;
double time_bnds(time, bnds) ;
float sig_lon_E(plume_number, plume_feature) ;
sig_lon_E:long_name = "Longitudinal extension of the feature east of its centre (unrotated)" ;
sig_lon_E:units = "degrees" ;
float sig_lat_S(plume_number, plume_feature) ;
sig_lat_S:long_name = "Latitudinal extension of the feature south of its centre (unrotated)" ;
sig_lat_S:units = "degrees" ;
int is_biomass(plume_number) ;
is_biomass:long_name = "Is the plume biomass dominated or not (1: yes, 0: no)" ;
is_biomass:units = "1" ;
float beta_a(plume_number) ;
beta_a:long_name = "Value of parameter `a` of the beta function which defines the vertical distribution" ;
beta_a:units = "1" ;
float aod_spmx(plume_number) ;
aod_spmx:long_name = "Aerosol optical depth (AOD) at the source" ;
aod_spmx:units = "1" ;
float plume_lat(plume_number) ;
plume_lat:long_name = "Latitude of the plume\'s centre" ;
plume_lat:units = "degrees_north" ;
float beta_b(plume_number) ;
beta_b:long_name = "Value of parameter `b` of the beta function which defines the vertical distribution" ;
beta_b:units = "1" ;
float aod_fmbg(plume_number) ;
aod_fmbg:long_name = "Background fine mode aerosol optical depth (AOD) at 550nm at source" ;
aod_fmbg:units = "1" ;
float ann_cycle(plume_number, year_fraction, plume_feature) ;
ann_cycle:long_name = "Annual cycle for each plume\'s features" ;
ann_cycle:units = "1" ;
float year_fraction(year_fraction) ;
year_fraction:bounds = "year_fraction_bnds" ;
year_fraction:units = "years" ;
year_fraction:long_name = "year fraction" ;
year_fraction:comment = "Required for calculating the seasonal cycle" ;
float year_fraction_bnds(year_fraction, bnds) ;
float sig_lon_W(plume_number, plume_feature) ;
sig_lon_W:long_name = "Longitudinal extension of the feature west of its centre (unrotated)" ;
sig_lon_W:units = "degrees" ;
float theta(plume_number, plume_feature) ;
theta:long_name = "Clockwise rotation of the feature\'s central axis" ;
theta:units = "radians" ;
float plume_lon(plume_number) ;
plume_lon:long_name = "longitude of the plume\'s centre" ;
plume_lon:units = "degrees_east" ;
// global attributes:
:Conventions = "CF-1.7" ;
:activity_id = "input4MIPs" ;
:author = "Bjorn Stevens" ;
:comment = "This data has been re-written in the input4MIPs format. The data as required by the MACv2-SPv2 Fortran module can be retrieved from url.to.somewhere" ;
:contact = "sfiedler@geomar.de" ;
:creation_date = "2024-11-05T12:49:54Z" ;
:dataset_category = "aerosolProperties" ;
:frequency = "yr" ;
:further_info_url = "http://www.tbd.invalid" ;
:grid_label = "gn" ;
:history = "File created by sp_make-data_v1.ncl on Fri Jul 12 13:48:56 CEST 2024. Written into ESGF format with `add-input4mips-info-to-mac.py`." ;
:institution_id = "G-UHD" ;
:license = "The input4MIPs data linked to this entry is licensed under a Creative Commons Attribution 4.0 International (https://creativecommons.org/licenses/by/4.0/). Consult https://pcmdi.llnl.gov/CMIP6/TermsOfUse for terms of use governing CMIP6Plus output, including citation requirements and proper acknowledgment. The data producers and data providers make no warranty, either express or implied, including, but not limited to, warranties of merchantability and fitness for a particular purpose. All liabilities arising from the supply of the information (including any liability arising in negligence) are excluded to the fullest extent permitted by law." ;
:license_id = "CC BY 4.0" ;
:mip_era = "CMIP6Plus" ;
:nominal_resolution = "250 km" ;
:realm = "atmos" ;
:reference = "Stevens et al., 2016: Simple Plumes ... GMD" ;
:references = "https://doi.org/10.5194/gmd-10-433-2017" ;
:source_id = "G-UHD-SPv2-1-0" ;
:source_version = "1.0" ;
:target_mip = "CMIP" ;
:tracking_id = "hdl:21.14100/1ad6718a-e13a-49eb-8863-912ec82a5fd2" ;
:variable_id = "multiple-macv2sp" ;
} |
Issue for tracking the progress and any issues related to the aerosol optical properties/MACv2-SP data upload.
cc @durack1 @vnaik60
The text was updated successfully, but these errors were encountered: