Altering the DRS #64
Copying some of #17 (comment) here to get things moving
As a note, this may also affect the volcanic data.
@znichollscr I like where you are going with this. As we're in prototype mode until January 2025, we have some time, unlike with CMIP6, to work through this logic and, thanks to your validator tool, to make changes quickly and easily. A couple of comments (responding to various other threads too)
tl;dr - I wonder if we should consider altering the DRS. #17 provides a concrete use case of where it doesn't really work. I add one more below. This discussion is for considering such an alteration and its implications.
The case of solar data, see #17
The solar datasets are published in (effectively) 3 forms: daily, monthly and pre-industrial control.
Under the current DRS, the filenames would be:
multiple_input4MIPs_solar_CMIP_SOLARIS-HEPPA-CMIP-4-3_gn.nc
multiple_input4MIPs_solar_CMIP_SOLARIS-HEPPA-CMIP-4-3_gn_185001-202312.nc
multiple_input4MIPs_solar_CMIP_SOLARIS-HEPPA-CMIP-4-3_gn_18500101-20231231.nc
I think it's fair to say that only the eagle-eyed can spot the difference between these, and that the filenames don't really give much information.
For reference, the full filepaths add more information, but the differences still aren't exactly obvious:
input4MIPs/CMIP6Plus/CMIP/SOLARIS-HEPPA/SOLARIS-HEPPA-CMIP-4-3/atmos/fx/multiple/gn/v20240729/multiple_input4MIPs_solar_CMIP_SOLARIS-HEPPA-CMIP-4-3_gn.nc
input4MIPs/CMIP6Plus/CMIP/SOLARIS-HEPPA/SOLARIS-HEPPA-CMIP-4-3/atmos/mon/multiple/gn/v20240729/multiple_input4MIPs_solar_CMIP_SOLARIS-HEPPA-CMIP-4-3_gn_185001-202312.nc
input4MIPs/CMIP6Plus/CMIP/SOLARIS-HEPPA/SOLARIS-HEPPA-CMIP-4-3/atmos/day/multiple/gn/v20240729/multiple_input4MIPs_solar_CMIP_SOLARIS-HEPPA-CMIP-4-3_gn_18500101-20231231.nc
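To make the "eagle-eyed" point concrete, here is a minimal sketch that splits the three filenames above into fields and reports which fields actually differ. The field names (`variable_id`, `activity_id`, `dataset_category`, `target_mip`, `source_id`, `grid_label`, `time_range`) are assumptions inferred from the example filenames, not an authoritative statement of the DRS, and `parse_current_filename` is a hypothetical helper.

```python
import os

# Assumed field order, inferred from the example filenames above.
CURRENT_FILENAME_FIELDS = (
    "variable_id",
    "activity_id",
    "dataset_category",
    "target_mip",
    "source_id",
    "grid_label",
)


def parse_current_filename(filename):
    """Split a filename into its underscore-separated fields."""
    stem, _extension = os.path.splitext(filename)
    parts = stem.split("_")
    fields = dict(zip(CURRENT_FILENAME_FIELDS, parts))
    # Anything after the fixed fields is treated as the optional time range.
    n_fixed = len(CURRENT_FILENAME_FIELDS)
    fields["time_range"] = parts[n_fixed] if len(parts) > n_fixed else None
    return fields


filenames = [
    "multiple_input4MIPs_solar_CMIP_SOLARIS-HEPPA-CMIP-4-3_gn.nc",
    "multiple_input4MIPs_solar_CMIP_SOLARIS-HEPPA-CMIP-4-3_gn_185001-202312.nc",
    "multiple_input4MIPs_solar_CMIP_SOLARIS-HEPPA-CMIP-4-3_gn_18500101-20231231.nc",
]

parsed = [parse_current_filename(name) for name in filenames]
# Collect every field that takes more than one value across the three files.
differing = {key for key in parsed[0] if len({p[key] for p in parsed}) > 1}
print(differing)
```

Under these assumptions, the only field that varies is the time range, which is absent altogether for the pre-industrial control file.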
The case of scenario data in CMIP6
The key issue here is that the filename contains no information about which experiment is being targeted by the file. The only way to make a distinction is to supply the data under a different source ID. Using a different source ID feels like the wrong solution to me, and I doubt it is an obvious convention for the modelling groups to parse either.
For example, in CMIP6, scenario data was submitted under a different source ID for each scenario. To me, this doesn't make much sense: the same groups were supplying the data, i.e. the source was the same, so the source ID should have been the same.
A potential solution
One way out of this would be to alter the DRS, making it more like the model output DRS in many ways. The current DRS is:
I would suggest altering the DRS so that it becomes:
So, we basically add two keys: "target_experiment" and "variant_label".
"target_experiment" would identify the experiment being targeted by the dataset, in the same way that "target_mip" identifies the target MIP. This would mostly be "historical", but would allow a single source ID to provide data for multiple experiments (helpful for the scenario data production discussed above). As with target MIP, the data could be used for other MIPs and experiments, but this would identify the experiment that the data producer had in mind when they produced the data.
"variant_label" would identify the variant of the data. All data providers would have to provide a "main" variant, and only "main" variants would be allowed in CMIP stuff (at least to start with while we don't have forcing uncertainty MIPs). However, this extra piece of information would allow us to clearly distinguish between variations coming from the same data source, something which will be increasingly needed as we move towards exploring uncertainty in forcings.
How would this solution work in our two cases?
Solar data
The added fields make it easy to see a) which experiment is being targeted, which makes it way easier to pick out the pi-control file and b) the difference between the sensitivity case and the reference/main case.
Main files become:
input4MIPs/CMIP6Plus/CMIP/piControl/SOLARIS-HEPPA/SOLARIS-HEPPA-CMIP-4-3/atmos/fx/multiple/gn/main/v20240729/multiple_input4MIPs_solar_CMIP_piControl_SOLARIS-HEPPA-CMIP-4-3_gn_main.nc
input4MIPs/CMIP6Plus/CMIP/historical/SOLARIS-HEPPA/SOLARIS-HEPPA-CMIP-4-3/atmos/mon/multiple/gn/main/v20240729/multiple_input4MIPs_solar_CMIP_historical_SOLARIS-HEPPA-CMIP-4-3_gn_main_185001-202312.nc
input4MIPs/CMIP6Plus/CMIP/historical/SOLARIS-HEPPA/SOLARIS-HEPPA-CMIP-4-3/atmos/day/multiple/gn/main/v20240729/multiple_input4MIPs_solar_CMIP_historical_SOLARIS-HEPPA-CMIP-4-3_gn_main_18500101-20231231.nc
Sensitivity files become:
input4MIPs/CMIP6Plus/Prototype/piControl/SOLARIS-HEPPA/SOLARIS-HEPPA-CMIP-4-3/atmos/fx/multiple/gn/sensitivity/v20240729/multiple_input4MIPs_solar_Prototype_piControl_SOLARIS-HEPPA-CMIP-4-3_gn_sensitivity.nc
input4MIPs/CMIP6Plus/Prototype/historical/SOLARIS-HEPPA/SOLARIS-HEPPA-CMIP-4-3/atmos/mon/multiple/gn/sensitivity/v20240729/multiple_input4MIPs_solar_Prototype_historical_SOLARIS-HEPPA-CMIP-4-3_gn_sensitivity_185001-202312.nc
input4MIPs/CMIP6Plus/Prototype/historical/SOLARIS-HEPPA/SOLARIS-HEPPA-CMIP-4-3/atmos/day/multiple/gn/sensitivity/v20240729/multiple_input4MIPs_solar_Prototype_historical_SOLARIS-HEPPA-CMIP-4-3_gn_sensitivity_18500101-20231231.nc
I, at least, find this much easier to understand (I would probably also flip the order of some of the keys around so we don't have files starting with the confusing name 'multiple' and would add frequency information into the filename, perhaps removing some other stuff, but one thing at a time).
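As a rough illustration of how such paths could be generated, here is a sketch that assembles the directory and filename from a set of metadata fields. The layout is inferred from the example paths above rather than taken from a DRS specification, and the `Dataset` class, its attribute names and `proposed_path` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Dataset:
    """Metadata for one file. Attribute names are assumptions."""

    activity_id: str
    mip_era: str
    target_mip: str
    target_experiment: str  # new key in the proposal
    institution_id: str
    source_id: str
    realm: str
    frequency: str
    variable_id: str
    grid_label: str
    variant_label: str  # new key in the proposal
    version: str
    dataset_category: str
    time_range: Optional[str] = None  # absent for fixed (fx) data


def proposed_path(ds):
    """Build a path following the layout inferred from the examples."""
    directory = "/".join([
        ds.activity_id, ds.mip_era, ds.target_mip, ds.target_experiment,
        ds.institution_id, ds.source_id, ds.realm, ds.frequency,
        ds.variable_id, ds.grid_label, ds.variant_label, ds.version,
    ])
    name_parts = [
        ds.variable_id, ds.activity_id, ds.dataset_category, ds.target_mip,
        ds.target_experiment, ds.source_id, ds.grid_label, ds.variant_label,
    ]
    if ds.time_range is not None:
        name_parts.append(ds.time_range)
    return f"{directory}/{'_'.join(name_parts)}.nc"


pi_control_main = Dataset(
    activity_id="input4MIPs", mip_era="CMIP6Plus", target_mip="CMIP",
    target_experiment="piControl", institution_id="SOLARIS-HEPPA",
    source_id="SOLARIS-HEPPA-CMIP-4-3", realm="atmos", frequency="fx",
    variable_id="multiple", grid_label="gn", variant_label="main",
    version="v20240729", dataset_category="solar",
)
print(proposed_path(pi_control_main))
```

With these inputs the function reproduces the first "main" path listed above. Changing only `target_mip`, `target_experiment`, `frequency`, `variant_label` and `time_range` gives the other five.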
Scenario data
The introduction of the target experiment makes it much easier to pick out what is actually varying in these files, while giving us the chance to use the same source ID for files which came from the same source.
input4MIPs/CMIP6Plus/ScenarioMIP/ssp-new-1-15/CR/CR-1-1-3/atmos/day/mole-fraction-of-carbon-dioxide-in-air/gn/main/v20250201/mole-fraction-of-carbon-dioxide-in-air_input4MIPs_GHGConcentrations_ScenarioMIP_ssp-new-1-15_CR-1-1-_gn_main_20240101-22991231.nc
input4MIPs/CMIP6Plus/ScenarioMIP/ssp-new-2-43/CR/CR-1-1-3/atmos/day/mole-fraction-of-carbon-dioxide-in-air/gn/main/v20250201/mole-fraction-of-carbon-dioxide-in-air_input4MIPs_GHGConcentrations_ScenarioMIP_ssp-new-2-43_CR-1-1-_gn_main_20240101-22991231.nc
input4MIPs/CMIP6Plus/ScenarioMIP/ssp-new-1-15/CR/CR-1-1-3/atmos/day/mole-fraction-of-methane-in-air/gn/main/v20250201/mole-fraction-of-methane-in-air_input4MIPs_GHGConcentrations_ScenarioMIP_ssp-new-1-15_CR-1-1-_gn_main_20240101-22991231.nc
input4MIPs/CMIP6Plus/ScenarioMIP/ssp-new-2-43/CR/CR-1-1-3/atmos/day/mole-fraction-of-methane-in-air/gn/main/v20250201/mole-fraction-of-methane-in-air_input4MIPs_GHGConcentrations_ScenarioMIP_ssp-new-2-43_CR-1-1-_gn_main_20240101-22991231.nc
etc. for other variables
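A small sketch of what this buys a data user: with the target experiment and variant label in the path, one can group files by source ID and see which experiments each source serves, without relying on a per-scenario source ID. The directory field names are assumptions inferred from the example paths, `parse_proposed_path` is a hypothetical helper, and the two paths are copied verbatim from the list above.

```python
from collections import defaultdict

# Assumed directory layout, inferred from the example paths above.
DIRECTORY_FIELDS = (
    "activity_id", "mip_era", "target_mip", "target_experiment",
    "institution_id", "source_id", "realm", "frequency",
    "variable_id", "grid_label", "variant_label", "version",
)


def parse_proposed_path(path):
    """Map each directory level of a proposed-DRS path to a field name."""
    *directories, filename = path.split("/")
    if len(directories) != len(DIRECTORY_FIELDS):
        raise ValueError(f"unexpected number of directory levels in {path!r}")
    fields = dict(zip(DIRECTORY_FIELDS, directories))
    fields["filename"] = filename
    return fields


paths = [
    "input4MIPs/CMIP6Plus/ScenarioMIP/ssp-new-1-15/CR/CR-1-1-3/atmos/day/"
    "mole-fraction-of-carbon-dioxide-in-air/gn/main/v20250201/"
    "mole-fraction-of-carbon-dioxide-in-air_input4MIPs_GHGConcentrations_"
    "ScenarioMIP_ssp-new-1-15_CR-1-1-_gn_main_20240101-22991231.nc",
    "input4MIPs/CMIP6Plus/ScenarioMIP/ssp-new-2-43/CR/CR-1-1-3/atmos/day/"
    "mole-fraction-of-carbon-dioxide-in-air/gn/main/v20250201/"
    "mole-fraction-of-carbon-dioxide-in-air_input4MIPs_GHGConcentrations_"
    "ScenarioMIP_ssp-new-2-43_CR-1-1-_gn_main_20240101-22991231.nc",
]

experiments_by_source = defaultdict(set)
for path in paths:
    fields = parse_proposed_path(path)
    # Keep only the "main" variant, mirroring the rule suggested earlier.
    if fields["variant_label"] == "main":
        experiments_by_source[fields["source_id"]].add(fields["target_experiment"])

print(dict(experiments_by_source))
```

Here a single source ID maps to both target experiments, which is the behaviour the proposal is aiming for.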
Some thoughts on the practical realities
There are clearly tradeoffs here. I, personally, find the argument, "It's how we did it before", incredibly weak. We've done lots of things before and found lots of problems; not fixing the problems doesn't make much sense to me.
Having said that, I am aware that we're on a fast-track, so we need to help modelling groups as much as possible. Hence, in the spirit of compromise, a suggestion:
I'm sure I haven't thought through everything, but here is something to get the conversation started.