Add dust emissions #140

Open
znichollscr opened this issue Oct 22, 2024 · 10 comments
Labels
enhancement New feature or request

Comments

@znichollscr
Collaborator

znichollscr commented Oct 22, 2024

@jfkok has done some great work pulling together dust emissions. This issue is for tracking their inclusion in input4MIPs.

znichollscr added the enhancement label Oct 22, 2024
@znichollscr
Collaborator Author

An initial draft of the files is here, which includes a nice briefing on how to use the data.

My initial comments on this are:

  • the way the files are currently written, the scenario (historical, future increase, future decrease, future constant) and the region (global or regional) are both blended into the variable name. That isn’t how most of the data is handled. The more common pattern for this (where the variable is the same every time if I am not mistaken) would be to use the same variable name for every file (e.g. dustscalefactor) and then the region of application is identified by the grid label. It’s a bit weird, but the current practice is to identify the scenario by the source ID (although I think we could do something clearer than this, full discussion is here: Altering the DRS #64).
  • is there a reason that future starts in 2001? That will be different to how ScenarioMIP will be handling the split, but if this is for a custom experiment, it may not matter (but it will matter if we want these dust emissions included in the DECK’s historical experiment)
  • the data does present a bit of a challenge for input4MIPs (the same comments apply to Stephanie’s data and it would be great to address them for CMIP7 (in CMIP6 there unfortunately wasn’t time)). The data has this idea of regions to which the data applies. The CF conventions do have a standard for this (https://cfconventions.org/Data/cf-conventions/cf-conventions-1.11/cf-conventions.html#geographic-regions), which is probably what we would want to use. I think that would then all just work, but it might require a bit of a tweak to how the data is written (and the grid label for regional stuff would probably then become “gr”, see the full list of options here: https://github.com/PCMDI/mip-cmor-tables/blob/main/MIP_grid_label.json).
  • Just to check my understanding: the variable is the same every time right? It’s a scaling factor and the idea is that models use this to scale their internally calculated dust to get closer to the change in dust over time?

@znichollscr
Collaborator Author

@jfkok making sure you get a notification and can find this

@jfkok

jfkok commented Oct 24, 2024

Hi Zeb, thanks for your very helpful input. Responses follow below:

* the way the files are currently written, the scenario (historical, future increase, future decrease, future constant) and the region (global or regional) are both blended into the variable name. That isn’t how most of the data is handled. The more common pattern for this (where the variable is the same every time if I am not mistaken) would be to use the same variable name for every file (e.g. dustscalefactor) and then the region of application is identified by the grid label. It’s a bit weird, but the current practice is to identify the scenario by the source ID (although I think we could do something clearer than this, full discussion is here: [Altering the DRS #64](https://github.com/PCMDI/input4MIPs_CVs/discussions/64)).

Ah okay, that makes sense. So I'd need to create one file for the global scaling factor (currently in this file) and seven additional files for the regional scaling factors for the seven regions (currently in this file), is that correct? And then do I still keep the historical and the (three) future scaling factors in separate files? So for 4x8 = 32 files total?

* is there a reason that future starts in 2001? That will be different to how ScenarioMIP will be handling the split, but if this is for a custom experiment, it may not matter (but it will matter if we want these dust emissions included in the DECK’s historical experiment)

Yes, the reason is that the observational reconstruction ends in the year 2000 because it is based on sedimentary records of dust deposition (like ice cores), so there's much less data for the last two decades. However, my group is working on using satellite data to extend the reconstruction to 2023 and I expect that to be ready sometime next year.

* the data does present a bit of a challenge for input4MIPs (the same comments apply to Stephanie’s data and it would be great to address them for CMIP7 (in CMIP6 there unfortunately wasn’t time)). The data has this idea of regions to which the data applies. The CF conventions do have a standard for this (https://cfconventions.org/Data/cf-conventions/cf-conventions-1.11/cf-conventions.html#geographic-regions), which is probably what we would want to use. I think that would then all just work, but it might require a bit of a tweak to how the data is written (and the grid label for regional stuff would probably then become “gr”, see the full list of options here: https://github.com/PCMDI/mip-cmor-tables/blob/main/MIP_grid_label.json).

Thanks for pointing this out. The regions do have well-defined coordinates though (the link you included mentioned "complex boundaries that cannot practically be specified using longitude and latitude boundary coordinates"). Would defining those boundaries in each file be sufficient, or do I need to do something else?

* Just to check my understanding: the variable is the same every time right? It’s a scaling factor and the idea is that models use this to scale their internally calculated dust to get closer to the change in dust over time?

Yes, that's exactly right. It's only the year and the region of application (global versus one of seven major dust aerosol source regions) that changes.

Thanks so much!

Jasper
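As a minimal sketch of the idea confirmed above (a model scales its internally calculated dust by a time-varying factor), the snippet below multiplies each year's modelled emission by that year's factor. The function name, the dictionary layout, and every number are hypothetical illustrations, not values from the dataset, and how a given model actually applies the factor is an assumption here.

```python
def apply_dust_scaling(model_emissions, scale_factors):
    """Multiply each year's modelled dust emission by that year's scaling factor.

    Both arguments are dicts keyed by year. This is a hypothetical layout chosen
    for illustration; real models will read the factors from the netCDF files.
    """
    missing = set(model_emissions) - set(scale_factors)
    if missing:
        raise KeyError(f"no scaling factor for years: {sorted(missing)}")
    return {
        year: emission * scale_factors[year]
        for year, emission in model_emissions.items()
    }


# Made-up numbers purely for illustration (not taken from the dataset).
model_emissions = {1850: 100.0, 1900: 100.0, 2000: 100.0}
scale_factors = {1850: 1.0, 1900: 1.25, 2000: 1.5}

scaled = apply_dust_scaling(model_emissions, scale_factors)
```

For the regional files the same multiplication would presumably be applied per source region rather than globally.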

@znichollscr
Collaborator Author

And then do I still keep the historical and the (three) future scaling factors in separate files? So for 4x8 = 32 files total?

Sorry bad explanation from me. I would suggest one file for global and one file for regional (you can put all seven regions in one file, just use the 'region' dimension or whatever it is that the CF conventions calls it to differentiate them). Then yes, one file for historical and one file for each scenario. So you'll end up with 4 x 2 = 8 files.

Just to try and be a bit clearer:

  • in the global file, there should be one variable. Its only dimension should be "time". The grid label should be "gm" (for global mean).
  • in the regional file, there should be one variable. It should be two-dimensional, ("time", "region") (or whatever the CF-convention name for 'region' is). I would make the grid label "gn" probably (this is your native grid). Then there'll probably be other auxiliary coordinates in this file to help with defining the boundaries of each region, region names etc.
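The layout described above can be sketched as a small enumeration: four scenarios times two grid labels gives eight files, each holding one variable. This only describes the planned files in plain Python and does not write netCDF. The scenario strings and the "region" dimension name are placeholders, and "dustscalefactor" is the example variable name suggested earlier in the thread.

```python
from itertools import product

# Placeholder scenario names: one historical file plus three future scenarios.
SCENARIOS = ["historical", "constant", "decreasing", "increasing"]

# Grid label -> dimensions of the single variable in that file.
# "region" stands in for whatever name the CF conventions prescribe.
LAYOUTS = {
    "gm": ("time",),           # global mean: time only
    "gn": ("time", "region"),  # regional: all seven regions in one file
}


def planned_files(variable="dustscalefactor"):
    """Return one description per (scenario, grid label) combination."""
    return [
        {
            "variable": variable,
            "scenario": scenario,
            "grid_label": grid_label,
            "dimensions": LAYOUTS[grid_label],
        }
        for scenario, grid_label in product(SCENARIOS, LAYOUTS)
    ]


files = planned_files()
for entry in files:
    print(entry["scenario"], entry["grid_label"], entry["dimensions"])
```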

Got it re the historical vs. scenario split. That's fine. If we want these files to be used for DECK simulations, we'll have to do a bit of thinking. If they're just for a specific MIP experiment, they can stay as they are.

The regions do have well-defined coordinates though (the link you included mentioned "complex boundaries that cannot practically be specified using longitude and latitude boundary coordinates"). Would defining those boundaries in each file be sufficient, or do I need to do something else?

Ah ok nice. Defining those boundary files would definitely be sufficient. (If it were me, I would just give the regions names first, make sure I can write the files, then go back and add the boundary definition second, because that boundary definition could be fiddly, but you may be able to skip straight to writing the boundary conditions!)

@jfkok

jfkok commented Oct 24, 2024

Thanks, that's helpful. I'm working on implementing these changes and obtaining corrected files. In doing so, I realized that making the variable name the same for all files also means that the variable is identical for the four different global files (1 historical and 3 future scenarios) and for the four different regional files. So how would I distinguish the files if I can't put the scenario in the variable name?

The regions do have well-defined coordinates though (the link you included mentioned "complex boundaries that cannot practically be specified using longitude and latitude boundary coordinates"). Would defining those boundaries in each file be sufficient, or do I need to do something else?

Ah ok nice. Defining those boundary files would definitely be sufficient. (If it were me, I would just give the regions names first, make sure I can write the files, then go back and add the boundary definition second, because that boundary definition could be fiddly, but you may be able to skip straight to writing the boundary conditions!)

I'm not sure I quite understand what you mean by "boundary files". Would it be sufficient to specify the coordinates of the region boundaries as a "boundary coordinates" attribute of the "region" variable? Or should I do something different?

Thanks!

Jasper

@znichollscr
Collaborator Author

So how would I distinguish the files if I can't put the scenario in the variable name?

Excellent question. The answer is, at the moment, you put it in the "source_id" bit of the file name. So, for example, your filenames would become (where I've also dropped off the noleap prefix that isn't part of the DRS):

  • historical: dustscaling_input4MIPs_emissions_AerChemMIP_UCLA-1-0-1_gm_185001-200012.nc
  • constant scenario: dustscaling_input4MIPs_emissions_AerChemMIP_UCLA-1-0-1-constant_gm_200101-210012.nc
  • decreasing scenario: dustscaling_input4MIPs_emissions_AerChemMIP_UCLA-1-0-1-decreasing_gm_200101-210012.nc
  • increasing scenario: dustscaling_input4MIPs_emissions_AerChemMIP_UCLA-1-0-1-increasing_gm_200101-210012.nc

As you can tell, this doesn't make that much sense and is easy to miss, which is why we're having the discussion in #64. That may lead to a renaming in future, but for now the above is what to go for.
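A minimal sketch of how the filenames above are put together, with the scenario carried in the source ID. The function and its parameter names are hypothetical and are inferred only from the example filenames in this thread; they are not the input4mips-validation API.

```python
def drs_filename(
    variable_id,
    source_id,
    grid_label,
    time_range,
    activity_id="input4MIPs",
    dataset_category="emissions",
    target_mip="AerChemMIP",
):
    """Join the filename fields with underscores (hypothetical helper).

    Underscores separate the fields, so the source ID itself must use hyphens.
    """
    if "_" in source_id:
        raise ValueError(f"source ID must not contain underscores: {source_id!r}")
    fields = [
        variable_id,
        activity_id,
        dataset_category,
        target_mip,
        source_id,
        grid_label,
        time_range,
    ]
    return "_".join(fields) + ".nc"


historical = drs_filename("dustscaling", "UCLA-1-0-1", "gm", "185001-200012")
constant = drs_filename("dustscaling", "UCLA-1-0-1-constant", "gm", "200101-210012")
print(historical)
print(constant)
```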

@jfkok

jfkok commented Oct 25, 2024

Thanks so much! I've implemented all your comments (I think) and uploaded the updated files here. Let me know if you think any further changes are needed.

One thing to note is that I added the coordinates of the region boundaries as a "boundary coordinates" attribute of the "region" variable. Let me know in case I should be doing something differently.

@znichollscr
Collaborator Author

Nice, thanks.

Let me know if you think any further changes are needed.

Underscores in the source ID have to be changed to hyphens i.e.:

  • dustscalefactor_input4MIPs_emissions_AerChemMIP_UCLA-1-0-1_constant_gm_200101-210012.nc -> dustscalefactor_input4MIPs_emissions_AerChemMIP_UCLA-1-0-1-constant_gm_200101-210012.nc
  • dustscalefactor_input4MIPs_emissions_AerChemMIP_UCLA-1-0-1_increasing_gm_200101-210012.nc -> dustscalefactor_input4MIPs_emissions_AerChemMIP_UCLA-1-0-1-increasing_gm_200101-210012.nc
  • dustscalefactor_input4MIPs_emissions_AerChemMIP_UCLA-1-0-1_decreasing_gm_200101-210012.nc -> dustscalefactor_input4MIPs_emissions_AerChemMIP_UCLA-1-0-1-decreasing_gm_200101-210012.nc
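The renames above could be scripted roughly as follows. This is a sketch under two assumptions taken from the filenames in this thread: each name has four underscore-separated fields before the source ID and two after it (grid label and time range), and none of those other fields contains an underscore. The helper name is hypothetical.

```python
def fix_source_id(filename, n_leading=4, n_trailing=2):
    """Replace underscores inside the source ID with hyphens.

    Assumes the layout seen in this thread: n_leading fields, then the
    source ID, then n_trailing fields, all separated by underscores.
    """
    parts = filename.split("_")
    if len(parts) < n_leading + n_trailing + 1:
        raise ValueError(f"unexpected filename layout: {filename!r}")
    head = parts[:n_leading]
    source_id = "-".join(parts[n_leading:len(parts) - n_trailing])
    tail = parts[len(parts) - n_trailing:]
    return "_".join(head + [source_id] + tail)


old = "dustscalefactor_input4MIPs_emissions_AerChemMIP_UCLA-1-0-1_constant_gm_200101-210012.nc"
new = fix_source_id(old)
print(new)
```

A name whose source ID already uses hyphens passes through unchanged, so the helper is safe to run over all files.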

Then I would suggest trying to run them through the validator and write them in the DRS: https://input4mips-validation.readthedocs.io/en/latest/how-to-guides/how-to-write-a-single-file-in-the-drs/

One thing to note is that I added the coordinates of the region boundaries as a "boundary coordinates" attribute of the "region" variable. Let me know in case I should be doing something differently.

Not sure, I haven't done this particular step before. If you run the files through the validator, the CF-checker will flag anything really wrong. In a couple of weeks I can pull the files down and have a look myself (meeting next week will take up time before then).

@jfkok

jfkok commented Oct 28, 2024

Thanks Zeb, I corrected the file names.

I tried running the validator after installing it as an application per the instructions here. However, the command import input4mips_validation.io triggers an error message, pasted below. Do you know if there is an easy solution for this? Thanks!

[image: screenshot of the error message]

@znichollscr
Collaborator Author

Do you know if there is an easy solution for this?

Hmm that's not very good. Let's dive down the rabbit hole here: climate-resource/input4mips_validation#78
