add a first version #1
Conversation
Some comments. Most of them are in this PR as individual commits. Cherry-pick what you deem worthwhile.
priority = "default"

[[tool.poetry.source]]
name = "pypi-mch-publish"
- url = "https://hub.meteoswiss.ch/nexus/repository/python-mch/"
+ url = "https://service.meteoswiss.ch/nexus/repository/python-mch/"
How is that supposed to work? `service` is somehow required for building it at CSCS, while `hub` is what the labvm wants. How is building it on Jenkins, where we need nexus.mch., planned to work?
This is an attempt to make it work:
https://github.com/MeteoSwiss-APN/cosmo-archive-retrieve/pull/1/files#diff-e6ffa5dc854b843b3ee3c3c28f8eae2f436c2df2b1ca299cca1fa5982e390cf8R52
I hope this is temporary; in some discussions it was suggested that the URL might be unified in the near future.
I see, perfect
def main():

    prefix = os.environ["CONDA_PREFIX"]
this already fails if no CONDA_PREFIX is defined. Should that be caught?
We can, but what would you do? Raise another custom exception?
The script was meant to install into the share path of the conda prefix, so letting it fail seems right in case this cannot be done?
In the next line you raise a custom exception saying the prefix does not exist when it is set but does not point to a directory. Why not be more descriptive than a KeyError here as well?
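For illustration, a minimal sketch of the more descriptive error handling suggested above (the function name and error messages are hypothetical, not from this PR):

```python
import os
from pathlib import Path


def get_conda_prefix() -> Path:
    """Resolve CONDA_PREFIX with descriptive errors instead of a bare KeyError."""
    prefix = os.environ.get("CONDA_PREFIX")
    if prefix is None:
        # os.environ["CONDA_PREFIX"] would raise KeyError('CONDA_PREFIX') here,
        # which tells the user little about what went wrong.
        raise RuntimeError(
            "CONDA_PREFIX is not set; activate a conda environment before running."
        )
    path = Path(prefix)
    if not path.is_dir():
        raise RuntimeError(f"CONDA_PREFIX={prefix!r} does not point to a directory.")
    return path
```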
poetry.lock
Outdated
I can't say what is wrong, but when I try to build the environment, I get stuck on:
lab-vm ../cosmo-archive-retrieve first_review ◔ cosmo-archive-retrieve poetry install
Warning: Found deprecated priority 'default' for source 'pypi-mch' in pyproject.toml. You can achieve the same effect by changing the priority to 'primary' and putting the source first.
Installing dependencies from lock file
Package operations: 40 installs, 3 updates, 0 removals
- Installing pydantic-core (2.16.3): Pending...
- Installing snowballstemmer (2.2.0): Pending...
- Installing snuggs (1.4.7): Pending...
- Installing soupsieve (2.5): Pending...
- Installing sphinxcontrib-applehelp (1.0.8): Pending...
- Installing sphinxcontrib-devhelp (1.0.6): Pending...
- Installing sphinxcontrib-htmlhelp (2.0.5): Pending...
- Installing sphinxcontrib-jsmath (1.0.1): Pending...
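As an aside, the deprecation warning at the top of that output states its own fix. A sketch of the corresponding pyproject.toml change (the URL shown is copied from the diff in this PR and is only an assumption for the 'pypi-mch' source block):

```toml
[[tool.poetry.source]]
name = "pypi-mch"
url = "https://hub.meteoswiss.ch/nexus/repository/python-mch/"
# was: priority = "default" (deprecated); place this source first among the sources
priority = "primary"
```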
ds = idpi.grib_decoder.load(
    idpi.data_source.DataSource(datafiles=[full_path]),
    {
        "param": [
we somehow don't get PP here. How is that handled?
Oh yes, very good point. I think the list came from the original pressure-levels extraction, which does not require P. I added it.
ds = xr.open_zarr(data_path)
checkpoint = ds.time.values[-1]

return datetime.utcfromtimestamp(checkpoint.tolist() / 1e9)
this is deprecated
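For reference, datetime.utcfromtimestamp is deprecated since Python 3.12 in favour of a timezone-aware call. A sketch of the replacement, assuming checkpoint is a numpy datetime64 value as in the snippet above (the helper name is hypothetical):

```python
from datetime import datetime, timezone

import numpy as np


def checkpoint_to_datetime(checkpoint: np.datetime64) -> datetime:
    """Convert a numpy datetime64 value to a timezone-aware UTC datetime."""
    # Nanoseconds since the Unix epoch, converted to seconds
    seconds = checkpoint.astype("datetime64[ns]").astype("int64") / 1e9
    return datetime.fromtimestamp(seconds, tz=timezone.utc)
```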
def load_data(config: dict) -> None:
    """Load weather data from archive and store it in a Zarr archive."""
missing args
done
def load_data(config: dict) -> None:
    """Load weather data from archive and store it in a Zarr archive."""

    n_pool = config["n_pool"]
this seems out of place here
"""Load weather data from archive and store it in a Zarr archive.""" | ||
|
||
n_pool = config["n_pool"] | ||
tar_file_paths = [] |
Should we describe in a comment what we're expecting here? With the list containing the files and having the same name as the tar?
Good point. Since it was not very clear, it seems, I thought we could better move it into its own function. Let me know if that improves things...
pdset["RELHUM"] = relhum( | ||
pdset["QV"], pdset["T"], pdset["P"], clipping=True, phase="water" | ||
) | ||
pdset["FI"] = pdset["HFL"] * 9.80665 |
why are we computing FI? seems to not be in the dataset we're using?
Yes, originally it was extracted because it was requested by FCN. But indeed FI is just height times g, and we already have HFL, so it does not add anything. I will remove it.
scripts/slurm-10162269.out
Outdated
this should not be here
* isdir instead
* remove deprecated
* update suffix
* use path instead of basename
* renaming
* remove unused arg
* move npool to usage
* returnvalue
Thanks @cosunae! Some questions and a request for additional variables. I looked at the zarr archive on Tsa:
With regard to the NeuralLAM paper and some discussions with my professor, I think we might want to add additional variables. Using this document we are trying to align the variables of ERA5 with our regional datasets. Could we also retrieve the following variables (eccodes param ids)?
This is my initial suggestion for additional retrievals; I think we should discuss it here or in our ML-sync.
I have one additional comment wrt collaboration within the community. It might be worthwhile to look at the following GRIB2Zarr libraries and maybe try to consolidate:
In case you are looking for multi-threaded, high-performance zarr creation, I have written a simple script using xr.open_mfdataset and SLURMCluster using Dask:
Thanks @sadamov for the comments. I have pushed a few commits to address them:
Additional changes since the last commit reviewed:
The following we would still need to compute (so they are not yet added): divergence, vorticity.
I completely agree that we should look into anemoi. I think we only need this repo for generating data for as long as we need it. As soon as we have a strategy for how to stay aligned with anemoi, we should discuss how to proceed.
Thanks a lot @cosunae, this is great!
Summarizing our discussion in the VC:
Thanks @sadamov. I think, though, that I still managed to add confusion by mixing up the P vs PP story and HFL vs FI (I think I was mixing HFL and P during the meeting). Let me try to clarify:
I added wind gusts (VMAX_10M, UMAX_10M) and DURSUN from the first guess.
It makes sense to me, thanks for the explanations and the fast updates 🚀. If the reviewer agrees, we can merge. The model/dataloader is also ready for the new single_zarr archive for a first full training: MeteoSwiss/neural-lam#12
Concerning the 2 m dew point temperature, TD_2M: it can be computed from QV_2M, T_2M and PS. But we do not have QV_2M in the analysis, so I added TD_2M directly; ok with that as well?
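For illustration only, a sketch of one standard way to compute dew point from specific humidity and pressure, using the Magnus approximation over water (in this formulation T_2M is not actually needed; the constants and the exact formulation used operationally may differ, and the function name is hypothetical):

```python
import numpy as np


def dewpoint_2m(qv_2m: np.ndarray, ps: np.ndarray) -> np.ndarray:
    """Dew point (K) from 2 m specific humidity (kg/kg) and surface pressure (Pa).

    Magnus-approximation sketch, not the operational MeteoSwiss formulation.
    """
    eps = 0.622  # ratio of gas constants, R_dry / R_vapour
    # Vapour pressure (Pa) from specific humidity and pressure
    e = qv_2m * ps / (eps + (1.0 - eps) * qv_2m)
    e_hpa = e / 100.0
    # Invert Magnus formula e_s(T) = 6.112 * exp(17.62 T / (243.12 + T)), T in degC
    ln_ratio = np.log(e_hpa / 6.112)
    td_c = 243.12 * ln_ratio / (17.62 - ln_ratio)
    return td_c + 273.15  # Kelvin
```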
tar.extractall(path=os.path.join(tarextract_dir, "ANA"))

fg_file = re.sub(r"KENDA\/ANA(\d{2})", r"KENDA/FG\1", ana_file)
shutil.copyfile(fg_file, tmp_tar_file)
I think it is ok like this. Because it is only temporary, it works. Just wanted to double-check.
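For clarity, the substitution above maps an analysis archive path to the corresponding first-guess path (the example path below is hypothetical):

```python
import re

# Hypothetical KENDA analysis archive path, for illustration only
ana_file = "/archive/KENDA/ANA23/laf2020010100.tar"
# Rewrite the ANA<yy> directory component to FG<yy>, keeping the two digits
fg_file = re.sub(r"KENDA\/ANA(\d{2})", r"KENDA/FG\1", ana_file)
# -> "/archive/KENDA/FG23/laf2020010100.tar"
```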
"PMSL", | ||
"FI", | ||
"HSURF", | ||
"PP", |
Why do we still write out PP and P0FL?
That was a confusion we had last time we discussed it. After clarification:
we need either PP and P0FL, or P,
and either HFL or FI.
So the decision was to go for PP & P0FL and FI.
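For context (a sketch, not code from this PR): in COSMO output, PP is the pressure deviation from the reference-atmosphere pressure on full levels (P0FL), which is why either pair is sufficient; the full pressure can be recovered as their sum:

```python
import numpy as np


def full_pressure(p0fl: np.ndarray, pp: np.ndarray) -> np.ndarray:
    """Full pressure P (Pa) from reference pressure P0FL and deviation PP."""
    return p0fl + pp
```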
From my side this PR can be merged.
Looks to be in a mergeable state now! Thanks a lot