Implement standardization of static features #96

Draft: wants to merge 4 commits into main

joeloskarsson (Collaborator) commented Dec 9, 2024

Describe your changes

This PR makes sure static features are standardized in the same way as state and forcing. Since static features are loaded directly in the model, they are not handled by the standardization in WeatherDataset. This change ensures that they are still standardized.

The standardization is achieved by introducing an optional argument standardize to the get_dataarray Datastore method; when it is set, the DataArray is returned standardized. The motivation is to leave as much control over data loading as possible to the Datastore class. The approach is also reusable: the same standardization method can later be used when loading static fields in the boundary region.
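To illustrate the design described above, here is a minimal, self-contained sketch of a datastore whose get_dataarray method takes an optional standardize flag. Everything in it is hypothetical: ToyDatastore, get_standardization_stats and the dict-of-lists data layout are for illustration only, and the signature is simplified (e.g. there is no split argument) compared with the actual neural-lam Datastore. Statistics are computed from the data itself here, whereas a real datastore would presumably load precomputed statistics.

```python
from statistics import fmean, pstdev


class ToyDatastore:
    """Hypothetical in-memory datastore illustrating a `standardize` flag.

    Data layout (assumed for this sketch):
    {category: {feature_name: [values over grid points]}}
    """

    def __init__(self, data):
        self._data = data

    def get_standardization_stats(self, category):
        # Hypothetical helper: per-feature (mean, std) for one category.
        # Computed on the fly here purely for the example.
        return {
            name: (fmean(values), pstdev(values))
            for name, values in self._data[category].items()
        }

    def get_dataarray(self, category, standardize=False):
        raw = self._data[category]
        if not standardize:
            # Return an unmodified copy of the raw values.
            return {name: list(values) for name, values in raw.items()}
        stats = self.get_standardization_stats(category)
        # Standardize each feature with its own mean and std, so callers
        # (e.g. a model loading static fields directly) get scaled data.
        return {
            name: [(v - stats[name][0]) / stats[name][1] for v in values]
            for name, values in raw.items()
        }


store = ToyDatastore(
    {
        "static": {
            "topography": [0.0, 100.0, 200.0],
            "land_sea_mask": [0.0, 1.0, 1.0],
        }
    }
)
static = store.get_dataarray("static", standardize=True)
```

Keeping the flag on the datastore means any caller, not only WeatherDataset, can request standardized data. Note that a constant-valued field has zero standard deviation and would cause a division by zero in this sketch; a real implementation has to handle that case.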

In the future, other types of rescaling (e.g. to [0,1]) might also be desirable for static features, as might making standardization optional. However, since that applies equally to state and forcing, it is a separate issue.
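The [0,1] rescaling mentioned above differs from standardization only in which per-feature statistics are used. A minimal sketch contrasting the two; the rescale function and its method values are hypothetical names, not part of neural-lam:

```python
from statistics import fmean, pstdev


def rescale(values, method="standardize"):
    """Hypothetical helper contrasting two per-feature rescaling schemes."""
    if method == "standardize":
        # Zero mean, unit (population) standard deviation.
        mean, std = fmean(values), pstdev(values)
        return [(v - mean) / std for v in values]
    if method == "minmax":
        # Map the observed range [min, max] onto [0, 1].
        lo, hi = min(values), max(values)
        return [(v - lo) / (hi - lo) for v in values]
    raise ValueError(f"unknown rescaling method: {method!r}")
```

For example, rescale([0.0, 50.0, 200.0], method="minmax") gives [0.0, 0.25, 1.0]. As with standardization, a constant field (zero spread) would need special handling.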

Issue Link

This closes #95.

Type of change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • 📖 Documentation (Addition or improvements to documentation)

Checklist before requesting a review

  • My branch is up to date with the target branch; if not, update your fork with the changes from the target branch (use pull with the --rebase option if possible).
  • I have performed a self-review of my code
  • For any new/modified functions/classes I have added docstrings that clearly describe their purpose, expected inputs and returned values
  • I have placed in-line comments to clarify the intent of any hard-to-understand passages of my code
  • I have updated the README to cover introduced code changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have given the PR a name that clearly describes the change, written in imperative form (context).
  • I have requested a reviewer and an assignee (assignee is responsible for merging). This applies only if you have write access to the repo, otherwise feel free to tag a maintainer to add a reviewer and assignee.

Checklist for reviewers

Each PR comes with its own improvements and flaws. The reviewer should check the following:

  • the code is readable
  • the code is well tested
  • the code is documented (including return types and parameters)
  • the code is easy to maintain

Author checklist after completed review

  • I have added a line to the CHANGELOG describing this change, in a section
    reflecting the type of change (add the section where missing):
    • added: when you have added new functionality
    • changed: when default behaviour of the code has been changed
    • fixes: when your contribution fixes a bug

Checklist for assignee

  • PR is up to date with the base branch
  • the tests pass
  • author has added an entry to the changelog (and designated the change as added, changed or fixed)
  • Once the PR is ready to be merged, squash commits and merge the PR.

leifdenby (Member) commented

> The standardization is achieved by introducing an optional argument standardize in the get_dataarray Datastore method, optionally returning the dataarray standardized. The motivation is to leave as much of the control of the data loading to the Datastore class. It is also good to have this re-usable approach, as the same standardization method can then later be used for loading static fields in the boundary region.

Good! Yes, I agree that putting this functionality in the datastores makes sense, so that it can be applied to static features too. Does that mean we should remove the standardisation from WeatherDataset? E.g. https://github.com/joeloskarsson/neural-lam-dev/blob/standardize_static/neural_lam/weather_dataset.py#L399

joeloskarsson (Collaborator, Author) commented

Since the plan (and the better solution) is to move the standardization of state and forcing to the GPU anyway (#25, #39), I don't think there is any need to make changes to that here.

Development

Successfully merging this pull request may close these issues.

Standardize or rescale also static variables
2 participants