
allow for significantly less memory usage in steps blending #435

Merged: 19 commits into master from memory-efficiency-blending, Oct 7, 2024

Conversation

mats-knmi (Contributor) commented Sep 24, 2024

By delaying the decomposition of the model data into cascade levels and allowing for model data to be delivered in float32, it seems to be possible to run the blending with almost 10x less memory than before in some tests I did.
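A rough sketch of where the saving comes from; the factor below is a theoretical upper bound on the input-size reduction (the observed ~10x also depends on what else is held in memory):

    import numpy as np

    # Two independent savings on the input data:
    #  (1) skipping pre-decomposition: one field per timestep instead of
    #      n_cascade_levels fields (commonly 6-8 in pysteps), and
    #  (2) float32 instead of float64, halving the bytes per value.
    n_cascade_levels = 7
    factor = (
        n_cascade_levels
        * np.dtype(np.float64).itemsize
        / np.dtype(np.float32).itemsize
    )
    print(factor)  # 14.0, an upper bound; ~10x was observed in practice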

codecov bot commented Sep 24, 2024

Codecov Report

Attention: Patch coverage is 97.82609% with 1 line in your changes missing coverage. Please review.

Project coverage is 83.89%. Comparing base (f211df2) to head (5c14399).
Report is 1 commit behind head on master.

Files with missing lines    Patch %   Lines
pysteps/blending/steps.py   97.82%    1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master     #435   +/-   ##
=======================================
  Coverage   83.89%   83.89%           
=======================================
  Files         160      160           
  Lines       12900    12902    +2     
=======================================
+ Hits        10822    10824    +2     
  Misses       2078     2078           
Flag         Coverage Δ
unit_tests   83.89% <97.82%> (+<0.01%) ⬆️


mats-knmi (Contributor Author):

@sidekock I realized yesterday that, with these changes, we can run pysteps a lot cheaper at KNMI. I have tested these changes and it all still seems to run (and fast enough). Maybe you can take these changes along when you start on the refactoring.

RubenImhoff (Contributor):

Hi @mats-knmi, this looks great, many thanks for the great improvement! Do I understand correctly that the code now prefers to work with the NWP rainfall data instead of the decomposed data (whereas this used to be the other way around)? In other words, do we still want to support decomposed NWP data as input? It would be a breaking change, so this needs some discussion, but it would make the code cleaner if it is not necessary anymore.

RubenImhoff (Contributor):

In addition, I think we should update the function arguments a bit (see below), where we would then switch around the array shape and description (favoring the four-dimensional form without decomposition over the two-dimensional form with decomposition) and indicate that float32 is supported and even recommended for lower memory usage.

    precip_models: array-like
      Array of shape (n_models,timesteps+1) containing, per timestep (t=0 to
      lead time here) and per (NWP) model or model ensemble member, a
      dictionary with a list of cascades obtained by calling a method
      implemented in :py:mod:`pysteps.cascade.decomposition`. In case of one
      (deterministic) model as input, add an extra dimension to make sure
      precip_models is five dimensional prior to calling this function.
      It is also possible to supply the original (NWP) model forecasts containing
      rainfall fields as an array of shape (n_models,timestep+1,m,n), which will
      then be decomposed in this function. Note that for an operational application
      or for testing with multiple model runs, it is recommended to decompose
      the model forecasts outside beforehand, as this reduces calculation times.
      This is possible with :py:func:`pysteps.blending.utils.decompose_NWP`,
      :py:func:`pysteps.blending.utils.compute_store_nwp_motion`, and
      :py:func:`pysteps.blending.utils.load_NWP`.
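For reference, a minimal sketch of the two input forms described in this docstring, built with the public cascade API (array sizes are arbitrary and the fft decomposition method is assumed):

    import numpy as np
    from pysteps.cascade.bandpass_filters import filter_gaussian
    from pysteps.cascade.decomposition import decomposition_fft

    n_models, n_fields, m, n = 1, 13, 256, 256  # arbitrary sizes
    n_cascade_levels = 7

    # Form A (memory-friendly): raw rainfall fields as float32,
    # shape (n_models, timesteps + 1, m, n); decomposed inside forecast().
    precip_models = np.zeros((n_models, n_fields, m, n), dtype=np.float32)

    # Form B (slightly faster, much larger): pre-decomposed cascades,
    # shape (n_models, timesteps + 1), each entry a decomposition dict.
    bp_filter = filter_gaussian((m, n), n_cascade_levels)
    precip_models_cascade = np.empty((n_models, n_fields), dtype=object)
    for i in range(n_models):
        for t in range(n_fields):
            precip_models_cascade[i, t] = decomposition_fft(
                precip_models[i, t].astype(np.float64),
                bp_filter,
                compute_stats=True,
            )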

mats-knmi (Contributor Author):

Yes indeed, personally I would prefer to work with rainfall data instead of decomposed data. However, this may matter less if you have fewer NWP model members (like RMI has), since the decomposed data is then not as big as it is for the KNMI use case, and the small extra performance you get by pre-decomposing it might be worth it. So I would be in favor of leaving both options in, and I will see if I can word this nicely in the docstring.

mats-knmi requested a review from sidekock, September 24, 2024 15:29
mats-knmi (Contributor Author):

@RubenImhoff I have updated the docstring; let me know if it is OK like this.

RubenImhoff (Contributor) left a review:
Looks great so far! I've added a few comments; after that, I think we are good to go. :)

(9 review threads on pysteps/blending/steps.py, resolved)
mats-knmi (Contributor Author):

@dnerini The unit-test GitHub Actions suddenly fail during the conda environment initialization; do you have any idea what is going on? I didn't change anything that would impact this.

dnerini (Member) commented Sep 30, 2024, replying to the CI question above:

Fixed it temporarily by pinning an older version of micromamba.

RubenImhoff (Contributor):

With the additions, we may have to add some tests for the coverage. But other than that, I think we're good to go. :) Nice work!

sidekock (Contributor) commented Oct 3, 2024:

@mats-knmi @RubenImhoff I am planning to look through everything this afternoon.

sidekock (Contributor) commented Oct 3, 2024:

@mats-knmi, I have some questions before I go more in-depth:

  • I assume both options (providing raw NWP data and decomposed data) are still fully supported?
  • Are the performance increases also applicable to the old method of decomposing first?
  • Has there been any analysis of the impact of 32-bit vs 64-bit data on the final nowcast?
  • Is there a decrease in performance (time for a forecast)? I would expect so, because the computations are now done during the run...

mats-knmi (Contributor Author):

@sidekock

  • Yes, both options should still work as intended (the unit tests succeed for both options).
  • The performance increases are not related to speed, just to reduced memory usage. In general I expect the memory usage to be lower with both options, but especially when running with non-decomposed model data, as that lets you input a lot less data (7x less with 7 cascade levels).
  • No, I haven't done any analysis, but this should not have much impact, as all computations are still done in float64. You merely pass in the data as float32, meaning it takes up less memory; float32 has around 7 decimal digits of precision, so it should be plenty accurate for storage.
  • Yes, running the blending this way gives a slight performance decrease; around 5%, or 10 seconds, is what we experienced.

All of this, for us, is more than worth it. We run the blending with 12 hours lead time 5 minute time step and 20 members. This used to mean that we had to read 100+ GB of decomposed model data into memory, before even starting the blending. This is now reduced to around 20 GB.
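As a rough cross-check of those figures (the 700 x 765 grid size is an assumption; cascade dictionary overhead and the other inputs are not counted, which is presumably where the remaining gigabytes go):

    members = 20
    n_fields = 12 * 60 // 5 + 1  # 145 fields per member at a 5-minute step
    levels = 7
    cells = 700 * 765  # assumed radar composite grid

    old_gb = members * n_fields * levels * cells * 8 / 1e9  # float64 cascades
    new_gb = members * n_fields * cells * 4 / 1e9           # float32 fields
    print(f"pre-decomposed float64: {old_gb:.0f} GB")  # ~87 GB, arrays alone
    print(f"raw float32 fields:     {new_gb:.0f} GB")  # ~6 GB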

sidekock (Contributor) left a review:

@mats-knmi I did my best to go through all the changes. I added quite a few comments, most of them are just clarifications for me.

(5 review threads on pysteps/blending/steps.py, resolved)
precip_models_cascade_temp
)
done = True
break
sidekock (Contributor):

Am I correct that you are looping over all members for each timestep until you find a member with rain? Are you just doing this to replace the infs with the minimal value in this field? Is the minimal value in the found field necessarily the minimal value of precipitation in any field?

mats-knmi (Contributor Author):

This was the most difficult part to get right (see also the extensive discussion Ruben and I had on this in this pull request). We need a zero value, and apparently np.nanmin(precip) or np.nanmin(precip_models) are not correct, since the decomposed precip values are different from the non-decomposed values. Therefore we need to decompose the model. To avoid decomposing the entire model here (which would defeat the purpose of not pre-decomposing it), we need to find a timestep and member that is representative for the entire model, decompose that, and use its nanmin as the zero value. At least, that is the only way I could see to realistically do this. I don't know exactly how the decomposition works, but I would expect that an image that has precipitation has a representative decomposition of which we can take the nanmin. This is based on the assumption that at some point somewhere in the domain there is no rain, but this assumption is made quite a bit in other parts of the pysteps code as well.

I would like to add that this code is supposed to handle the case where precip_cascade is completely filled with nan values, but it doesn't explicitly check whether this is the case, which seems odd to me. I also don't know whether it is possible to get a precip_cascade that is completely filled with nans if you pass a sensible value in precip.
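For concreteness, a sketch of the search being described; this is not the PR's actual code, it assumes the fft decomposition method, and the rain check is a simple stand-in:

    import numpy as np
    from pysteps.cascade.bandpass_filters import filter_gaussian
    from pysteps.cascade.decomposition import decomposition_fft

    def find_zero_value(precip_models, n_cascade_levels=7):
        """Decompose the first member/timestep that contains rain and take
        the nanmin of its cascade as the 'zero' value (sketch only)."""
        bp_filter = filter_gaussian(precip_models.shape[-2:], n_cascade_levels)
        for t in range(precip_models.shape[1]):           # timesteps
            for member in range(precip_models.shape[0]):  # members
                field = precip_models[member, t]
                # Simple stand-in for "this field contains rain":
                if np.any(np.isfinite(field)) and np.nanmax(field) > np.nanmin(field):
                    cascade = decomposition_fft(
                        field.astype(np.float64), bp_filter, compute_stats=True
                    )
                    return np.nanmin(cascade["cascade_levels"])
        return None  # no rain anywhere in the forecast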

sidekock (Contributor):

To me, this looks like an unfinished problem. I do see where this comes from: in an operational product, you should be ready for no radar data. But I feel this is insufficient, because, as far as I understand, it can introduce a lot of bias (a large precip field will give a different min for the different cascades than a very intense convection event, I would assume; please correct me if I am wrong here, @RubenImhoff). On the other hand, you do not expect the NWP to have huge variations in 'weather type', so maybe it is fine. The only thing I worry about is that a very intense event with no other rain might give strange artefacts. I assume you used the same code as in _fill_nans_infs_nwp_cascade before?
I feel we at least need to explore a full timestep before taking a min, but that might be too computationally intensive (although we could also parallelize this).

mats-knmi (Contributor Author) commented Oct 7, 2024:

No, this is actually the only part of the code with a possible functional change. Before, the entire model field would always have been decomposed by this point, so you could just take the nanmin over the entire decomposed forecast. That is no longer possible now, and computationally it would be less than ideal to decompose the entire forecast at this point in the code, since the new changes were meant to prevent exactly that.

However, if this code is supposed to handle ONLY the case where the input (decomposed) precip field is completely filled with nans, I believe we should change the if statement here to check for that instead of just checking zero_precip_radar. If we do that, then I believe this case will rarely (or never) show up operationally, so it is not a big deal if it does take some extra time.

Also, this part of the code was never in _fill_nans_infs_nwp_cascade before; it was an additional check done separately from that method. That separate check is what this part is trying to replicate.

sidekock (Contributor) commented Oct 7, 2024:

I think it is good for now, but maybe it is worth raising this as a separate issue?
For now I see three possible solutions:

  1. Leave as is, risking some possible issues due to the early cutoff.
  2. Process the whole NWP forecast every run (very RAM intensive and reduces computational efficiency).
  3. Pre-process the NWP to get these no-rain values once, and feed them into the model (they can be reused for multiple runs, depending on the frequency of the RUC); see the sketch below.

I don't think we should fix this issue here, but we should note somewhere that this can be optimised...
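Option 3 might look roughly like the following; cache_zero_value, load_zero_value, and the JSON file are hypothetical helpers (not pysteps API), and find_zero_value refers to the search sketched earlier in this thread:

    import json

    def cache_zero_value(precip_models, n_cascade_levels=7,
                         path="zero_value.json"):
        # Hypothetical helper: run the (relatively expensive) search once
        # per NWP run and store the result for reuse by every blending run
        # in the same rapid-update cycle.
        zero_value = find_zero_value(precip_models, n_cascade_levels)
        if zero_value is not None:
            zero_value = float(zero_value)  # numpy scalar -> JSON-friendly
        with open(path, "w") as f:
            json.dump({"zero_value": zero_value}, f)

    def load_zero_value(path="zero_value.json"):
        # Hypothetical helper: read the cached no-rain value back in.
        with open(path) as f:
            return json.load(f)["zero_value"]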

mats-knmi (Contributor Author):

Yes, good point. Do you want to open an issue, or should I?

sidekock (Contributor):

Maybe we merge first and then create the issue on the master branch? I'll let you do it :)

mats-knmi (Contributor Author):

I cannot merge, but I can create an issue. Can you merge or should @dnerini do this?

sidekock (Contributor):

I cannot merge either.

dnerini (Member):

I can merge it

@@ -809,23 +825,80 @@ def forecast(
if measure_time:
starttime = time.time()

if precip_models_cascade is not None:
sidekock (Contributor):

I don't think precip_models_cascade is ever initialized; am I wrong in this?

mats-knmi (Contributor Author):

On line 576 it is set to None and then on line 579 it is set to the input precip_models if the ndim is not equal to 4.

sidekock (Contributor):

So that means that if ndim is not 4 (the already-decomposed case), it has been defined, and in the other case (raw NWP) it is None and is therefore decomposed here. Is this correct?

mats-knmi (Contributor Author):

Yes that is correct.
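Paraphrasing the dispatch described above (a sketch, not the verbatim lines from steps.py; the array shape is arbitrary):

    import numpy as np

    precip_models = np.zeros((2, 13, 256, 256), dtype=np.float32)  # raw input

    precip_models_cascade = None
    if precip_models.ndim != 4:
        # 2-D object array of decomposition dicts: already decomposed.
        precip_models_cascade = precip_models

    if precip_models_cascade is not None:
        pass  # use the supplied cascades directly
    else:
        pass  # raw fields: decomposed per timestep inside the nowcast loop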

(3 review threads on pysteps/blending/steps.py, resolved)
sidekock (Contributor) commented Oct 3, 2024:

[quoting mats-knmi's answers above]

So you are now doing this decomposition in the code per timestep, and it only costs 10 seconds for an entire forecast (how long is this nowcast run, i.e. how many timesteps?), or is it 10 seconds per step?

mats-knmi (Contributor Author):

@sidekock 10 seconds longer over the entire forecast of 12 hours with a 5-minute timestep, so 144 timesteps. I believe this part might actually take around 20 seconds in total, but the original _compute_cascade_decomposition_nwp also took 10 seconds in the old code, so the bottom line is that we lose around 10 seconds in total.

sidekock (Contributor) commented Oct 7, 2024:

Great, thanks! Good to have a benchmark; it is always easier to convince people that way :)
Thanks for answering all my questions. I'll put my stamp of approval on this merge :) Then I can begin with the refactoring whenever it ends up in master.

dnerini merged commit 8521a53 into master, Oct 7, 2024 (9 checks passed).
dnerini deleted the memory-efficiency-blending branch, October 7, 2024 17:24.