allow for significantly less memory usage in steps blending #435
Conversation
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@           Coverage Diff           @@
##           master     #435   +/-   ##
=======================================
  Coverage   83.89%   83.89%
=======================================
  Files         160      160
  Lines       12900    12902     +2
=======================================
+ Hits        10822    10824     +2
  Misses       2078     2078
@sidekock I realized yesterday that, with these changes, we can run pysteps a lot cheaper at KNMI. I have tested these changes and it all still seems to run (and fast enough). Maybe you can take these changes along when you start on the refactoring.
Hi @mats-knmi, this looks great, many thanks for the great improvement! Do I understand it well that the code now prefers to work with the NWP rainfall data instead of the decomposed data (whereas this used to be the other way around)? In other words, do we still want to support decomposed NWP data as input? It would be a breaking change, so this needs some discussion, but it would make the code cleaner if it is not necessary anymore.
In addition, I think we should update the function arguments a bit (see below): we would then switch around the array shape and description (favoring the four dimensions without decomposition over the two dimensions with decomposition) and indicate that float32 is supported and even recommended to reduce memory usage.
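To make the direction concrete, here is an illustrative sketch of the kind of argument description being proposed; this is not the merged wording, and the parameter name `precip_models` follows the existing blending API as I understand it:

```python
"""
precip_models : array_like
    Either raw NWP rainfall fields of shape (n_models, n_timesteps, m, n)
    (recommended; float32 input is supported and strongly reduces memory use),
    or the pre-decomposed cascades accepted by earlier versions.
"""
```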
Yes indeed, personally I would prefer to work with rainfall data instead of decomposed data. However, this might matter less if you have fewer NWP model members (like RMI has), since then the decomposed data is not as big as it is for the KNMI use case, and the small extra performance you get by pre-decomposing it might be worth it. So I would be in favor of leaving both options in there, but I will see if I can word this nicely in the docstring.
@RubenImhoff I have updated the docstring, let me know if it is ok like this.
Looks great so far! I've added a few comments; after that, I think we are good to go. :)
Co-authored-by: Ruben Imhoff <31476760+RubenImhoff@users.noreply.github.com>
@dnerini The unit-test GitHub Actions suddenly fail during the conda environment initialization; do you have any idea what is going on here? I didn't change anything that would impact this.
Fixed it temporarily by pinning an older version of micromamba.
With the additions, we may have to add some tests for the coverage. But other than that, I think we're good to go. :) Nice work!
@mats-knmi @RubenImhoff I am planning to look through everything this afternoon.
@mats-knmi, I have some questions before I go more in-depth:
All of this, for us, is more than worth it. We run the blending with a 12-hour lead time, a 5-minute time step, and 20 members. This used to mean that we had to read 100+ GB of decomposed model data into memory before even starting the blending. This is now reduced to around 20 GB.
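A rough back-of-the-envelope check of these numbers. The 20 members and 144 timesteps come from the comment above; the 8 cascade levels and the 700x700 grid size are assumptions for illustration only, and the ~20 GB total also includes radar, velocity, and output arrays not counted here:

```python
n_members, n_timesteps, n_levels, m, n = 20, 144, 8, 700, 700

# pre-decomposed float64 cascades vs. raw float32 rainfall fields
bytes_decomposed_f64 = n_members * n_timesteps * n_levels * m * n * 8
bytes_raw_f32 = n_members * n_timesteps * m * n * 4

print(f"pre-decomposed float64 cascades: ~{bytes_decomposed_f64 / 1e9:.0f} GB")  # ~90 GB
print(f"raw float32 rainfall fields:     ~{bytes_raw_f32 / 1e9:.0f} GB")         # ~6 GB
```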
@mats-knmi I did my best to go through all the changes. I added quite a few comments, most of them are just clarifications for me.
    precip_models_cascade_temp
)
done = True
break
Am I correct that you are looping over all members for each time until you find a member with rain? Are you just doing this to change the infs with the minimal value in this field? Is this minimal value in the found field necessarily the minimal value of precipitation in any field?
This was the most difficult part to get right (see also the extensive discussion Ruben and I had on this in this Pull Request). We need a zero value, and apparently np.nanmin(precip) or np.nanmin(precip_models) are not correct, since the decomposed precip values differ from the non-decomposed values. Therefore we need to decompose the model. To avoid decomposing the entire model here (which would defeat the purpose of not pre-decomposing the model), we need to find a timestep and member that is representative of the entire model, decompose that, and use its nanmin as the zero value. Or at least, that is the only way I could see that we could realistically do this. I don't know exactly how the decomposition works, but I would expect that an image that has precipitation has a representative decomposition of which we can take the nanmin. This is based on the assumption that at some point somewhere in the domain there is no rain, but this assumption is made in quite a few other parts of the pysteps code as well.
I would like to add that this code is supposed to solve the issue where precip_cascade is completely filled with nan values, but it doesn't explicitly check whether this is the case, which seems weird to me. I also don't know if it is possible to get a precip_cascade that is completely filled with nans if you give a sensible value in precip.
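For illustration, a minimal sketch of the idea described here, i.e. decompose only one representative rainy member/timestep and take its nanmin as the zero value. The function name, the choice of 8 cascade levels, and the normalize/compute_stats options are my assumptions, not the PR's exact code:

```python
import numpy as np
from pysteps.cascade.bandpass_filters import filter_gaussian
from pysteps.cascade.decomposition import decomposition_fft

def representative_zero_value(precip_models, n_cascade_levels=8):
    """precip_models: raw NWP fields, shape (n_models, n_timesteps, m, n),
    already transformed to the units used by the blending (e.g. dBR)."""
    n_models, n_timesteps, m, n = precip_models.shape
    bp_filter = filter_gaussian((m, n), n_cascade_levels)
    for t in range(n_timesteps):
        for j in range(n_models):
            field = precip_models[j, t]
            finite = np.isfinite(field)
            # skip fields that are all no-data or completely dry (constant value)
            if not finite.any() or np.nanmax(field) <= np.nanmin(field):
                continue
            # the FFT-based decomposition cannot handle non-finite values
            field = np.where(finite, field, np.nanmin(field))
            decomp = decomposition_fft(field, bp_filter, normalize=True, compute_stats=True)
            return np.nanmin(decomp["cascade_levels"])
    return None  # no rainy field found anywhere in the forecast
```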
To me, this looks like an unfinished problem. I do see where this comes from: in an operational product, you should be ready for missing radar data. But I feel this is insufficient because, as far as I understand, it can introduce quite some bias (a large precipitation field will give different minima for the different cascades than a very intense convection event, I would assume; please correct me if I am wrong here, @RubenImhoff). On the other hand, you do not expect the NWP to have huge variations in the 'weather type', so maybe it is fine. The only thing I worry about is that a very intense event with no other rain might give strange artefacts. I assume you used the same code as in "_fill_nans_infs_nwp_cascade" before?
I feel we should at least explore a full timestep before taking a min, but that might be too computationally intensive (although we could also parallelize this).
No, this is actually the only part of the code with a possible functional change. Before, the entire model field would always have been decomposed at this point, so you could just take the nanmin over the entire decomposed forecast. Now this is not possible. Computationally it would indeed be less than ideal to decompose the entire forecast at this point in the code, since the new changes were meant to prevent exactly that.
However, if this code is supposed to ONLY handle the case where the input (decomposed) precip field is completely filled with nans, I believe we should change the if statement here to check for that instead of just checking zero_precip_radar. If we do that, then I believe this case will not show up often (or at all) operationally, so it is not a big deal if it does take some extra time.
Also, this part of the code was never in _fill_nans_infs_nwp_cascade before. It was an additional check done separately from that method; that is the part of the code this is trying to replicate.
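A hedged sketch of the stricter condition suggested here (variable and function names are assumptions, not the merged code): trigger the NWP-based fallback only when the radar cascade contains no usable values at all, rather than whenever zero_precip_radar is true.

```python
import numpy as np

def radar_cascade_is_unusable(precip_cascade):
    # True only when the (decomposed) radar input is entirely NaN/inf,
    # which is the case this fallback is actually meant to handle
    return not np.any(np.isfinite(precip_cascade))
```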
I think it is good for now, but maybe it is a good idea to raise this as a separate issue?
For now I see three possible solutions:
- Leave it as is, risking some possible issues due to the early cutoff
- Process the whole NWP every run (very RAM intensive and reduces computational efficiency)
- Pre-process the NWP to get these no-rain values once and feed them into the model (can be reused for multiple runs, depending on the frequency of the RUC)
I don't think we should fix this issue here, but we should note somewhere that this can be optimised.
Yes good point, do you want to open an issue or should I?
Maybe we merge first and then create the issue on the master branch? I'll let you do it :)
I cannot merge, but I can create an issue. Can you merge or should @dnerini do this?
I cannot merge either.
I can merge it
@@ -809,23 +825,80 @@ def forecast(
    if measure_time:
        starttime = time.time()

    if precip_models_cascade is not None:
I don't think precip_models_cascade is ever initialized, am I wrong about this?
On line 576 it is set to None, and then on line 579 it is set to the input precip_models if the ndim is not equal to 4.
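Schematically, the dispatch works as sketched below. Variable and function names are assumptions for illustration, not the literal PR code (and in the real code the bandpass filter is built once, not per call):

```python
from pysteps.cascade.bandpass_filters import filter_gaussian
from pysteps.cascade.decomposition import decomposition_fft

def get_model_cascade(precip_models, precip_models_cascade, j, t, n_cascade_levels=8):
    """Return the cascade of NWP member j at timestep t."""
    if precip_models_cascade is not None:
        # caller provided pre-decomposed data (the ndim != 4 case)
        return precip_models_cascade[j, t]
    # raw 4D NWP input: decompose on the fly, one field at a time
    field = precip_models[j, t]
    bp_filter = filter_gaussian(field.shape, n_cascade_levels)
    return decomposition_fft(field, bp_filter, normalize=True, compute_stats=True)
```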
So that means if ndim is not 4 (the already-decomposed case), it has been defined. In the other case (raw NWP), it is None, and thus it gets decomposed here. Is this correct?
Yes that is correct.
So you are now doing this decomposition per time step in the code; does this add only 10 s for the entire forecast (how long is this nowcast run, i.e. how many timesteps?) or 10 s per step?
@sidekock 10 seconds longer over the entire forecast of 12 hours with a 5-minute timestep, so 144 timesteps. I believe this part might actually take about 20 seconds in total, but the original
Great, thanks! Good to have a benchmark, always easier to convince people that way :)
By delaying the decomposition of the model data into cascade levels and allowing for model data to be delivered in float32, it seems to be possible to run the blending with almost 10x less memory than before in some tests I did.