
Mhs/das 2216/quick fixes #19

Merged: 25 commits merged into main from mhs/DAS-2216/quick-fixes on Oct 30, 2024
Conversation

@joeyschultz (Collaborator) commented Oct 17, 2024

Description

This PR updates the Swath Projector to add support for TEMPO Level 2 data.

Jira Issue ID

DAS-2216

Local Test Steps

Check out this branch, build the Docker images, and run the test image:

./bin/build-image && ./bin/build-test && ./bin/run-test

Ensure the tests pass (one test is being skipped due to changes in this PR).

Ran 88 tests in 27.973s

OK (skipped=1)

Start up Harmony-In-A-Box. If you already have it running, reload Harmony-In-A-Box to pick up this latest image.

./bin/restart-services

Download the "TEMPO_O3TOT_L2_example.ipynb" notebook from DAS-2216 and run it to test the swath-projector in Harmony-In-A-Box. The notebook should run successfully.

Open the output file in Panoply and inspect it to ensure the geographic reprojection is correct.

Note: The following variables from the input are purposefully excluded in the varinfo config and will be missing from the output because their dimensions do not match those of /geolocation/longitude and /geolocation/latitude:

  • /support_data/a_priori_layer_o3
  • /support_data/cal_adjustment
  • /support_data/dNdR
  • /support_data/layer_efficiency
  • /support_data/lut_wavelength
  • /support_data/N_value
  • /support_data/N_value_residual
  • /support_data/ozone_sensitivity_ratio
  • /support_data/step_1_N_value_residual
  • /support_data/step_2_N_value_residual
  • /support_data/temp_sensitivity_ratio

PR Acceptance Checklist

  • Jira ticket acceptance criteria met.
  • CHANGELOG.md updated to include high level summary of PR changes.
  • docker/service_version.txt updated if publishing a release.
  • Tests added/updated and passing.
  • Documentation updated (if needed).

@flamingbear (Member) left a comment

I won't approve or request changes since I'm leaving, but I did leave a bunch of comments and questions.

CHANGELOG.md Outdated
    [v1.1.0]:(https://github.com/nasa/harmony-swath-projector/releases/tag/1.0.1)
    [v1.0.1]:(https://github.com/nasa/harmony-swath-projector/releases/tag/1.0.1)
    [v1.0.0]:(https://github.com/nasa/harmony-swath-projector/releases/tag/1.0.0)
    [v1.1.1]: (https://github.com/nasa/harmony-swath-projector/releases/tag/1.1.0)
Member

You have to add your release here yourself

Member

for all I know this was me...but still

Collaborator Author

Fixed in commit 735894d

Comment on lines 97 to 100
    logger.error(f'Cannot find "{dataset_file}".')
    raise MissingReprojectedDataError(variable_name)
    # QuickFix (DAS-2216) Ignore missing reprojections
    logger.error(f'Not Including "{variable_name}" in "{output_file}".')
    # raise MissingReprojectedDataError(variable_name)
Member

I'm pretty sure this was me but I left this branch without cleaning it up.

I'd get rid of the logger.error line and the commented raise line

Collaborator Author

Fixed in commit 735894d

    else:
        # Assumption: Array = (y, x)
        return variable[:].filled(fill_value=fill_value)
        return transpose_if_xdim_less_than_ydim(
Member

I have no idea if that assumption is still correct, same with L45

Contributor

I think it is a reasonable assumption (transpose to get xdim < ydim) as a default behavior. A follow-up ticket on this point is to allow for a configuration entry to explicitly define the along-track vs across-track dimension. In that ticket I am proposing a swath feature metadata variable, similar to grid_mapping, that provides this definition, trying to move towards convergible metadata standards as discussed in the convergence tiger team.

@flamingbear (Member) commented Oct 18, 2024

I was talking about the line that says:

        # Assumption: Array = (y, x)

If that assumption is not true, the comment should be removed.

Contributor

Perhaps clarify the comment to "First assume Array = (along-track, across-track), but transpose if along-track.size < across-track.size." Similarly for line 45.

Collaborator Author

To keep it more concise, I thought it would be best to modify the comments to:

    line 45: # Assumption: Array = (time, along-track, across-track)
    line 50: # Assumption: Array = (along-track, across-track)

Fixed in commit 78dd9df

@@ -62,7 +66,12 @@ def get_coordinate_variable(
        if coordinate_substring in coordinate.split('/')[-1] and variable_in_dataset(
            coordinate, dataset
        ):
            return dataset[coordinate]

    # QuickFix (DAS-2216) for short and wide swaths
Member

Is this comment in the right place still? They are in the code so that they can be removed when the quick fixes are actual fixes.

Collaborator Author

Lines 70-73 are all part of the quickfix so I think the comment is where we want it.

Contributor

I'm actually on the fence about this, as I suspect these are essentially default behaviors and not necessarily going away with the quick fix. That said, there is at least the case of setting rows-per-scan = 2 that will need follow-up modification. We can go with it for now, but we do need to follow up.

    transposed_variable = np.ma.transpose(variable).copy()
    if hasattr(variable, 'mask'):
        transposed_variable.mask = np.ma.transpose(variable[:].mask)
    return transposed_variable
Member

Couldn't this be deleted and the variable just be returned on L246?

Collaborator Author

Fixed in commit 735894d
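
For reference, a minimal sketch of the simplified helper being discussed (an illustration assuming a 2-D NumPy masked array, matching the masked-array test quoted below; not necessarily the code as merged):

    import numpy as np

    def transpose_if_xdim_less_than_ydim(variable):
        """Transpose an array that has fewer rows than columns.

        np.ma.transpose carries the mask along with the data, so no
        separate mask handling is needed in this simple case.
        """
        if variable.shape[0] < variable.shape[1]:
            return np.ma.transpose(variable).copy()
        return variable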

Comment on lines 402 to 413
    def test_masked_array(self):
        # Test case with a masked array
        input_array = np.ma.array(
            [[1, 2, 3], [4, 5, 6]], mask=[[True, False, False], [False, True, False]]
        )
        expected_output = np.ma.array(
            [[1, 4], [2, 5], [3, 6]],
            mask=[[True, False], [False, True], [False, False]],
        )
        result = transpose_if_xdim_less_than_ydim(input_array)
        np.testing.assert_array_equal(result, expected_output)
        np.testing.assert_array_equal(result.mask, expected_output.mask)
Member

Didn't I have this test in before you added the mask transpose that I was asking about above? I guess two questions: why didn't this fail before, and why does it still work with your changes to the function? (Three questions: where's the test for the changes you made to the function?)

Contributor

This test wasn't there, but the test for get_variable_values was, and it did fail until it was revised to handle the transpose. The transpose operation posed some challenges during development when an early implementation lost the mask, so this test was added to verify that the transpose did not lose or change the mask, and that the mask is transposed along with the base data. This should be implicit in NumPy masked-array operations, but, as noted, it didn't always work as expected.
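
As a standalone illustration (not code from the PR), NumPy's masked transpose does carry the mask along in the straightforward case, which is what the test quoted above verifies:

    import numpy as np

    arr = np.ma.array(
        [[1, 2, 3], [4, 5, 6]],
        mask=[[True, False, False], [False, True, False]],
    )
    print(np.ma.transpose(arr).mask)
    # [[ True False]
    #  [False  True]
    #  [False False]]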

@flamingbear (Member) commented Oct 18, 2024

"This test wasn't there"

I'm asking about this branch that I handed off to Joey. If you look at commit 7622aa7, you will see that I added these tests, including this masked one I'm asking about. So my questions still stand, and I still wonder if the masks are being applied properly.

Collaborator Author

Matt, you are correct that this test existed and was passing before I made the modifications in commit 72223b9 to how the masks are applied after the transpose. I'm trying to sort out why, in the early implementation, the transpose operation did not lose the mask in this unit test but was losing the mask for the coordinate variables when running the granule through the Swath Projector.

Contributor

Given the test failures, the updates seem reasonable. If we see a quick resolution and fix here, let's go for it, but otherwise, we know this implementation works, and we can go with that.

@lyonthefrog (Collaborator) left a comment

Overall, I think the changes look clean! I added a few questions and comments with minor fix suggestions. I'm not terribly familiar with the pyresample library, so I couldn't give a super thorough review of the bits that use it.

I ran the unit tests and they succeeded, along with the expected skipped test. I ran the TEMPO_O3TOT_L2_example.ipynb notebook, and the output is a little confusing to me. This is the first time I've looked at the Swath Projector, though, so it's likely I just haven't fully grasped what it's doing yet.

[Screenshot: input-output-tempo]

I know one of the acceptance criteria is "Lat/Lon coordinates are configured and understood to be located in the /geolocation group," but I'm confused as to why the coordinate variables were all removed from /geolocation, with lat/lon-looking datasets added to the root group. Should this request have moved/deleted/added variables like this?

Also, I wasn't sure how to check whether or not the output was successfully reprojected, but I plotted geolocation/solar_azimuth_angle. Is this expected?

[Screenshot: input-output-tempo-plot]

@owenlittlejohns (Member) left a comment

Generally - nice work. In particular, I really appreciate the additional unit tests that really kick the tyres of the new function you added.

The changes I'm requesting here:

  • Primarily: the configuration rule needs to be tightened up - it's applicable to all variables in the /support_data group, but not all of those variables have the same dimensions as the coordinate variables. (You may need to override the coordinates for those variables, too, to set them to None, so that the Swath Projector doesn't think they are science variables.)
  • After doing the above, try re-enabling the removed MissingReprojectedDataError. Tweaking the configuration file rule might mean that things work out, but if they don't, we need to know how to re-enable that exception and its associated test, and document that in the test suite. (We also need to plan what to do to get it back to normal before completing this feature.)
  • Simplifying the use of np.ma.transpose - I'm only messing around locally, but it looks like it can be a lot simpler.

@owenlittlejohns (Member)

Also, I just spotted that there are some pre-commit failures for your PR (see the link here). Do you have pre-commit configured locally? If not, there are some instructions in the README.md.

@owenlittlejohns (Member) left a comment

Thanks for the updates, I think this is pretty close.

I like the ExcludedScienceVariables approach for the configuration file - that definitely saves the creation of a really complicated regular expression.

I've left another couple of comments on the whole transposing thing (including that the tests need to tweak their inputs/outputs a bit). The only other thing I had was a tiny nit (that you can ignore if you think I'm being overly verbose).

@owenlittlejohns (Member) left a comment

Warning - mixed bag ahead!

Good stuff:

  • The code changes look good. Thanks for bearing with the rabbit hole with the transpose stuff!
  • I built the service image locally, and the unit tests all passed.
  • I stood up my local Harmony instance, ran the notebook from DAS-2216 and the TEMPO L2 Ozone request completed successfully.

Interesting(?) stuff:

  • The output file from the notebook on DAS-2216 has three unexpected variables: /geolocation/lat_1, /geolocation/lon_1 and /geolocation/latitude_longitude_lat_1_lon_1 (the last one is a grid mapping variable). None of the variables in the file refer to these 1-D dimension variables or the grid mapping variable, so they seem to be things that shouldn't be in the output. Looking at the Docker logs, it looks like the calculations for resolution etc. are happening for /product/quality_flag and /support_data/ground_pixel_quality_flag - but both use the same coordinate in the source file, and they end up using lat and lon in their output, so I don't know quite why this is happening without digging in a bit more.

Worrying stuff:

Reprojection failed with error: No variables could be reprojected

Digging in to the Docker logs, I see something more helpful (this is from interpolation.py L163, which is called from interpolation.py, L78):

ValueError: EWA requires 2 or more rows_per_scan and must be a factor of the total number of input row

I think what this is likely telling us is that for arrays with an odd number of rows, this change is going to fail.

@D-Auty commented Oct 28, 2024

I think the quick fix for rows-per-scan = 2 causing failures for dimension extents that are not even is to implement a least-integral-divisor function (see the following) and use that to determine rows-per-scan. I suspect a better answer is to allow rows-per-scan = 1 and artificially extend the rows, e.g., by duplicating the last row if necessary, but this requires some experimentation to see if it is acceptable in all cases. (I'm surprised pyresample doesn't offer a built-in solution for rows-per-scan = 1, since this would be a very common use case.) Experimentation to date has shown that using 2 for TEMPO data (whose row count is even) has given better results. A minimal acceptable value for rows-per-scan seems appropriate.

def least_integral_divisor(n):
    """
    Finds the least integral divisor of a number n.
    """
    if n < 2:
        return 1  # 1 is the least divisor for 1 and 0
    for i in range(2, int(n**0.5) + 1):
        if n % i == 0:
            return i
    return 0  # If no divisor is found, return 0 to imply: "use all rows"

@owenlittlejohns (Member) commented Oct 28, 2024

Okay - it sounds like we have a couple of options.

  • Leave the previous implementation for all rows per scan, including for TEMPO data. This looks to have poorer results from David's testing. (So not preferred.)
  • Use the function David suggests above to get the smallest divisor, but I would recommend changing the final return value to be the number itself rather than zero, e.g., defaulting to all rows per scan; see the sketch below. That would maybe mean the function would be better named something like get_rows_per_scan. (Minor nitpick: Python style guides like PEP 8 say to use more verbose variable names - I think 3 characters is the recommended minimum length - so n and i would need to be renamed.)
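
A sketch of that second option (illustrative only, with the renaming and more descriptive variable names applied; the actual implementation landed later in commit bde29b6):

    def get_rows_per_scan(total_rows: int) -> int:
        """Return the smallest divisor of total_rows that is 2 or more.

        If no such divisor exists (total_rows is prime), return
        total_rows itself, i.e., default to all rows per scan.
        """
        if total_rows < 2:
            return 1
        for candidate_divisor in range(2, int(total_rows**0.5) + 1):
            if total_rows % candidate_divisor == 0:
                return candidate_divisor
        return total_rows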

That leaves a couple of things:

  • The extraneous 1-D grid dimensions and grid mapping variables in the output (/geolocation/lat_1, /geolocation/lon_1 and /geolocation/latitude_longitude_lat_1_lon_1). What's the thinking for those?
  • Using the second option above (finding the minimum factor that is 2 or more), which sounds good to me, the reference files in the Swath Projector regression tests may need updating. Obviously, that's a separate PR, but I don't want to lose track of that as a thing to be done.

@D-Auty commented Oct 28, 2024

The extra lat/lon/grid-mapping (_1) variables may be quite relevant in other use cases, e.g., global and polar datasets with different grid mappings (something I saw in testing, treating SMAP as if it were a swath). So I am not clear on why we did not get references to those extra lat/lon and grid-mapping variables, and/or why we did get the extra variables when they are not referenced. More research is required. In the meantime, can we merge this PR with a known issue, as it is not failing and the outputs are valid, if not quite as desired?

@owenlittlejohns (Member)

"can we merge this PR with a known issue"

I think that answer would probably have to come from the TRADE team saying whether they think the output is fine as it is. The additional grid mapping and 1-D dimension variables don't seem to be doing any harm; it's just that there's no real reason for them to be there. I could be convinced to merge things with those extra variables being produced, but we should probably pop a ticket in the cloud backlog to analyze/fix this.

@joeyschultz (Collaborator, Author)

The latest commit bde29b6 implements the get_rows_per_scan function. I ran the Swath Projector Regression Test notebook and it is now completing the request, but failing on the comparison to the reference file, as Owen expected.

@joeyschultz (Collaborator, Author) commented Oct 28, 2024

I've been analyzing why /geolocation/lat_1, /geolocation/lon_1 and /geolocation/latitude_longitude_lat_1_lon_1 are being produced. The science variables in the /geolocation group are not currently getting the MetadataOverride for their coordinates, so in resample_variable they have a unique coordinates_key of ('/geolocation/latitude', '/geolocation/longitude', '/geolocation/time'), while the product/support group science variables have a coordinates_key of ('/geolocation/latitude', '/geolocation/longitude'). The result is a separate target_area object being created for the geolocation variables, and two target_area objects in the reprojection_cache dictionary.

The downstream effect is in nc_single_band.write_dimensions: there is a "cache miss" on the dimensions for the ('/geolocation/latitude', '/geolocation/longitude', '/geolocation/time') coordinates_key, causing the extra lat/lon/grid-mapping (_1) dimensions to be created.

Changing the coordinates MetadataOverride VariablePattern to the following appears to fix this:

    "VariablePattern": "^/product/.*|^/support_data/.*|^/geolocation/(?!time$).*"

This ignores the /geolocation/time variable so it doesn't get recognized as a science variable, and applies the override to the rest of the /geolocation science variables.
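
A quick way to sanity-check the negative lookahead in that pattern (standalone snippet, not from the PR; the matched variable paths are examples from this conversation):

    import re

    pattern = re.compile(
        r'^/product/.*|^/support_data/.*|^/geolocation/(?!time$).*'
    )

    assert pattern.match('/product/quality_flag')
    assert pattern.match('/geolocation/solar_azimuth_angle')
    assert pattern.match('/geolocation/time') is None  # only exact "time" is excluded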

@owenlittlejohns (Member)

Ahhhh! That's what it is with the two slightly different coordinates values! Nice work!

@owenlittlejohns (Member) left a comment

Nice work!

The only change I'd suggest is possibly raising an exception when the number of rows per scan is less than 2 (because I think we know that will fail) but, because an exception gets raised elsewhere, that's not a deal breaker for me. It's mainly just making sure that whatever does get raised is informative to an end-user.

I guess the next step is to re-run the Swath Projector regression test notebook and update the reference files there (after checking the new versions look reasonable and make sense).

    found, return the total number of rows.
    """
    if total_rows < 2:
        return 1
Member

A question I have here is whether returning 1 is the right thing to do when the number of rows per scan is less than 2. Won't the EWA algorithm fail if we ask it to use less than 2 rows? If so, perhaps a better approach is to raise an exception here saying there aren't enough rows. The block of code that calls pyresample.ewa.fornav will raise an exception already, just potentially a less informative one.

(I'm not going to hold the PR up over this as there's already an exception being raised later on, but felt it was worth thinking a bit about - my thinking is that if we know a return value of < 2 is problematic, then we should catch the problem as soon as we know about it)
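
A sketch of that alternative (the exception class name here is hypothetical, not from the codebase):

    class InsufficientRowsError(Exception):
        """Hypothetical: raised when a swath has too few rows for EWA."""

    def get_rows_per_scan(total_rows: int) -> int:
        if total_rows < 2:
            # Fail fast with an informative message, rather than returning
            # a value the EWA algorithm cannot use.
            raise InsufficientRowsError(
                f'EWA resampling needs at least 2 rows; got {total_rows}.'
            )
        ...  # divisor search continues as in the function sketched above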

        np.testing.assert_array_equal(result.mask, expected_output.mask)


class TestGetRowsPerScan(TestCase):
Member

Nice tests - I think these cover all the cases.

@joeyschultz merged commit 5c97edb into main Oct 30, 2024

4 checks passed

@joeyschultz deleted the mhs/DAS-2216/quick-fixes branch October 30, 2024 17:37