Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

POC: Refactor tests that checks the string output from the info module #3564

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

seisman
Copy link
Member

@seisman seisman commented Oct 30, 2024

Instead of check the string output from info, I feel checking the numpy array is more accurate and more readable.

This is just a proof of concept. I'll work on other tests if you agree with the changes.

Please vote by 👍 or 👎 or leave comments.

@seisman seisman added maintenance Boring but important stuff for the core devs needs review This PR has higher priority and needs review. labels Oct 30, 2024
Comment on lines 28 to +34
with clib.Session() as lib:
with lib.virtualfile_from_matrix(data) as vfile:
with GMTTempFile() as outfile:
lib.call_module("info", [vfile, f"->{outfile.name}"])
output = outfile.read(keep_tabs=True)
bounds = "\t".join([f"<{col.min():.0f}/{col.max():.0f}>" for col in data.T])
expected = f"<matrix memory>: N = {shape[0]}\t{bounds}\n"
assert output == expected
lib.call_module("info", [vfile, "-C", f"->{outfile.name}"])
output = outfile.loadtxt()
npt.assert_equal(output[::2], data.min(axis=0))
npt.assert_equal(output[1::2], data.max(axis=0))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or we can change it to:

        with clib.Session() as lib:
            with (
                lib.virtualfile_from_matrix(data) as vintbl,
                lib.virtualfile_out(kind="dataset") as vouttbl,
            ):
                lib.call_module("read", [vintbl, vouttbl, "-Td"])
                output = lib.virtualfile_to_dataset(vfname=vouttbl, output_type="numpy")
                npt.assert_equal(output, data)

Pros:

  • No need to use GMTTempFile
  • Check the full data array rather than just the min/max values
  • This is exactly what we are doing (virtualfile_in/virtualfile_out/call_module/virtualfile_to_dataset) when wrapping modules

Cons:

  • Calls four Session functions in a single test, violating the rule that "unit test should test one thing"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems @weiji14 is against the changes in this comment. Are you OK with the changes in https://github.com/GenericMappingTools/pygmt/pull/3564/files, i.e., checking the numerical min/max values rather than checking the string output from info.

Copy link
Member Author

@seisman seisman Nov 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For comparison, we sometimes use Session.write_data to write the dataset into a temporary file and load it to compare the full data array.

with GMTTempFile() as tmp_file:
lib.write_data(
"GMT_IS_VECTOR",
"GMT_IS_POINT",
"GMT_WRITE_SET",
wesn,
tmp_file.name,
dataset,
)
# Load the data and check that it's correct
newx, newy, newz = tmp_file.loadtxt(unpack=True, dtype=dtype)

Copy link
Member

@weiji14 weiji14 Nov 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're missing the <matrix memory>: N part which checks that the array was loaded as a GMT matrix, rather than via a GMT vector <vector memory>. Also missing the count/length of the array.

I think it's best to keep this unit test simple, we're also using .loadtxt here instead of .read, which means it is going through numpy.loadtxt that introduces another layer or source of potential breakages.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're missing the <matrix memory>: N part which checks that the array was loaded as a GMT matrix, rather than via a GMT vector <vector memory>.

I actually don't care if it's a matrix or vector memory (it's just decided by the GMT_VIA_VECTOR/GMT_VIA_MATRIX modifier, as long as the data array is correct.

The current way in the main branch (checking string output from info) only checks the min/max values in each column, which is not 100% correct, since the actual data array that GMT API gets may differ from the array we pass without affecting the min/max values. That's why checking the full data array is more robust. Please see the Session.write_data way above.

@seisman seisman removed the needs review This PR has higher priority and needs review. label Nov 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
maintenance Boring but important stuff for the core devs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants