Add resources to provenance #3016

benjeffery · 2024-10-09T11:04:47Z

Description

Add optional resources to provenances, along with a helper method to populate the field. This is intended for long-running processes such as tsinfer, where measuring resource usage across distributed pipelines is awkward; it is easier to record it in the process that actually builds the tree sequence.

PR Checklist:

Tests that fully cover new/changed functionality.
Documentation including tutorial content if appropriate.
Changelogs, if there are API changes.

benjeffery · 2024-10-09T11:10:45Z

@jeromekelleher I'm not sure how library methods such as simplify that could be run as part of a larger process are going to record their max_mem.

codecov · 2024-10-09T11:16:59Z

Codecov Report

Attention: Patch coverage is 78.57143% with 3 lines in your changes missing coverage. Please review.

Project coverage is 89.83%. Comparing base (7320290) to head (06a4132).
Report is 4 commits behind head on main.

Files with missing lines	Patch %	Lines
python/tskit/provenance.py	78.57%	2 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3016      +/-   ##
==========================================
- Coverage   89.84%   89.83%   -0.01%     
==========================================
  Files          29       29              
  Lines       32093    32111      +18     
  Branches     6230     5758     -472     
==========================================
+ Hits        28833    28847      +14     
- Misses       1859     1861       +2     
- Partials     1401     1403       +2

Flag	Coverage Δ
c-tests	`86.69% <ø> (ø)`
lwt-tests	`80.78% <ø> (ø)`
python-c-tests	`89.05% <ø> (ø)`
python-tests	`98.95% <71.42%> (-0.06%)`	⬇️

Flags with carried forward coverage won't be shown.

Files with missing lines	Coverage Δ
python/tskit/provenance.py	`93.75% <78.57%> (-6.25%)`	⬇️

... and 3 files with indirect coverage changes

jeromekelleher

I don't know if we want to get into the business of generating this in tskit, although I agree it would be nice to not have to redo it in libraries.

I guess we could require the start_time as a parameter, which would force users to generate a sensible one?

I think this is something that we don't expect to be recorded much, except in cases where there are a lot of resources involved. Msprime, for example, wouldn't bother with this.

jeromekelleher · 2024-10-09T11:38:49Z

python/tskit/provenance.py

@@ -72,6 +80,31 @@ def get_environment(extra_libs=None, include_tskit=True):
    return env


+def get_resources():


Do we want to do this here? I feel like this will get misused pretty badly as people won't realise that time is relative to when tskit got imported. It's only really appropriate in cases where the time is essentially the full life of the interpreter.

jeromekelleher · 2024-10-09T11:40:30Z

python/tskit/provenance.schema.json

+          "description": "System time used in seconds.",
+          "type": "number"
+        },
+        "max_mem": {


Let's make this max_memory here. I've committed this in sc2ts, and the brevity doesn't really help.

jeromekelleher · 2024-10-09T11:40:59Z

python/tskit/util.py

@@ -216,7 +216,8 @@ def pack_arrays(list_of_lists, dtype=np.float64):
    """
    Packs the specified list of numeric lists into a flattened numpy array
    of the specified dtype with corresponding offsets. See
-    :ref:`sec_encoding_ragged_columns` for details of this encoding of columns
+    :ref:`sec_encoding_ragged_columns` for detThis information


Some collateral damage?

benjeffery · 2024-10-09T11:48:20Z

Yes, it was only when I went to add it to existing tskit methods I realised this was the wrong approach. I don't think adding it to all tskit methods is useful.

I still think the method is useful, though. How about I add the start_time argument and leave it undocumented?
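
For illustration, the timing part of such a helper could look roughly like the sketch below. It is not the code in this PR: the `sys_time` expression matches the diff fragment quoted later in the review and the key names match the tests, but the `user_time` expression and the overall shape are assumptions. The point is only that the caller supplies `start_time`, so elapsed time is measured from when the work began rather than from when tskit was imported.

```python
import os
import time


def get_resources(start_time):
    """Return timing information for the current process.

    start_time is a time.time() value captured by the caller when the
    work of interest began (assumed shape, not the PR's exact code).
    """
    times = os.times()
    return {
        "elapsed_time": time.time() - start_time,
        # Assumed to mirror sys_time: own CPU time plus waited-for children.
        "user_time": times.user + times.children_user,
        "sys_time": times.system + times.children_system,
    }


start = time.time()
total = sum(i * i for i in range(200_000))  # stand-in for the real work
resources = get_resources(start)
print(resources)
```

A caller such as a long-running inference script would capture `start` at the top of its main routine and pass it through when writing the provenance record.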

jeromekelleher · 2024-10-09T12:16:22Z

Sgtm

benjeffery · 2024-10-09T13:15:48Z

@jeromekelleher Ok! Should be good to go.

jeromekelleher

Spotted a few more things, but LGTM then.

We should get some wider input on this before adding to the schema, I guess. Maybe do a shout-out on Slack?

jeromekelleher · 2024-10-09T13:38:31Z

python/tests/test_provenance.py

@@ -35,6 +43,9 @@
 import tskit.provenance as provenance


+_start_time = time.time()


Global value probably doesn't help with testing

jeromekelleher · 2024-10-09T13:39:31Z

python/tests/test_provenance.py

+            assert "max_memory" in resources
+
+    def test_used_resources_values(self):
+        resources = provenance.get_resources(_start_time)


Something less fragile would be time.time() - delta, and then test that elapsed time is >= delta.

jeromekelleher · 2024-10-09T13:40:21Z

python/tests/test_provenance.py

+        assert isinstance(resources["user_time"], float)
+        assert isinstance(resources["sys_time"], float)
+        assert resources["elapsed_time"] > 0.0001
+        assert resources["user_time"] > 0.0001


Just do > 0 here, there will surely be cases where this fails in CI or whatever
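
Taken together, the three test suggestions above (no module-level start time, a start time of `time.time() - delta`, and plain `> 0` checks) might be sketched as follows. The `get_resources` function here is a hypothetical stand-in so the example runs on its own; the real test would call `provenance.get_resources`. The CPU-burning loop is also an addition for this sketch, so that `user_time` is nonzero even where process times have coarse resolution.

```python
import os
import time


def get_resources(start_time):
    # Hypothetical stand-in for provenance.get_resources(start_time).
    times = os.times()
    return {
        "elapsed_time": time.time() - start_time,
        "user_time": times.user + times.children_user,
        "sys_time": times.system + times.children_system,
    }


def test_used_resources_values():
    delta = 0.1
    # Pretend the work started delta seconds ago; no global state needed.
    start_time = time.time() - delta
    # Burn some CPU so that user_time is measurably above zero.
    sum(i * i for i in range(2_000_000))
    resources = get_resources(start_time)
    assert isinstance(resources["elapsed_time"], float)
    assert isinstance(resources["user_time"], float)
    assert isinstance(resources["sys_time"], float)
    assert resources["elapsed_time"] >= delta
    assert resources["user_time"] > 0
    # sys_time is only checked for sign here; that choice is an assumption.
    assert resources["sys_time"] >= 0


test_used_resources_values()
```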

jeromekelleher · 2024-10-09T13:42:24Z

python/tskit/provenance.py

+
+try:
+    import resource
+except ImportError:


I think it's just that the module doesn't exist on Windows, right? Otherwise the check wouldn't work.

jeromekelleher · 2024-10-09T13:44:03Z

python/tskit/provenance.py

+        "sys_time": times.system + times.children_system,
+    }
+    if resource is not None:
+        # Don't report max memory on Windows. We could do this using the psutil lib, via


This comment isn't accurate for tskit - I don't think we'd want a dependency on psutil.

jeromekelleher · 2024-10-09T13:44:50Z

python/tskit/provenance.py

+        # psutil.Process(os.getpid()).get_ext_memory_info().peak_wset if demand exists
+        ret["max_memory"] = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
+        if sys.platform != "darwin":
+            ret["max_memory"] *= 1024  # Linux, freeBSD et al reports in KB, not bytes


Should be specific that it's KiB (2**10 bytes), not KB (10**3)
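
As a rough sketch of the memory part with these comments applied: the guarded import and the platform check follow the diff fragments quoted above, while the function name `max_memory_bytes` and the `None` return value are assumptions made for this example only. `ru_maxrss` is reported in bytes on macOS and in KiB on Linux and most other Unix systems, hence the factor of 1024.

```python
import sys

try:
    import resource  # Unix only; the module does not exist on Windows
except ImportError:
    resource = None


def max_memory_bytes():
    """Return the peak resident set size in bytes, or None if unavailable."""
    if resource is None:
        # No max memory is reported on Windows.
        return None
    max_rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if sys.platform != "darwin":
        # Linux, FreeBSD et al. report KiB (1024 bytes); macOS reports bytes.
        max_rss *= 1024
    return max_rss


print(max_memory_bytes())
```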

benjeffery · 2024-10-09T15:36:29Z

Comments addressed in 06a4132

benjeffery force-pushed the resources-provenance branch 2 times, most recently from 5a313f9 to 6798b1a Compare October 9, 2024 11:08

benjeffery force-pushed the resources-provenance branch from 6798b1a to 70fcf6c Compare October 9, 2024 11:11

benjeffery force-pushed the resources-provenance branch from 70fcf6c to 98a3a96 Compare October 9, 2024 11:41

jeromekelleher reviewed Oct 9, 2024

benjeffery force-pushed the resources-provenance branch 3 times, most recently from 923b54e to 118e9a0 Compare October 9, 2024 12:44

Add resources to provenance

19436e3

benjeffery force-pushed the resources-provenance branch from 118e9a0 to 19436e3 Compare October 9, 2024 12:51

jeromekelleher approved these changes Oct 9, 2024

Address comments

06a4132

bhaller mentioned this pull request Oct 9, 2024

add support for saving computational resources used to .trees files MesserLab/SLiM#478

Open
