
DAFNI Workflow Enhancements #299

Open · wants to merge 14 commits into main
Conversation

f-allian (Contributor)

Enhancements to the DAFNI workflow that enable better usability and reproducibility. These include:

  • Removed variables.json as an argument. The variables can now be stored as metadata attributes in the dot file produced by the user.
  • Developed a local test suite to ensure the entrypoint works as expected.
  • Turned main_dafni.py into a module, which improves the modularity and testability of the entrypoint in the test suite.
  • Updated the CI/CD workflows.
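The first bullet can be illustrated with a minimal sketch. The attribute names `datatype` and `typestring` echo names used later in this review, but the node names and the dot snippet itself are hypothetical, and a real entrypoint would parse the dot file with a proper graph library (e.g. pydot/networkx) rather than a regex:

```python
import re

# Hypothetical dag.dot whose nodes carry the variable metadata that
# previously lived in variables.json (attribute names are assumptions).
DOT = """
digraph dag {
    width  [datatype="float", typestring="input"];
    height [datatype="float", typestring="input"];
    area   [datatype="float", typestring="output"];
    width -> area;
    height -> area;
}
"""

# Naive parse of the node-attribute lines, for illustration only.
nodes = {}
for name, attrs in re.findall(r'(\w+)\s+\[(.*?)\];', DOT):
    nodes[name] = dict(re.findall(r'(\w+)="(\w+)"', attrs))

# The entrypoint can then recover the variables directly from the graph.
inputs = [n for n, a in nodes.items() if a["typestring"] == "input"]
print(inputs)  # -> ['width', 'height']
```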

@f-allian f-allian added the enhancement New feature or request label Dec 11, 2024
@f-allian f-allian self-assigned this Dec 11, 2024
@f-allian f-allian marked this pull request as ready for review December 11, 2024 11:30

github-actions bot commented Dec 11, 2024

🦙 MegaLinter status: ✅ SUCCESS

Descriptor Linter Files Fixed Errors Elapsed time
✅ PYTHON black 36 0 0.96s
✅ PYTHON pylint 36 0 5.63s

See detailed report in MegaLinter reports

MegaLinter is graciously provided by OX Security


codecov bot commented Dec 11, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.01%. Comparing base (18d0356) to head (05a85fd).
Report is 2 commits behind head on main.

Additional details and impacted files


@@           Coverage Diff           @@
##             main     #299   +/-   ##
=======================================
  Coverage   97.01%   97.01%           
=======================================
  Files          29       29           
  Lines        1842     1842           
=======================================
  Hits         1787     1787           
  Misses         55       55           

Continue to review full report in Codecov by Sentry.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bb2527b...05a85fd. Read the comment docs.

dafni/README.md Outdated
@@ -10,8 +10,8 @@ to upload the framework onto [DAFNI](https://www.dafni.ac.uk).
- `data` contains two sub-folders (the structure is important for DAFNI).
- `inputs` is a folder that contains the input files that are (separately) uploaded to DAFNI.
- `causal_tests.json` is a JSON file that contains the causal tests.
-- `variables.json` is a JSON file that contains the variables and constraints to be used.
-- `dag.dot` is a dot file that contains the directed acyclc graph (dag) file.
+- `dag.dot` is a dot file that contains the directed acyclic graph (DAG) file,
Contributor

Would it make sense to document the metadata attributes here?

Contributor Author

@jmafoster1 I was thinking of documenting this on the DAFNI platform itself, rather than in this README. Both?

Contributor

I think both is good. I've been using the main_dafni.py as a main entrypoint for other things to save me having to write a separate run script each time, so it makes sense to document it for users who might not have access to the DAFNI platform.

Contributor Author

@jmafoster1 Happy to do both; however, this raises a separate issue that needs resolving in the future. The /dafni directory, including the entrypoint, should technically only be used by DAFNI users. We wouldn't want to encourage CTF users to use main_dafni.py as the entrypoint to the CTF (as there are obvious limitations to this script). But I'll put a pin in this for now.

Contributor

In that case, would it make sense to have the dafni stuff be a separate repo? We may have had this discussion before, but if we only want the dafni stuff used on that platform, it probably doesn't make sense to distribute it with the CTF. Alternatively, a longer term goal could be to broaden the applicability of the entrypoint, but that doesn't invalidate this PR, so we can do that separately later.

dafni/README.md Outdated
-- `variables.json` is a JSON file that contains the variables and constraints to be used.
-- `dag.dot` is a dot file that contains the directed acyclc graph (dag) file.
+- `dag.dot` is a dot file that contains the directed acyclic graph (DAG) file,
+  including the variables (and constraints if necessary) as node attributes.
Contributor

How are constraints expressed as node attributes? I thought constraints came in the causal tests JSON file?

Contributor Author

@jmafoster1 I didn't realise constraints are defined in the causal tests file. In that case, we'll need something separate to parse the JSON file for the constraints.

Do we have an example JSON file that uses constraints in the causal tests? I can only find the Poisson example, which defines constraints in the script but doesn't actually use them in the JSON file for some reason, which doesn't make sense to me.

Contributor

Yes, see #299 (comment)

The constraints in the script are an unused variable, so I guess they were never transferred to the JSON file. We probably put them in at some point because they were mandatory, then made them optional but didn't bother to get rid of the variable. This stuff was written a LONG time ago, before we had linting for PRs, so I guess it got missed.


return inputs
for (node, attributes), input_var in zip(causal_dag.graph.nodes(data=True), inputs):
Contributor

Expressing constraints in terms of individual variables is a real headache. I battled with this for about 6 months during my PhD. The problem is that constraints can relate variables, e.g. i1 > i2. Then you need to put the constraint i1 > i2 for node i1 and the inverse (i.e. i2 < i1) for node i2. (I guess you don't technically need to do this here, but then you have the question of which node you associate the constraint with.)

Contributor

When I did the CARLA case study I encoded queries on a per-test basis using an attribute called "query", which is a silly name for it here but made sense at the time because you can then just do df = df.query(test["query"]) and it'll "just work" if the user supplies the query in the correct format. I propose we do a similar thing here.
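The per-test "query" approach described above can be sketched as follows. The test dict, column names, and data here are hypothetical; the filtering step is the `df = df.query(test["query"])` call mentioned in the comment:

```python
import pandas as pd

# Hypothetical causal test carrying its constraint as a pandas query string,
# mirroring the per-test "query" attribute from the CARLA case study.
test = {
    "name": "example_test",
    "query": "i1 > i2",  # constraint relating two variables
}

# Hypothetical observational data.
df = pd.DataFrame({"i1": [3, 1, 5], "i2": [2, 4, 0]})

# Filter the data down to rows satisfying the constraint: if the user
# supplies the query in the correct format, this "just works".
constrained_df = df.query(test["query"])
print(constrained_df)
```

This sidesteps the which-node-owns-the-constraint question entirely, since a query like `i1 > i2` belongs to the test rather than to either node.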


raise FileNotFoundError
inputs = [
jmafoster1 (Contributor) commented Dec 11, 2024

Since we just return the inputs and outputs as separate lists and then join them up later, would it make sense to just return one list of variables defined as

variables = [
        Input(node, eval(eval(attributes["datatype"]))) if attributes["typestring"] == "input"
        else Output(node, eval(eval(attributes["datatype"])))
        for node, attributes in causal_dag.graph.nodes(data=True)
]

It's not really a problem either way, but it's just a bit more elegant

Contributor Author

@jmafoster1 I think it's less clear here what's being calculated and returned as a single line. You'd then have to define inputs = variables[0] which is less informative than the current method.

Contributor

My point was that we never use inputs or outputs in isolation. We build them separately and return them separately, but it looks like the only thing we do with them is append them anyway, so I was wondering whether it made sense to treat them separately at all, but if you think it's more elegant the way you have it then we can leave it as is. It's not like people will be dealing with millions of inputs/outputs.
