
DAFNI Workflow Enhancements #299

Open · wants to merge 14 commits into main
Conversation

f-allian (Contributor)

Enhancements to the DAFNI workflow that enable better usability and reproducibility. These include:

  • Removed variables.json as an argument. The variables can now be stored as metadata attributes in the dot file produced by the user.
  • Developed a local test suite to ensure the entrypoint works as expected.
  • Turned main_dafni.py into a module, which improves the modularity and testability of the entrypoint in the test suite.
  • Updated the CI/CD workflows.
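The first bullet can be illustrated with a minimal sketch. The attribute names `datatype` and `typestring` echo names used later in this review, but the node names and the dot snippet itself are hypothetical, and a real entrypoint would parse the dot file with a proper graph library (e.g. pydot/networkx) rather than a regex:

```python
import re

# Hypothetical dag.dot whose nodes carry the variable metadata that
# previously lived in variables.json (attribute names are assumptions).
DOT = """
digraph dag {
    width  [datatype="float", typestring="input"];
    height [datatype="float", typestring="input"];
    area   [datatype="float", typestring="output"];
    width -> area;
    height -> area;
}
"""

# Naive parse of the node-attribute lines, for illustration only.
nodes = {}
for name, attrs in re.findall(r'(\w+)\s+\[(.*?)\];', DOT):
    nodes[name] = dict(re.findall(r'(\w+)="(\w+)"', attrs))

# The entrypoint can then recover the variables directly from the graph.
inputs = [n for n, a in nodes.items() if a["typestring"] == "input"]
print(inputs)  # -> ['width', 'height']
```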

@f-allian f-allian added the enhancement New feature or request label Dec 11, 2024
@f-allian f-allian self-assigned this Dec 11, 2024
@f-allian f-allian marked this pull request as ready for review December 11, 2024 11:30

github-actions bot commented Dec 11, 2024

🦙 MegaLinter status: ✅ SUCCESS

Descriptor Linter Files Fixed Errors Elapsed time
✅ PYTHON black 36 0 0.96s
✅ PYTHON pylint 36 0 5.63s

See detailed report in MegaLinter reports

MegaLinter is graciously provided by OX Security


codecov bot commented Dec 11, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.01%. Comparing base (18d0356) to head (05a85fd).
Report is 2 commits behind head on main.

Additional details and impacted files


@@           Coverage Diff           @@
##             main     #299   +/-   ##
=======================================
  Coverage   97.01%   97.01%           
=======================================
  Files          29       29           
  Lines        1842     1842           
=======================================
  Hits         1787     1787           
  Misses         55       55           

Continue to review full report in Codecov by Sentry.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bb2527b...05a85fd. Read the comment docs.

dafni/README.md Outdated
@@ -10,8 +10,8 @@ to upload the framework onto [DAFNI](https://www.dafni.ac.uk).
- `data` contains two sub-folders (the structure is important for DAFNI).
- `inputs` is a folder that contains the input files that are (separately) uploaded to DAFNI.
- `causal_tests.json` is a JSON file that contains the causal tests.
-- `variables.json` is a JSON file that contains the variables and constraints to be used.
-- `dag.dot` is a dot file that contains the directed acyclc graph (dag) file.
+- `dag.dot` is a dot file that contains the directed acyclic graph (DAG) file,
Contributor

Would it make sense to document the metadata attributes here?

Contributor Author

@jmafoster1 I was thinking of documenting this on the DAFNI platform itself, rather than in this README. Both?

Contributor

I think both is good. I've been using the main_dafni.py as a main entrypoint for other things to save me having to write a separate run script each time, so it makes sense to document it for users who might not have access to the DAFNI platform.

Contributor Author

@jmafoster1 Happy to do both; however, this raises a separate issue that needs resolving in the future. The /dafni directory, including the entrypoint, should technically only be used by DAFNI users. We wouldn't want to encourage CTF users to use main_dafni.py as the entrypoint to the CTF (as there are obvious limitations to this script). But I'll put a pin in this for now.

Contributor

In that case, would it make sense to have the dafni stuff be a separate repo? We may have had this discussion before, but if we only want the dafni stuff used on that platform, it probably doesn't make sense to distribute it with the CTF. Alternatively, a longer term goal could be to broaden the applicability of the entrypoint, but that doesn't invalidate this PR, so we can do that separately later.

dafni/README.md Outdated
-- `variables.json` is a JSON file that contains the variables and constraints to be used.
-- `dag.dot` is a dot file that contains the directed acyclc graph (dag) file.
+- `dag.dot` is a dot file that contains the directed acyclic graph (DAG) file,
+  including the variables (and constraints if necessary) as node attributes.
Contributor

How are constraints expressed as node attributes? I thought constraints came in the causal tests JSON file?

Contributor Author

@jmafoster1 I didn't realise constraints are defined in the causal tests file. In that case, we'll need something separate to parse the JSON file for the constraints.

Do we have an example JSON file that uses constraints in the causal tests? I can only find the Poisson example, which defines constraints in the script but doesn't actually use them in the JSON file for some reason, which doesn't make sense to me.

Contributor

Yes, see #299 (comment)

The constraints in the script are an unused variable, so I guess they were never transferred to the JSON file. We probably put them in at some point because they were mandatory, then made them optional but didn't bother to get rid of the variable. This stuff was written a LONG time ago, before we had linting for PRs, so I guess it got missed.


return inputs
for (node, attributes), input_var in zip(causal_dag.graph.nodes(data=True), inputs):
Contributor

Expressing constraints in terms of individual variables is a real headache. I battled with this for about 6 months during my PhD. The problem is that constraints can relate variables, e.g. i1 > i2. Then you need to put the constraint i1 > i2 for node i1 and the inverse (i.e. i2 < i1) for node i2. (I guess you don't technically need to do this here, but then you have the question of which node you associate the constraint with.)

Contributor

When I did the CARLA case study I encoded queries on a per-test basis using an attribute called "query", which is a silly name for it here but made sense at the time because you can then just do df = df.query(test["query"]) and it'll "just work" if the user supplies the query in the correct format. I propose we do a similar thing here.
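The per-test "query" approach described above can be sketched as follows. The test dict, column names, and data here are hypothetical; the filtering step is the `df = df.query(test["query"])` call mentioned in the comment:

```python
import pandas as pd

# Hypothetical causal test carrying its constraint as a pandas query string,
# mirroring the per-test "query" attribute from the CARLA case study.
test = {
    "name": "example_test",
    "query": "i1 > i2",  # constraint relating two variables
}

# Hypothetical observational data.
df = pd.DataFrame({"i1": [3, 1, 5], "i2": [2, 4, 0]})

# Filter the data down to rows satisfying the constraint: if the user
# supplies the query in the correct format, this "just works".
constrained_df = df.query(test["query"])
print(constrained_df)
```

This sidesteps the which-node-owns-the-constraint question entirely, since a query like `i1 > i2` belongs to the test rather than to either node.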


raise FileNotFoundError
inputs = [
jmafoster1 (Contributor) commented Dec 11, 2024

Since we just return the inputs and outputs as separate lists and then join them up later, would it make sense to just return one list of variables defined as

variables = [
        Input(node, eval(eval(attributes["datatype"]))) if attributes["typestring"] == "input"
        else Output(node, eval(eval(attributes["datatype"])))
        for node, attributes in causal_dag.graph.nodes(data=True)
]

It's not really a problem either way, but it's just a bit more elegant

Contributor Author

@jmafoster1 I think it's less clear here what's being calculated and returned as a single line. You'd then have to define inputs = variables[0] which is less informative than the current method.

Contributor

My point was that we never use inputs or outputs in isolation. We build them separately and return them separately, but it looks like the only thing we do with them is append them anyway, so I was wondering whether it made sense to treat them separately at all, but if you think it's more elegant the way you have it then we can leave it as is. It's not like people will be dealing with millions of inputs/outputs.
