DAFNI Workflow Enhancements #299
base: main
Conversation
🦙 MegaLinter status: ✅ SUCCESS
See detailed report in MegaLinter reports.
Codecov Report: All modified and coverable lines are covered by tests ✅

```
@@           Coverage Diff           @@
##             main     #299   +/-   ##
=======================================
  Coverage   97.01%   97.01%
=======================================
  Files          29       29
  Lines        1842     1842
=======================================
  Hits         1787     1787
  Misses         55       55
```

Continue to review the full report in Codecov by Sentry.
dafni/README.md (outdated)
```diff
@@ -10,8 +10,8 @@ to upload the framework onto [DAFNI](https://www.dafni.ac.uk).
 - `data` contains two sub-folders (the structure is important for DAFNI).
 - `inputs` is a folder that contains the input files that are (separately) uploaded to DAFNI.
 - `causal_tests.json` is a JSON file that contains the causal tests.
-- `variables.json` is a JSON file that contains the variables and constraints to be used.
-- `dag.dot` is a dot file that contains the directed acyclic graph (dag) file.
+- `dag.dot` is a dot file that contains the directed acyclic graph (dag) file,
+  including the variables (and constraints if necessary) as node attributes.
```
Would it make sense to document the metadata attributes here?
@jmafoster1 I was thinking of documenting this on the DAFNI platform itself, rather than in this README. Both?
I think both is good. I've been using `main_dafni.py` as a main entrypoint for other things to save me having to write a separate run script each time, so it makes sense to document it for users who might not have access to the DAFNI platform.
@jmafoster1 Happy to do both; however, this raises a separate issue that needs resolving in the future. The `/dafni` directory, including the entrypoint, should technically only be used by DAFNI users. We wouldn't want to encourage CTF users to use `main_dafni.py` as the entrypoint to the CTF (as there are obvious limitations to this script). But I'll put a pin in this for now.
In that case, would it make sense to have the dafni stuff be a separate repo? We may have had this discussion before, but if we only want the dafni stuff used on that platform, it probably doesn't make sense to distribute it with the CTF. Alternatively, a longer term goal could be to broaden the applicability of the entrypoint, but that doesn't invalidate this PR, so we can do that separately later.
dafni/README.md (outdated)
```diff
-- `variables.json` is a JSON file that contains the variables and constraints to be used.
-- `dag.dot` is a dot file that contains the directed acyclic graph (dag) file.
+- `dag.dot` is a dot file that contains the directed acyclic graph (dag) file,
+  including the variables (and constraints if necessary) as node attributes.
```
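For illustration, a minimal sketch of what a `dag.dot` carrying variable metadata might look like, read back with networkx. The `datatype` and `typestring` attribute names are taken from the code discussed further down; the `constraint` attribute is a hypothetical addition, since how (and whether) constraints belong in the dot file is exactly what is questioned below:

```python
import networkx as nx
import pydot

# Hypothetical dag.dot content. `datatype` and `typestring` mirror the node
# attributes used later in this PR; `constraint` is illustrative only.
dot_source = """
digraph G {
    i1 [datatype="int", typestring="input", constraint="i1 > 0"];
    o1 [datatype="float", typestring="output"];
    i1 -> o1;
}
"""

graph = nx.nx_pydot.from_pydot(pydot.graph_from_dot_data(dot_source)[0])
for node, attributes in graph.nodes(data=True):
    print(node, attributes)
# i1 {'datatype': '"int"', 'typestring': '"input"', 'constraint': '"i1 > 0"'}
# o1 {'datatype': '"float"', 'typestring': '"output"'}
```

Note that pydot keeps the surrounding quotes in attribute values, which would explain the double `eval` on `attributes["datatype"]` in the snippet below.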
How are constraints expressed as node attributes? I thought constraints came in the causal tests JSON file?
@jmafoster1 I didn't realise constraints are defined in the causal tests file. In that case, we'll need something separate to parse the JSON file for the constraints.
Do we have an example JSON file that uses constraints in the causal tests file? I can only find one in the Poisson example, which defines constraints in the script but, for some reason, doesn't actually use them in the JSON file, which doesn't make sense to me.
Yes, see #299 (comment)
The constraints in the script are an unused variable, so I guess they weren't transferred to the JSON file. We probably put them in at some point because they were mandatory, then made them optional but didn't bother to get rid of the variable. This stuff was written a LONG time ago, before we had linting for PRs, so I guess it got missed.
```python
    return inputs
...
for (node, attributes), input_var in zip(causal_dag.graph.nodes(data=True), inputs):
```
Expressing constraints in terms of individual variables is a real headache. I battled with this for about 6 months during my PhD. The problem is that constraints can relate variables, e.g. `i1 > i2`. Then you need to put the constraint `i1 > i2` for node `i1` and the inverse (i.e. `i2 < i1`) for node `i2`. (I guess you don't technically need to do this here, but then you have the question of which node you associate the constraint with.)
When I did the CARLA case study I encoded queries on a per-test basis using an attribute called "query", which is a silly name for it here but made sense at the time, because you can then just do `df = df.query(test["query"])` and it'll "just work" if the user supplies the query in the correct format. I propose we do a similar thing here.
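A minimal sketch of that per-test approach, assuming a hypothetical test entry with a `query` key (the names here are illustrative, not the CTF's actual schema):

```python
import pandas as pd

# Observational data with two related inputs.
df = pd.DataFrame({"i1": [1, 5, 3], "i2": [2, 1, 4]})

# Hypothetical test entry: the constraint relates i1 and i2 directly,
# so it naturally lives on the test rather than on any single node.
test = {"query": "i1 > i2"}

# Keep only the rows that satisfy the constraint.
df = df.query(test["query"])
print(df)  # only the row where i1 > i2 (i1=5, i2=1) remains
```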
```python
    raise FileNotFoundError
...
inputs = [
```
Since we just return the inputs and outputs as separate lists and then join them up later, would it make sense to just return one list of `variables`, defined as

```python
variables = [
    Input(node, eval(eval(attributes["datatype"])))
    if attributes["typestring"] == "input"
    else Output(node, eval(eval(attributes["datatype"])))
    for node, attributes in causal_dag.graph.nodes(data=True)
]
```

It's not really a problem either way, but it's just a bit more elegant.
@jmafoster1 I think it's less clear here what's being calculated and returned as a single line. You'd then have to define `inputs = variables[0]`, which is less informative than the current method.
My point was that we never use `inputs` or `outputs` in isolation. We build them separately and return them separately, but it looks like the only thing we do with them is append them anyway, so I was wondering whether it made sense to treat them separately at all. If you think it's more elegant the way you have it, though, we can leave it as is; it's not like people will be dealing with millions of inputs/outputs.
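For what it's worth, if the two lists were ever needed separately again, the single `variables` list could still be split after the fact. A quick sketch, assuming the `Input`/`Output` classes used in the comprehension above:

```python
# Hypothetical import path for the CTF's variable classes.
from causal_testing.specification.variable import Input, Output

inputs = [v for v in variables if isinstance(v, Input)]
outputs = [v for v in variables if isinstance(v, Output)]
```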
Enhancements to the DAFNI workflow that enable better usability and reproducibility. These include:
- Removing the need to pass `variables.json` as an argument. The variables can now be stored as metadata attributes in the `dot` file produced by the user.
- Converting `main_dafni.py` into a module, which improves the modularity and testability of the entrypoint in the test suite.