Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix dynamic <<>> runindex #213

Merged
merged 22 commits into from
Jan 22, 2024

Conversation

dzemanov
Copy link

Adding run-1 to files immediately when new run is detected introduces problems:

  1. dcm2niix2bids: variable jsonfiles
    Need to remove run-less file and add run-1 file

  2. dcm2niix2bids: newbidsname via dcm2niix postfixes

  • If bidsname changes via postfixes, run-index starts from the one that was already in old bidsname, although this new bidsname could be unique and therefore doesn’t need run-index. Function increment_runindex would need to check for that.
  • If bidsname changes from run-less to run-2 via dcm2niix postfix in the same dcm2niix plugin run, again, need to remove run-less file and add run-1 file to jsonfiles. If it is the other way around, so from run-2 to run-less (after increment_runindex is adjusted for the case if bidsname contains run-index although it doesn’t need it), previously added run-1 with increment_runindex needs to be removed because it is no longer valid. This needs to be reflected in scans_table and in jsonfiles again.

I think this solution is not well maintainable since one needs to rely on the concrete order of functions called and increment_runindex is not standing on its own, since we need to later fix renamed files. With dcm2niix postfixes, files are being renamed back and forth, when run-1 is not needed in the first place, although we do not know it at the time of naming. Also, if in the future files are collected in new variables, these changes need to be reflected in all the variables and it is easy to forget to do that.

I suggest adding run-1 to files that need it (run-2 exists) and use <<>> run-index at the end of the plugin run, since at that time all the names of files are final and dcm2niix postfixes won’t change them. It also removes side effect from increment_runindex and makes it simpler.

I merged this fix with other suggestion I have: generate .tsv file with information what source was mapped to what bids files. I know that one can see in logs what was mapped to what, but having tsv file means one can quickly check the final names and that mappings are correct. We can see how the result mappings would look like also from bidsmap, but run-indexes and changes via dcm2niix postfixes are not there.
Maybe in the future, user could define what run data would he like to add as new column for easier check of mappings, for now I added SeriesDescription since a lot of files are mapped by that.

This bids_mappings.tsv file looks for example like this:

subject SeriesDescription source BIDS_mapping
sub-01 AAHead_Scout sub-01/std1_01_AAHead_Scout X
sub-01 localizer sub-01/std1_02_localizer X
sub-01 t1_mprage_sag sub-01/std1_03_t1_mprage_sag anat/sub-01_T1w.nii.gz
sub-01 bold_25iso_tr800-rest sub-01/std1_04_bold_25iso_tr800-rest func/sub-01_task-bold25isotr800rest_echo-1_bold.nii.gz
sub-01 bold_25iso_tr800-rest sub-01/std1_04_bold_25iso_tr800-rest func/sub-01_task-bold25isotr800rest_echo-2_bold.nii.gz
sub-01 bold_25iso_tr800-rest sub-01/std1_05_bold_25iso_tr800-rest func/sub-01_task-bold25isotr800rest_echo-1_part-phase_bold
sub-01 bold_25iso_tr800-rest sub-01/std1_05_bold_25iso_tr800-rest func/sub-01_task-bold25isotr800rest_echo-2_part-phase_bold
sub-01 bold_gre_map sub-01/std1_06_bold_gre_map fmap/sub-01_magnitude1.nii.gz
sub-01 bold_gre_map sub-01/std1_06_bold_gre_map fmap/sub-01_magnitude2.nii.gz
sub-01 bold_gre_map sub-01/std1_07_bold_gre_map fmap/sub-01_phasediff.nii.gz

@marcelzwiers
Copy link
Collaborator

I suggest adding run-1 to files that need it (run-2 exists) and use <<>> run-index at the end of the plugin run, since at that time all the names of files are final and dcm2niix postfixes won’t change them. It also removes side effect from increment_runindex and makes it simpler.

I agree that this is probably less complicated and a good idea

I merged this fix with other suggestion I have: generate .tsv file with information what source was mapped to what bids files

I need to think about this a bit more, get a better understanding of the problem you are solving with the tsv file

@dzemanov
Copy link
Author

Thank you very much for looking into this.

Bids mappings are used here as a way of storing run indexes so we know at the end of bidscoiner plugin run which runs use <<>> dynamic runindex – those are the runs that potentially need to have _run-1 appended. Maybe there is a better way to do this.

A TSV file containing BIDS mappings can be highly beneficial in validating the results produced by BIDScoiner. This file enables tracking of which original data inputs lead to specific final outputs (including runs, dcm2niix postfixes...).

@marcelzwiers
Copy link
Collaborator

marcelzwiers commented Jan 15, 2024

Could you perhaps split off the tsv part? I also think it's probably better to fix the run-index at the end, after all the files are there.

@marcelzwiers
Copy link
Collaborator

Let's deal with the tsv option later, and first fix the more important run-indexing

@@ -56,6 +56,12 @@ def bidscoiner(rawfolder: str, bidsfolder: str, subjects: list=(), force: bool=F
# Create a code/bidscoin subfolder
(bidsfolder/'code'/'bidscoin').mkdir(parents=True, exist_ok=True)

# Delete bids_mappings file if it exists
bids_mappings_file = bidsfolder / 'code' / 'bidscoin' / 'bids_mappings.tsv'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to go wrong when you run bidscoiner multiple times (e.g. run it every time a new subject is acquired)

bidscoin/bids.py Outdated
@@ -313,6 +313,32 @@ def dynamicvalue(self, value, cleanup: bool=True, runtime: bool=False):
return value


class BidsMapping:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if a new class is needed, or whether this could simply be part of DataSource (that contains almost the same info, except run and targets)

@@ -226,6 +227,7 @@ def bidscoiner_plugin(session: Path, bidsmap: dict, bidsses: Path) -> Union[None
scans_table.index.name = 'filename'

# Process all the source files or run subfolders
bids_mappings: List[BidsMapping] = []
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you are doing but I generally try to keep the plugins as short and simple as possible (to encourage external contributors to write plugiuns), and move stuff over as much as possible to bidscoiner. Building a list of mappings does add to the complexity...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand :). BidsMappings can be deleted from the code.
Commits "Make increment_runindex to not retroactively add run1, delete run from bidsname if not needed with dynamic index" and the principle of method bids.rename_runless_to_run1 introduced in commit "Add rename_runles_to_run1" can be used.
In order to dynamically add run-1 to run-less files at the end, we could:

  • save information which files were generated with <<>> setting, and for those run add_rename_runless_to_run1
    Each plugin would need to run at the end of conversion bids.rename_runless_to_run1, which is still easier than BidsMappings.
  • Run rename_runless_to_run1 in bidscoiner after plugin. Files that need run-1 could be detected via the existence of run-2 and no run-1. Hmm, that defeats the purpose of fixed run-index. Then maybe we could get <<>> information from bidsmap, but we do not extract from it which files were conversion output of that entry, and matching them again is not a great option.

I will have time to look at it more closely tomorrow evening.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think with the bidsmapping tsv file you were basically making a provenance table. For tracking provenance there is the W3C standard, and there is even a BEP in the making to implement this in BIDS:

https://github.com/bids-standard/BEP028_BIDSprov

That adds even more complexity, but would come with the benefit of being a standard...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the idea was to quickly check after the bidscoiner run what was done - what source was converted to what BIDS.
Thank you for pointing me to the standards; it looks really useful.

bidscoin/bids.py Outdated
new_entry = {
"subject": target_subject,
"session": target_session,
'SeriesDescription': bids_mapping.run.get("attributes", {}).get("SeriesDescription"),
Copy link
Collaborator

@marcelzwiers marcelzwiers Jan 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works only for DICOM, not for other modalities / plugins

@@ -312,6 +314,7 @@ def bidscoiner_plugin(session: Path, bidsmap: dict, bidsses: Path) -> Union[None
if not list(outfolder.glob(f"{bidsname}.*nii*")): continue

jsonfiles.update(outfolder.glob(f"{bidsname}.json")) # add existing created json files: bidsname.json
run["targets"].update(outfolder.glob(f"{bidsname}.*[!json]")) # add files created using this bidsmap run-item (except sidecars)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is all that is need when targets are added to the datasource

bidscoin/bids.py Outdated
Comment on lines 1516 to 1518
else:
run = copy.copy(run) # popping targets will not change original run

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks a bit like a hack, I suspect it is not needed if we add targets to the datasource instead of to the run. The datasource as part of the run is already hacky, but at least then it is all in one place. Moreover, cleaning up the datasource from the run is already implemented (e.g. when saving the bidsmap). I will make some commits to your PR so you can see what I mean

@marcelzwiers
Copy link
Collaborator

I moved run['targets'] to run['datasource'].targets. That may seem like a silly thing to do but run['datasource'] was already a bit of an ugly hack (putting an object in a data structure) and run['targets'] would have been another one. I think the code is simpler now and also more robust (run['datasource'] is e.g. cleaned up when saving a bidsmap, but your run['targets'] wasn't)

@marcelzwiers
Copy link
Collaborator

It looks fine to me now, what do you think, are you happy with my changes?

@marcelzwiers
Copy link
Collaborator

Your provenance bidsmappings tsv file can now be easily generated in the bids library (most easily in rename_runless_to_run1(), although that is a bit hacky). But let's open a new issue (or PR) for this, because maybe this can be done in a BEP028_BIDSprov compatible way.

@dzemanov
Copy link
Author

The code looks really nice.

The problem is that while datasource.targets are being updated, datasource under run is not updated, because it is a different datasource instance, created in bids.get_run_. So run["datasource"].targets stay always empty.

The quick solution would be to instead of datasource.targets.update use run["datasource"].targets.update.
This means run["datasource"] and datasource will be different and it can be confusing, so I would maybe suggest making sure in bids.get_matching_run that run will contain datasource that is passed into that function or add to bids.get_run_ optional param datasource, that will be used instead of creating new instance.

@marcelzwiers
Copy link
Collaborator

The problem is that while datasource.targets are being updated, datasource under run is not updated, because it is a different datasource instance, created in bids.get_run_

Is it? I didn't test it in practice but I did checked the logic for this in the code. The datasource instance is passed all the way up to get_run_?

@dzemanov
Copy link
Author

Plugin dcm2niix calls get_matching_run to which it passes datasource.
Function get_matching_run returns run_, which is created by get_run_ and datasource is not passed to this function, instead, it creates new datasource instance:

        datasource = DataSource(provenance, plugins, dataformat, datatype, subprefix, sesprefix)
   else:
       datasource = DataSource(provenance, dataformat=dataformat, datatype=datatype)

@dzemanov dzemanov closed this Jan 21, 2024
@dzemanov dzemanov reopened this Jan 21, 2024
@marcelzwiers
Copy link
Collaborator

Ok, I'll see if I can fix that in a clean way

@marcelzwiers
Copy link
Collaborator

Adding a DataSource to run has always been a workaround I'm not very happy with. I have now cleaned things up by refactoring getting / creating runs from a data source. It passes the pytests, but that unfortunately doesn't properly cover plugin integration tests. Does it work for you now?

@dzemanov
Copy link
Author

Yes, it works! :)

@marcelzwiers marcelzwiers merged commit c75c852 into Donders-Institute:master Jan 22, 2024
7 checks passed
marcelzwiers added a commit that referenced this pull request Feb 2, 2024
- Fix broken suffix handling
- Remove the illogical logic for obsolete(?) run-2 names
- Adjust the tests accordingly
marcelzwiers added a commit that referenced this pull request Feb 2, 2024
- Revert renaming at the end, instead rename when encountering run-2
- Remove the datasource.targets bookkeeping
- Adjust the tests accordingly
@marcelzwiers
Copy link
Collaborator

I want to release a new bidscoin version and started to do more integration tests. I then came across several issues related to this PR (e.g. broken suffix handling, run-index orphans, etc) and started looking deeper into your arguments for it. I decided to revert most of this PR for the following reasons:

  1. Renaming runless can be done when dealing with run-2. Waiting to the end doesn't change anything to the available information, nor the actions that have to be done. In other words, postponing it has no benefit at all.
  2. It introduced extra complexity by having to do bookkeeping of the targets. In fact, that bookkeeping was redundant, as it was already done in the jsonfiles set (now renamed to sidecars and better commented to better express this intent). Moreover, dcm2niix2bids is the only plugin (thus far at least) that produces such unpredictable output, so pushing that bookkeeping as a generic mechanism to other plugins was, in hindsight, not a good idea.

Adding run-1 to files immediately when new run is detected introduces problems:

  1. dcm2niix2bids: variable jsonfiles
    Need to remove run-less file and add run-1 file

I fail to see how that would change if you wait until the end

  1. dcm2niix2bids: newbidsname via dcm2niix postfixes
  • If bidsname changes via postfixes, run-index starts from the one that was already in old bidsname, although this new bidsname could be unique and therefore doesn’t need run-index.

That's an excellent observation of a subtle bug! I now fixed that by simply resetting the run-index of newbidsname

Function increment_runindex would need to check for that.

No, not with the above fix

  • If bidsname changes from run-less to run-2 via dcm2niix postfix in the same dcm2niix plugin run, again, need to remove run-less file and add run-1 file to jsonfiles. If it is the other way around, so from run-2 to run-less (after increment_runindex is adjusted for the case if bidsname contains run-index although it doesn’t need it), previously added run-1 with increment_runindex needs to be removed because it is no longer valid. This needs to be reflected in scans_table and in jsonfiles again.

That's a very complicated sentence, and I'm not sure if I follow it. But I think whatever you trying to say here, it is no longer relevant as the run-index of newbidsname now starts from scratch

I think this solution is not well maintainable since one needs to rely on the concrete order of functions called and increment_runindex is not standing on its own, since we need to later fix renamed files. With dcm2niix postfixes, files are being renamed back and forth, when run-1 is not needed in the first place, although we do not know it at the time of naming. Also, if in the future files are collected in new variables, these changes need to be reflected in all the variables and it is easy to forget to do that.

Again, I don't see how doing more bookkeeping of outputs makes things easier. I think it is the other way around, by reverting the PR (well not all of it, I kept the good parts :-)), the code became simpler, with less overhead and better to maintain. Unfortunately, the problem of dealing with dcm2niix output is just very hard, as you have experienced, so renaming schemes remain complex but inevitable.

I suggest adding run-1 to files that need it (run-2 exists) and use <<>> run-index at the end of the plugin run, since at that time all the names of files are final and dcm2niix postfixes won’t change them. It also removes side effect from increment_runindex and makes it simpler.

No, as soon as newbidsname is ready, nothing changes anymore and the run-indexing can done immediately. It is the other way around, waiting till the end can cause side-effects, because dcm2niix can then start adding a, b, c, etc suffixes to runless filenames, which makes the problem much much worse.

Anyhow, I greatly appreciated this PR and it has definitely led to bugfixes and improved design. I still need to do more testing and there are some (unrelated) bugs that I still need to fix. I sincerely hope the new version is working for you, let me know if it isn't, and I'll try to identify the cause and fix it!

@dzemanov
Copy link
Author

dzemanov commented Feb 4, 2024

I want to release a new bidscoin version and started to do more integration tests. I then came across several issues related to this PR (e.g. broken suffix handling, run-index orphans, etc) and started looking deeper into your arguments for it. I decided to revert most of this PR for the following reasons:

  1. Renaming runless can be done when dealing with run-2. Waiting to the end doesn't change anything to the available information, nor the actions that have to be done. In other words, postponing it has no benefit at all.
  2. It introduced extra complexity by having to do bookkeeping of the targets. In fact, that bookkeeping was redundant, as it was already done in the jsonfiles set (now renamed to sidecars and better commented to better express this intent). Moreover, dcm2niix2bids is the only plugin (thus far at least) that produces such unpredictable output, so pushing that bookkeeping as a generic mechanism to other plugins was, in hindsight, not a good idea.

Adding run-1 to files immediately when new run is detected introduces problems:

  1. dcm2niix2bids: variable jsonfiles
    Need to remove run-less file and add run-1 file

I fail to see how that would change if you wait until the end

  1. dcm2niix2bids: newbidsname via dcm2niix postfixes
  • If bidsname changes via postfixes, run-index starts from the one that was already in old bidsname, although this new bidsname could be unique and therefore doesn’t need run-index.

That's an excellent observation of a subtle bug! I now fixed that by simply resetting the run-index of newbidsname

Function increment_runindex would need to check for that.

No, not with the above fix

  • If bidsname changes from run-less to run-2 via dcm2niix postfix in the same dcm2niix plugin run, again, need to remove run-less file and add run-1 file to jsonfiles. If it is the other way around, so from run-2 to run-less (after increment_runindex is adjusted for the case if bidsname contains run-index although it doesn’t need it), previously added run-1 with increment_runindex needs to be removed because it is no longer valid. This needs to be reflected in scans_table and in jsonfiles again.

That's a very complicated sentence, and I'm not sure if I follow it. But I think whatever you trying to say here, it is no longer relevant as the run-index of newbidsname now starts from scratch

I think this solution is not well maintainable since one needs to rely on the concrete order of functions called and increment_runindex is not standing on its own, since we need to later fix renamed files. With dcm2niix postfixes, files are being renamed back and forth, when run-1 is not needed in the first place, although we do not know it at the time of naming. Also, if in the future files are collected in new variables, these changes need to be reflected in all the variables and it is easy to forget to do that.

Again, I don't see how doing more bookkeeping of outputs makes things easier. I think it is the other way around, by reverting the PR (well not all of it, I kept the good parts :-)), the code became simpler, with less overhead and better to maintain. Unfortunately, the problem of dealing with dcm2niix output is just very hard, as you have experienced, so renaming schemes remain complex but inevitable.

I suggest adding run-1 to files that need it (run-2 exists) and use <<>> run-index at the end of the plugin run, since at that time all the names of files are final and dcm2niix postfixes won’t change them. It also removes side effect from increment_runindex and makes it simpler.

No, as soon as newbidsname is ready, nothing changes anymore and the run-indexing can done immediately. It is the other way around, waiting till the end can cause side-effects, because dcm2niix can then start adding a, b, c, etc suffixes to runless filenames, which makes the problem much much worse.

Anyhow, I greatly appreciated this PR and it has definitely led to bugfixes and improved design. I still need to do more testing and there are some (unrelated) bugs that I still need to fix. I sincerely hope the new version is working for you, let me know if it isn't, and I'll try to identify the cause and fix it!

Hello,
I understand the desire to keep it in increment runindex and not aggregate runs in a variable. When I first tried to fix the code, this was my first approach as well, so resetting runindex in increment_runindex besides other needed changes. I apologise, I did not explain the bugs very clearly, let me try again. The new code introduces these bugs:

Not updating sidecars when run1 is added via increment_runindex. This means that newly renamed run1 file via increment runindex is not in sidecars, but runless file still is, so run1file is not included in scans file, for runless file there is warning Unexpected conversion result, no output files: {}. Also, run1 file does not include metadata that are defined by user, since it was not updated as it was not in sidecars. This is easy fix, e.g on line 445 in dcm2niix2bids.py:

if run['bids'].get('run') == "<<>>" and bids.get_bidsvalue(newbidsfile, 'run') == '2':
oldjsonrunlessfile = bids.insert_bidskeyval(newbidsfile, 'run', '', False).with_suffix('').with_suffix('.json')
newjsonrun1file = bids.insert_bidskeyval(newbidsfile, 'run', '1', False).with_suffix('').with_suffix('.json')
sidecars.discard(oldjsonrunlessfile)
sidecars.add(newjsonrun1file)

For the other problem, let me give a concrete example:
Suppose we converted source to sub-1_task-A_bold.nii. Then, we go to second source which gets also bidsname sub-1_task-A_bold.nii. This means we will in increment_runindex change the names to sub-1_task-A_run-1_ bold.nii and sub-1_task-A _run-2_ bold.nii. Then, in dcm2niixpostfixes, we will find out that sub-1_task-A_run-2_ bold.nii should actually be sub-1_task-A_part-phase_bold.nii because it has phase postfix. Now increment_runindex returns correctly sub-1_task-A_part-phase_ bold.nii thanks to the reset. But what about already renamed sub-1_task-A_run-1_bold.nii? It now has incorrect run-1 added, although run-2 does not exist now and it needs to be runless. So this means we need to rename it back to runless, update scans and also please do not forget about sidecars, they also need to be updated from run-1 back to runless. Code in increment_runindex for # Rename run-less to run-1 can be used, but just the other way around, so Rename run-1 to run-less. But again, sidecars need to be updated for this change - discard run1 and add runless.
This issue will not arise if for the run1, default bidsname was changed with posfixes (e.g echo) – so increment_runindex does not add retroactively run1 and sub-1_task-A_bold.nii stays runless.

The code you added is really nice. Renaming runless to run1 and back is not great, but will work. I tried to play a bit with bidsprov from the other issue, which in simplified version uses the aggregated variable matched_runs to generate bidsprov, and these changes mean bidsprov will be harder to generate (maybe via custom logger function?). But that is out of question here + I don’t know if other people would see as valuable a quick way to check what outputs were generated from what inputs + BEP028 is only proposed now.

marcelzwiers added a commit that referenced this pull request Feb 4, 2024
marcelzwiers added a commit that referenced this pull request Feb 4, 2024
@marcelzwiers
Copy link
Collaborator

marcelzwiers commented Feb 4, 2024

I just realized that that scans_table doesn't need any bookkeeping / renaming at all, it is sidecars (now further simplified and renamed once more to bidsnames) that needed proper bookkeeping (after all, scans_table is populated from it). I haven't fully tested things but I think this is really the solution :-)

@marcelzwiers
Copy link
Collaborator

One other idea that is still in my mind is to use the acq times to verify (or even correct) the ordering of the run-indexes

@marcelzwiers
Copy link
Collaborator

For the other problem, let me give a concrete example:
Suppose we converted source to sub-1_task-A_bold.nii. Then, we go to second source which gets also bidsname sub-1_task-A_bold.nii. This means we will in increment_runindex change the names to sub-1_task-A_run-1_ bold.nii and sub-1_task-A run-2 bold.nii. Then, in dcm2niixpostfixes, we will find out that sub-1_task-A_run-2_ bold.nii should actually be sub-1_task-A_part-phase_bold.nii because it has phase postfix.

Every run-1, run-2, etc file should originate from the same scan protocol, meaning that the same output files are generated every time that protocol (= source) is encountered in the for source in sources loop. This means that the runindex needs to be correct within each iteration of that loop, following iterations do not change that or add any information that can or should be used (except of course one case, i.e. when renaming runless to run-1).

Now increment_runindex returns correctly sub-1_task-A_part-phase_ bold.nii thanks to the reset. But what about already renamed sub-1_task-A_run-1_bold.nii? It now has incorrect run-1 added, although run-2 does not exist now and it needs to be runless. So this means we need to rename it back to runless, update scans and also please do not forget about sidecars, they also need to be updated from run-1 back to runless. Code in increment_runindex for # Rename run-less to run-1 can be used, but just the other way around, so Rename run-1 to run-less. But again, sidecars need to be updated for this change - discard run1 and add runless.
This issue will not arise if for the run1, default bidsname was changed with posfixes (e.g echo) – so increment_runindex does not add retroactively run1 and sub-1_task-A_bold.nii stays runless.

I think you will find out that if you test with the latest code all these problems have disappeared :-)

@dzemanov
Copy link
Author

dzemanov commented Feb 5, 2024

I just realized that that scans_table doesn't need any bookkeeping / renaming at all, it is sidecars (now further simplified and renamed once more to bidsnames) that needed proper bookkeeping (after all, scans_table is populated from it). I haven't fully tested things but I think this is really the solution :-)

That is a great observation and the new code is with bidsnames greatly simplified :). I have one question, can it happen that we rename runless file to run1 file incorrectly?
"Every run-1, run-2 should originate from the same scan protocol, following iterations do not change that or add any information."
We have for some datasets in one source subdirectory bold files and in another subdirectory part-phase_bold files for them - maybe our source organization is not standard? When part-phase bold file comes, the code uses the same bidsname as for only the bold file, because it does not know yet about part-phase. That is why it with increment_runindex incorrectly renames bold file from different source loop run to run1. Then it finds out about the phase, so file with part-phase is correctly named without run-index. Problem can be completely fixed with bidsmap that has separate entries for part-phase, since that way it will create bidsname with part-phase in the first place for separate part-phase source subdirectory. We have such bidsmap, so there is no problem for us here. I just wanted to point out that this can happen. Again, if this is not standard, then there is no problem at all :).

@marcelzwiers
Copy link
Collaborator

Yes, there is one edge case that I am working on right now. It's when dcm2niix outputs e.g. fname.nii, fname_e1.nii, fname_e2.nii, etc for echo 1, 2, 3, etc images. Currently both the bidsmap (via <EchoNumbers>) and the suffix produce a different echo number (I think the bidsmap does it correctly). I'll let you know.

I tried to play a bit with bidsprov from the other issue, which in simplified version uses the aggregated variable matched_runs to generate bidsprov, and these changes mean bidsprov will be harder to generate (maybe via custom logger function?).

Yes, I was also thinking of creating a bidsprov logger or a bidsprov class. I think bidsnames serves your purpose, you don't need aggregated matched_runs

p.s. Also note that the bids-validator gives an error if bidsignore scans (e.g. from extra_data) get included in the scans.tsv file

@marcelzwiers
Copy link
Collaborator

Here's an example, perhaps you see something similar too:

ll bids/sub-flatABRIM/extra_data
total 270M
-rw-r--r-- 1 marzwi mrphys 2.4K Feb  5 12:12 sub-flatABRIM_acq-AlternativeGRE5echosPFshorter_dir-LR_mod-AlternativeGRE5echosPFshorter_echo-1_part-phase_GR.json
-rw-r--r-- 1 marzwi mrphys  19M Feb  5 12:12 sub-flatABRIM_acq-AlternativeGRE5echosPFshorter_dir-LR_mod-AlternativeGRE5echosPFshorter_echo-1_part-phase_GR.nii.gz
-rw-r--r-- 1 marzwi mrphys 2.4K Feb  5 12:12 sub-flatABRIM_acq-AlternativeGRE5echosPFshorter_dir-LR_mod-AlternativeGRE5echosPFshorter_echo-2_GR.json
-rw-r--r-- 1 marzwi mrphys 9.8M Feb  5 12:12 sub-flatABRIM_acq-AlternativeGRE5echosPFshorter_dir-LR_mod-AlternativeGRE5echosPFshorter_echo-2_GR.nii.gz
-rw-r--r-- 1 marzwi mrphys 2.4K Feb  5 12:12 sub-flatABRIM_acq-AlternativeGRE5echosPFshorter_dir-LR_mod-AlternativeGRE5echosPFshorter_echo-2_part-phase_GR.json
-rw-r--r-- 1 marzwi mrphys  19M Feb  5 12:12 sub-flatABRIM_acq-AlternativeGRE5echosPFshorter_dir-LR_mod-AlternativeGRE5echosPFshorter_echo-2_part-phase_GR.nii.gz
-rw-r--r-- 1 marzwi mrphys 2.4K Feb  5 12:12 sub-flatABRIM_acq-AlternativeGRE5echosPFshorter_dir-LR_mod-AlternativeGRE5echosPFshorter_echo-3_GR.json
-rw-r--r-- 1 marzwi mrphys 9.6M Feb  5 12:12 sub-flatABRIM_acq-AlternativeGRE5echosPFshorter_dir-LR_mod-AlternativeGRE5echosPFshorter_echo-3_GR.nii.gz
-rw-r--r-- 1 marzwi mrphys 2.4K Feb  5 12:12 sub-flatABRIM_acq-AlternativeGRE5echosPFshorter_dir-LR_mod-AlternativeGRE5echosPFshorter_echo-3_part-phase_GR.json
-rw-r--r-- 1 marzwi mrphys  19M Feb  5 12:12 sub-flatABRIM_acq-AlternativeGRE5echosPFshorter_dir-LR_mod-AlternativeGRE5echosPFshorter_echo-3_part-phase_GR.nii.gz
-rw-r--r-- 1 marzwi mrphys 2.4K Feb  5 12:12 sub-flatABRIM_acq-AlternativeGRE5echosPFshorter_dir-LR_mod-AlternativeGRE5echosPFshorter_echo-4_GR.json
-rw-r--r-- 1 marzwi mrphys 9.4M Feb  5 12:12 sub-flatABRIM_acq-AlternativeGRE5echosPFshorter_dir-LR_mod-AlternativeGRE5echosPFshorter_echo-4_GR.nii.gz
-rw-r--r-- 1 marzwi mrphys 2.4K Feb  5 12:12 sub-flatABRIM_acq-AlternativeGRE5echosPFshorter_dir-LR_mod-AlternativeGRE5echosPFshorter_echo-4_part-phase_GR.json
-rw-r--r-- 1 marzwi mrphys  19M Feb  5 12:12 sub-flatABRIM_acq-AlternativeGRE5echosPFshorter_dir-LR_mod-AlternativeGRE5echosPFshorter_echo-4_part-phase_GR.nii.gz
-rw-r--r-- 1 marzwi mrphys 2.4K Feb  5 12:12 sub-flatABRIM_acq-AlternativeGRE5echosPFshorter_dir-LR_mod-AlternativeGRE5echosPFshorter_echo-5_GR.json
-rw-r--r-- 1 marzwi mrphys 9.3M Feb  5 12:12 sub-flatABRIM_acq-AlternativeGRE5echosPFshorter_dir-LR_mod-AlternativeGRE5echosPFshorter_echo-5_GR.nii.gz
-rw-r--r-- 1 marzwi mrphys 2.4K Feb  5 12:12 sub-flatABRIM_acq-AlternativeGRE5echosPFshorter_dir-LR_mod-AlternativeGRE5echosPFshorter_echo-5_part-phase_GR.json
-rw-r--r-- 1 marzwi mrphys  19M Feb  5 12:12 sub-flatABRIM_acq-AlternativeGRE5echosPFshorter_dir-LR_mod-AlternativeGRE5echosPFshorter_echo-5_part-phase_GR.nii.gz
-rw-r--r-- 1 marzwi mrphys 2.4K Feb  5 12:12 sub-flatABRIM_acq-AlternativeGRE5echosPFshorter_dir-LR_mod-AlternativeGRE5echosPFshorter_echo-6_GR.json
-rw-r--r-- 1 marzwi mrphys 9.1M Feb  5 12:12 sub-flatABRIM_acq-AlternativeGRE5echosPFshorter_dir-LR_mod-AlternativeGRE5echosPFshorter_echo-6_GR.nii.gz
-rw-r--r-- 1 marzwi mrphys 2.4K Feb  5 12:12 sub-flatABRIM_acq-AlternativeGRE5echosPFshorter_dir-LR_mod-AlternativeGRE5echosPFshorter_echo-6_part-phase_GR.json
-rw-r--r-- 1 marzwi mrphys  19M Feb  5 12:12 sub-flatABRIM_acq-AlternativeGRE5echosPFshorter_dir-LR_mod-AlternativeGRE5echosPFshorter_echo-6_part-phase_GR.nii.gz
-rw-r--r-- 1 marzwi mrphys 2.4K Feb  5 12:12 sub-flatABRIM_acq-AlternativeGRE5echosPFshorter_dir-LR_mod-AlternativeGRE5echosPFshorter_echo-7_GR.json
-rw-r--r-- 1 marzwi mrphys 8.9M Feb  5 12:12 sub-flatABRIM_acq-AlternativeGRE5echosPFshorter_dir-LR_mod-AlternativeGRE5echosPFshorter_echo-7_GR.nii.gz
-rw-r--r-- 1 marzwi mrphys 2.4K Feb  5 12:12 sub-flatABRIM_acq-AlternativeGRE5echosPFshorter_dir-LR_mod-AlternativeGRE5echosPFshorter_echo-7_part-phase_GR.json
-rw-r--r-- 1 marzwi mrphys  19M Feb  5 12:12 sub-flatABRIM_acq-AlternativeGRE5echosPFshorter_dir-LR_mod-AlternativeGRE5echosPFshorter_echo-7_part-phase_GR.nii.gz
-rw-r--r-- 1 marzwi mrphys 2.4K Feb  5 12:12 sub-flatABRIM_acq-AlternativeGRE5echosPFshorter_dir-LR_mod-AlternativeGRE5echosPFshorter_echo-8_GR.json
-rw-r--r-- 1 marzwi mrphys 8.8M Feb  5 12:12 sub-flatABRIM_acq-AlternativeGRE5echosPFshorter_dir-LR_mod-AlternativeGRE5echosPFshorter_echo-8_GR.nii.gz
-rw-r--r-- 1 marzwi mrphys 2.4K Feb  5 12:12 sub-flatABRIM_acq-AlternativeGRE5echosPFshorter_dir-LR_mod-AlternativeGRE5echosPFshorter_echo-8_part-phase_GR.json
-rw-r--r-- 1 marzwi mrphys  20M Feb  5 12:12 sub-flatABRIM_acq-AlternativeGRE5echosPFshorter_dir-LR_mod-AlternativeGRE5echosPFshorter_echo-8_part-phase_GR.nii.gz
-rw-r--r-- 1 marzwi mrphys 2.4K Feb  5 12:12 sub-flatABRIM_acq-AlternativeGRE5echosPFshorter_dir-LR_mod-AlternativeGRE5echosPFshorter_echo-9_GR.json
-rw-r--r-- 1 marzwi mrphys 8.7M Feb  5 12:12 sub-flatABRIM_acq-AlternativeGRE5echosPFshorter_dir-LR_mod-AlternativeGRE5echosPFshorter_echo-9_GR.nii.gz
-rw-r--r-- 1 marzwi mrphys 2.4K Feb  5 12:12 sub-flatABRIM_acq-AlternativeGRE5echosPFshorter_dir-LR_mod-AlternativeGRE5echosPFshorter_echo-9_part-phase_GR.json
-rw-r--r-- 1 marzwi mrphys  20M Feb  5 12:12 sub-flatABRIM_acq-AlternativeGRE5echosPFshorter_dir-LR_mod-AlternativeGRE5echosPFshorter_echo-9_part-phase_GR.nii.gz
-rw-r--r-- 1 marzwi mrphys 2.4K Feb  5 12:12 sub-flatABRIM_acq-AlternativeGRE5echosPFshorter_dir-LR_run-1_mod-AlternativeGRE5echosPFshorter_echo-1_GR.json
-rw-r--r-- 1 marzwi mrphys 9.9M Feb  5 12:12 sub-flatABRIM_acq-AlternativeGRE5echosPFshorter_dir-LR_run-1_mod-AlternativeGRE5echosPFshorter_echo-1_GR.nii.gz
-rw-r--r-- 1 marzwi mrphys 2.4K Feb  5 12:12 sub-flatABRIM_acq-tsevfliso08mmLARGEFOV_T2w.json
-rw-r--r-- 1 marzwi mrphys  16M Feb  5 12:12 sub-flatABRIM_acq-tsevfliso08mmLARGEFOV_T2w.nii.gz

@marcelzwiers
Copy link
Collaborator

Ok, here's a short update. What I said about echo numbers not in sync was not true, and the issue was caused by me bidsmapping data automatically and not setting the mag/phase part. As a remedy I added some handy logic to the dcm2niix2bids plugin to automatically set it for any non-magnitude image. I think it should work but the strange thing is that it doesn't and that there are unwanted aliases introduced (presumably making everything a phase image). I tried to get rid of all aliases when loading a bidsmap, but so far I have been unsuccessful in that. If you have anything to change for the bidsmap_dccn template, I'm happy to hear about it

@dzemanov
Copy link
Author

dzemanov commented Feb 5, 2024

Yes, I think echo numbers are in sync, at least I haven't noticed for our data any differences. The new code looks ok to me. I checked bidsmap_dccn template with our dataset and it added part-phase correctly only to the files with 'P'. I will let you know if I find out something new.

marcelzwiers added a commit that referenced this pull request Feb 6, 2024
@marcelzwiers
Copy link
Collaborator

marcelzwiers commented Feb 6, 2024

Ok, I fixed it, I had vague issues because dict.update() was making shallow copies (I think I had fixed this long ago, it was removed by me with this PR and now I re-fixed it). It was really hard to pinpoint, but from my side, my integration tests are now looking good again :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants