This repository has been archived by the owner on Dec 8, 2020. It is now read-only.

Relevance of the outputs #3

Open
lecorguille opened this issue Feb 15, 2017 · 1 comment

Comments

@lecorguille
Member

@melpetera @mmonsoor

(For the SIF export, see #2)

For reference, here are the input sizes for my small test dataset (4 samples from faahKO):

  • faahKO.xset.group.retcor.group.fillpeaks.annotate.variableMetadata.tsv: 1.4M
  • faahKO.xset.group.retcor.group.fillpeaks.annotate.dataMatrix.tsv: 416K
  • faahKO.xset.group.retcor.group.fillpeaks.annotate.sampleMetadata.tsv: 41

The output files for the same test dataset (4 samples from faahKO) are:

  • correlation_matrix_selected.tsv: 781M
  • correlation_matrix.tsv: 1.2G
  • selected_metabolites_transpo.tsv: 331K
  • sorted_table.tsv: 1.8M

The mapping between inputs and outputs is:

  • sorted_table.tsv <-> variableMetadata + dataMatrix + 2 columns
  • selected_metabolites_transpo.tsv <-> dataMatrix with only the selected lines
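The first mapping amounts to a join of the two input tables on the variable identifier. Below is a minimal stdlib sketch, under several assumptions not stated in this thread: tables are held as dicts keyed by a hypothetical `variable_name` ID, `signal_moy` is taken to be the mean intensity across samples, `suppress` is assumed to be 1 for unselected variables, and rows are assumed to be sorted by descending `signal_moy`. None of this is the tool's documented behavior.

```python
from statistics import mean


def build_sorted_table(variable_metadata, data_matrix, selected_ids):
    """Hypothetical reconstruction of sorted_table.tsv.

    variable_metadata: dict, variable ID -> dict of metadata columns
    data_matrix: dict, variable ID -> dict of sample name -> intensity
    selected_ids: set of variable IDs kept by the selection step
    """
    rows = []
    for var_id, meta in variable_metadata.items():
        intensities = data_matrix[var_id]
        # variableMetadata columns followed by the dataMatrix intensities
        row = {"variable_name": var_id, **meta, **intensities}
        # Extra column 1 (assumed meaning): mean signal across samples
        row["signal_moy"] = mean(intensities.values())
        # Extra column 2 (assumed convention): 1 = not selected
        row["suppress"] = 0 if var_id in selected_ids else 1
        rows.append(row)
    # Assumed sort order: strongest mean signal first
    rows.sort(key=lambda r: r["signal_moy"], reverse=True)
    return rows


vm = {"M1": {"mz": 200.1}, "M2": {"mz": 300.2}}
dm = {"M1": {"s1": 10.0, "s2": 20.0}, "M2": {"s1": 100.0, "s2": 300.0}}
table = build_sorted_table(vm, dm, selected_ids={"M2"})
for row in table:
    print(row)
```

Dropping the intensity columns from each row gives the kind of variableMetadata-only output proposed later in the thread.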

From my point of view, this tool should generate 2 files:

  • dataMatrix with only the selected lines
  • variableMetadata with only the selected lines, plus a couple of extra columns if needed

Should we keep the correlation matrices? They are huge!
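The size problem is structural: a correlation matrix over n variables has n × n cells, so it grows quadratically with the number of variables while the dataMatrix grows only linearly. A back-of-the-envelope sketch follows; the 10 bytes per cell (formatted value plus separator) is an assumed average, not a measurement taken from these files.

```python
def estimate_tsv_bytes(n_vars, bytes_per_cell=10):
    """Rough size of an n_vars x n_vars correlation matrix saved as TSV.

    bytes_per_cell is an assumed average width of one formatted value
    plus its tab or newline; row and column labels are ignored.
    """
    return n_vars * n_vars * bytes_per_cell


for n in (1_000, 2_000, 4_000):
    size_mb = estimate_tsv_bytes(n) / 1e6
    print(f"{n} variables -> about {size_mb:.0f} MB")  # 10, 40, then 160 MB
```

Each doubling of the variable count quadruples the estimate, and by the same arithmetic a matrix restricted to k of the n variables is about (k/n)² the size of the full one.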

@melpetera
Member

Concerning the outputs, here is my advice, based on discussions with users.

  • The correlation matrix is needed, but only the filtered one (correlation_matrix_selected.tsv); the "full" one can always be obtained by replaying the analysis with "all to 1" parameters.
  • Apart from the correlation matrix, outputs should not be filtered: adding tag columns to a variableMetadata output is enough (the filtering can then be done with the Generic filter tool). In particular, users may want to check what was selected and what was not, which is difficult with a filtered table rather than the whole table plus tags.
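For the filtered correlation matrix in the first point, here is a sketch under the assumption (not confirmed in this thread) that "filtered" means restricted to the selected variables. Only pairs of selected variables are computed, so the full n × n matrix never has to be built or stored. Pearson correlation is also an assumption here, and the function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length intensity profiles.

    A constant profile makes the denominator zero (ZeroDivisionError);
    real code would need to decide how to handle that case.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def selected_correlation_matrix(profiles, selected_ids):
    """Correlation matrix restricted to the selected variables.

    profiles: dict, variable ID -> list of intensities (same sample order)
    selected_ids: set of variable IDs to keep
    """
    ids = [v for v in profiles if v in selected_ids]
    return {a: {b: pearson(profiles[a], profiles[b]) for b in ids} for a in ids}


profiles = {
    "M1": [1.0, 2.0, 3.0, 4.0],
    "M2": [2.0, 4.0, 6.0, 8.0],
    "M3": [4.0, 3.0, 2.0, 1.0],
    "M4": [1.0, 3.0, 2.0, 5.0],
}
matrix = selected_correlation_matrix(profiles, {"M1", "M2", "M3"})
print(matrix["M1"]["M2"], matrix["M1"]["M3"])
```

Computing only the selected pairs, rather than slicing a stored full matrix, is a design choice made here to keep memory proportional to k² instead of n².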

Thus, we suggest the following outputs:

  • the whole variableMetadata, sorted, with the additional columns "signal_moy" and "suppress" at the end (but without the dataMatrix intensities)
  • the filtered correlation matrix (correlation_matrix_selected.tsv)
  • the SIF file (sif_table.tsv)

And that is all. The other files can be obtained by replaying or combining outputs with other tools (Generic_filter, Table_merge, Transpose).
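As an illustration of recombining outputs, the content of selected_metabolites_transpo.tsv could be rebuilt from the tagged variableMetadata and the original dataMatrix with a merge/filter step followed by a transpose. The sketch below only mimics what Generic_filter, Table_merge and Transpose would do in sequence; it does not call those tools, and it assumes the same hypothetical `suppress` convention (0 = selected).

```python
def rebuild_selected_transposed(data_matrix, tagged_meta):
    """Filter dataMatrix rows using the tag column, then transpose.

    data_matrix: list of rows; row 0 is the header
                 ["variable_name", sample1, sample2, ...]
    tagged_meta: dict, variable ID -> suppress tag (assumed: 0 = selected)
    """
    header, body = data_matrix[0], data_matrix[1:]
    # Merge + filter step: keep rows whose variable is tagged as selected
    selected = [row for row in body if tagged_meta.get(row[0]) == 0]
    # Transpose step: samples become rows, variables become columns
    return [list(col) for col in zip(header, *selected)]


dm = [
    ["variable_name", "s1", "s2"],
    ["M1", 1.0, 2.0],
    ["M2", 3.0, 4.0],
    ["M3", 5.0, 6.0],
]
tags = {"M1": 0, "M2": 1, "M3": 0}
for row in rebuild_selected_transposed(dm, tags):
    print(row)
```

Because the tags stay in variableMetadata, the discarded variables (here M2) remain inspectable, which is the point made earlier in this comment.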

What do you think about it?
