where is the log file #90

Open
pavlo888 opened this issue May 25, 2022 · 3 comments

@pavlo888

Dear,

First off, great package!!!!

I managed to run the pipeline successfully with the following command:
phylophlan -i Selected_genomes_pangenome_20200522 -o output-228-apr2022 -d uniref90_At --trim greedy --not_variant_threshold 0.99 --remove_fragmentary_entries --fragmentary_threshold 0.67 --min_num_entries 135 -t a -f isolates_config.cfg --diversity low --force_nucleotides --nproc 2 --verbose 2>&1 | tee logs/phylophlan__output-228-apr2022.log

However, I now cannot find the log file. Could you please tell me where I can find it? I would like to extract some information about the pipeline, such as how big the concatemers used in the MSAs are.

Thanks!

Cheers,
Pablo

@fasnicar fasnicar self-assigned this May 26, 2022
@fasnicar
Collaborator

Hello Pablo,

Apologies, but I'm not sure to which log you're referring.
You already passed --verbose and saved the output from PhyloPhlAn to the log file logs/phylophlan__output-228-apr2022.log.
Within the output folder (output-228-apr2022) you'll find a tmp folder that contains the files from all intermediate steps, so maybe that's what you need if you'd like to compute some statistics on the individual MSAs?
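
To get a quick overview of those intermediate steps, something like the following might help. It is only a minimal sketch: `summarize_steps` is a hypothetical helper (not part of PhyloPhlAn), and it assumes each step is stored as a subfolder of tmp containing regular files.

```python
from pathlib import Path


def summarize_steps(tmp_dir):
    """Map each subfolder of tmp_dir to the number of regular files it holds.

    Hypothetical helper for illustration; it only inspects the filesystem
    and makes no assumption about what each step folder is called.
    """
    tmp = Path(tmp_dir)
    return {
        sub.name: sum(1 for f in sub.iterdir() if f.is_file())
        for sub in sorted(tmp.iterdir())
        if sub.is_dir()
    }
```

For the run above, this would be called as `summarize_steps("output-228-apr2022/tmp")`.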

Many thanks,
Francesco

@pavlo888
Author

Hi @fasnicar

Thanks a lot for your reply! I understand a bit better now. I thought that the log would be a single file with information from the run, but now it is clearer to me.

Indeed, I saw the different files from the intermediate steps.

I have three follow-up questions:
i) The number of files in the "markers" folder represents the total number of markers detected?
ii) In the concatenated.aln.reduced file, the first line says "227 2217968", does that mean that there are 227 genomes and 2 217 968 bp in total for the complete alignment? That would mean that each genome would have 2217968/227=9770.78 bp?
iii) In the info.refined.tre file, the "Alignment patterns: 1041319" information, what does it mean exactly?

Thanks a lot for your help!!!!

Cheers,
Pablo

@fasnicar
Collaborator

Hi Pablo,

To answer your questions:

i) The number of files in the "markers" folder represents the total number of markers detected?

Yes, that would be the number of markers detected, although it might not be the same as the number of markers used for building the tree. If you specified a trimming approach (as you did in your command with the param --trim greedy), then some markers might be discarded during the trimming phase. The trimming steps are done in folders whose names start with trim_, so you want to check the latest one to get the actual number of markers used in the tree.
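
As a rough way to compare the two counts, here is a minimal sketch. Both functions are hypothetical helpers, not PhyloPhlAn code. It assumes that one file corresponds to one marker, and it interprets "latest" as the most recently modified trim_ folder, which may not match the actual order of the steps if the folders were touched afterwards.

```python
from pathlib import Path


def count_files(folder):
    """Count the regular files directly inside a folder."""
    return sum(1 for f in Path(folder).iterdir() if f.is_file())


def latest_trim_folder(tmp_dir):
    """Return the most recently modified folder whose name starts with
    'trim_', or None if there is no such folder.

    Assumption: modification time reflects the order of the trimming steps.
    """
    candidates = [
        p for p in Path(tmp_dir).iterdir()
        if p.is_dir() and p.name.startswith("trim_")
    ]
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)
```

Comparing `count_files` on the "markers" folder with `count_files(latest_trim_folder(...))` would then show how many markers were dropped by trimming, under the assumptions above.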

ii) In the concatenated.aln.reduced file, the first line says "227 2217968", does that mean that there are 227 genomes and 2 217 968 bp in total for the complete alignment? That would mean that each genome would have 2217968/227=9770.78 bp?

The concatenated.aln.reduced file is produced by RAxML when identical entries are detected. The 2217968 is the MSA length, meaning that each of the 227 genomes has that many aligned positions (gaps included), so it should not be divided by the number of genomes.
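
The two numbers can be read programmatically from the first line of the file. This is a minimal sketch with a hypothetical helper, `read_phylip_header`, assuming the file starts with a PHYLIP-style header of the form "<number of sequences> <alignment length>", as in the "227 2217968" line quoted above.

```python
def read_phylip_header(path):
    """Return (number of sequences, alignment length) from the first line.

    Assumes a PHYLIP-style header such as '227 2217968'.
    """
    with open(path) as handle:
        n_seqs, n_sites = handle.readline().split()[:2]
    return int(n_seqs), int(n_sites)
```

The second value is the length of every row in the alignment, not a total to be shared among the sequences.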

iii) In the info.refined.tre file, the "Alignment patterns: 1041319" information, what does it mean exactly?

Those are the unique site patterns (distinct alignment columns) that RAxML found in the MSA. This number is often close to the alignment length; in your case it is about half, meaning that many columns are repeated. You can learn more about these aspects in the RAxML manual.
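
To illustrate the idea on a toy alignment, the sketch below counts distinct columns. `count_site_patterns` is a hypothetical helper written for this example; RAxML's own count may differ, for instance in how it treats ambiguous characters or undetermined columns.

```python
def count_site_patterns(sequences):
    """Count the distinct columns of an alignment given as equal-length strings."""
    lengths = {len(s) for s in sequences}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    # zip(*sequences) yields one tuple per alignment column.
    return len(set(zip(*sequences)))


# Toy alignment: 3 sequences, 6 columns.
toy = ["ACGTAC", "ACGTAC", "ATGTAT"]
n_patterns = count_site_patterns(toy)  # columns 1 and 5 match, as do 2 and 6
```

Here the alignment has 6 columns but only 4 distinct patterns, which is the same kind of gap seen between 2217968 positions and 1041319 patterns.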

I hope this helps.

Many thanks,
Francesco
