Explanation .gl, .gp, .phased files #40

Open
rafaella-buzatu opened this issue Feb 13, 2024 · 1 comment

Comments

@rafaella-buzatu

Hello! While running Monopogen, I noticed that it outputs a number of different files. Your tutorial says that the final output is the .phased.vcf.gz file; however, that file only provides the genotype. I would also like to obtain the read depth and allele frequency for those variants. I see that the .gl.vcf.gz file contains depth information, while the .gp.vcf.gz file contains the genotype and allele frequency. I have also noticed that the .gl.vcf.gz file contains unfiltered variants, while the .gp.vcf.gz file seems to contain only filtered variants, the same ones as in .phased.vcf.gz.

Could you help me understand what all these files are and how I could go about extracting as much information as possible for all variants (even unfiltered) from them?

Thank you!

@jinzhuangdou
Collaborator

Hi, you can find more details in the Beagle software manual: https://faculty.washington.edu/browning/beagle/beagle_4.1_21Jan17.pdf
Briefly:
*.gl.vcf.gz includes all candidate SNVs for which alignment information is available.
*.gp.vcf.gz includes the genotypes of SNVs that overlap with the 1000 Genomes Phase 3 (1KG3) reference panel, after LD refinement.
*.phased.vcf.gz is similar to *.gp.vcf.gz, but with phasing information added.
I hope this answers your question.
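
For collecting per-variant information across these files, one possible approach is to read every record from the *.gl.vcf.gz file and annotate it with the genotype from the *.phased.vcf.gz file where the same variant is present. The following is a minimal sketch of that idea using only the Python standard library, not Monopogen's or Beagle's own method. It assumes that all files are standard tab-delimited VCFs, that variants can be matched on CHROM, POS, REF and ALT, and that only the first sample column is of interest. The function names are hypothetical, and the INFO and FORMAT field names actually present (for example a depth or allele-frequency tag) should be checked against the header of your own files.

```python
import gzip
from typing import Dict, Iterator, List, Optional, Tuple

Key = Tuple[str, int, str, str]  # (CHROM, POS, REF, ALT)


def open_vcf(path: str):
    """Open a plain-text or gzip/bgzip-compressed VCF in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def read_vcf(path: str) -> Iterator[Tuple[Key, Dict[str, str], Dict[str, str]]]:
    """Yield (key, INFO dict, first-sample FORMAT dict) for each record."""
    with open_vcf(path) as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip meta-information and the column header
            cols = line.rstrip("\n").split("\t")
            if len(cols) < 8:
                continue  # not a complete VCF record
            chrom, pos, ref, alt, info_str = cols[0], cols[1], cols[3], cols[4], cols[7]
            info: Dict[str, str] = {}
            for item in info_str.split(";"):
                if item and item != ".":
                    name, _, value = item.partition("=")
                    info[name] = value  # flag-style entries get an empty value
            sample: Dict[str, str] = {}
            if len(cols) >= 10:
                # Assumption: only the first sample column is used.
                sample = dict(zip(cols[8].split(":"), cols[9].split(":")))
            yield (chrom, int(pos), ref, alt), info, sample


def merge_gl_with_phased(gl_path: str, phased_path: str) -> List[dict]:
    """Return one row per *.gl record, annotated with the phased GT if any."""
    phased: Dict[Key, Optional[str]] = {
        key: sample.get("GT") for key, _info, sample in read_vcf(phased_path)
    }
    rows = []
    for key, info, sample in read_vcf(gl_path):
        rows.append({
            "chrom": key[0], "pos": key[1], "ref": key[2], "alt": key[3],
            "info": info,            # INFO fields as found in the *.gl file
            "sample": sample,        # FORMAT fields as found in the *.gl file
            "in_phased": key in phased,
            "phased_gt": phased.get(key),
        })
    return rows
```

Called as `merge_gl_with_phased("chr20.gl.vcf.gz", "chr20.phased.vcf.gz")` (file names here are placeholders), this keeps the unfiltered candidate SNVs from the *.gl file and marks which of them also appear in the phased output. The same `read_vcf` helper could be pointed at the *.gp.vcf.gz file to pull its INFO fields in the same way.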
