-
Notifications
You must be signed in to change notification settings - Fork 354
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
COSMIC NEWER VERSION SUPPORT(v94 and later) #3698
Comments
I looked through the python script to prepare cosmic.vcf.gz for hg38 manually. Interestingly, I found the problem arises at the CMD So my queston is the step |
That command excludes the known SNPs (non-tumor) from the COSMIC data. I believe they don't ship these variants anymore, though. |
Yeah , so I skipped this step as well. I manually prepared vcf files of CodingVariants and NonCodingVariants separately. This time, the problem arises when I tried to merge the above two vcf files with error indicating that the contigs of them are not compatible. It is strange that even if I normalized according to the same reference genome, the contigs of NonCodingVcf is ordered as "chr1, chr2, chr3, chr 4....." while the CodingVcf is ordered as "chr1, chr10, chr11, chr12....chr2, chr22..." |
I cannot figure out what is wrong |
Did you try reordering the contigs so they all have the same order? |
Thanks for your advice, I may try this later. But it still confused me how come different orders are given after the same operation(exactly the same gunzip-normalize-gsort process as is shown below)
|
Version info
To Reproduce
Exact bcbio command you have used:
Log files (could be found in work/log)
`2023-02-13 14:22:35,700 [I] Beginning COSMIC v94 prep for GRCh37.
2023-02-13 14:22:35,700 [I] Beginning COSMIC v94 prep for GRCh38.
2023-02-13 14:22:35,700 [I] Downloading COSMIC VCF files.
2023-02-13 14:22:35,700 [I] Downloading https://cancer.sanger.ac.uk/cosmic/file_download/GRCh38/cosmic/v94/VCF/CosmicCodingMuts.vcf.gz
2023-02-13 14:25:13,610 [I] Downloading https://cancer.sanger.ac.uk/cosmic/file_download/GRCh38/cosmic/v94/VCF/CosmicNonCodingVariants.vcf.gz
2023-02-13 14:26:18,927 [I] Sorting v94/GRCh38/CosmicCodingMuts.vcf.gz to match the order of /home/data/bcbio/genomes/Hsapiens/hg38/seq/hg38.fa.
********** For more information, please see:
https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)
********** The command line looks like this in the new syntax:
********** MergeVcfs -O v94/bcbio_ready/hg38/cosmic.vcf.gz -D /home/data/bcbio/genomes/Hsapiens/hg38/seq/hg38.dict -I v94/GRCh38/CosmicCodingMuts-prep.vcf.gz -I v94/GRCh38/CosmicNonCodingVariants-prep.vcf.gz -USE_JDK_DEFLATER true -USE_JDK_INFLATER true -CREATE_INDEX false
[Mon Feb 13 14:26:19 CST 2023] MergeVcfs INPUT=[v94/GRCh38/CosmicCodingMuts-prep.vcf.gz, v94/GRCh38/CosmicNonCodingVariants-prep.vcf.gz] OUTPUT=v94/bcbio_ready/hg38/cosmic.vcf.gz SEQUENCE_DICTIONARY=/home/data/bcbio/genomes/Hsapiens/hg38/seq/hg38.dict CREATE_INDEX=false USE_JDK_DEFLATER=true USE_JDK_INFLATER=true VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json
[Mon Feb 13 14:26:19 CST 2023] Executing as bcbio@dell-PowerEdge-R7525 on Linux 5.15.0-46-generic amd64; OpenJDK 64-Bit Server VM 1.8.0_332-b09; Deflater: Jdk; Inflater: Jdk; Provider GCS is not available; Picard version: 2.27.5
[Mon Feb 13 14:26:19 CST 2023] picard.vcf.MergeVcfs done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=514850816
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.tribble.TribbleException$MalformedFeatureFile: Unable to parse header with error: Your input file has a malformed header: We never saw the required CHROM header line (starting with one #) for the input VCF file, for input source: file:///home/data/bcbio/cosmic-prep/v94/GRCh38/CosmicCodingMuts-prep.vcf.gz
at htsjdk.tribble.TabixFeatureReader.readHeader(TabixFeatureReader.java:97)
at htsjdk.tribble.TabixFeatureReader.(TabixFeatureReader.java:82)
at htsjdk.tribble.AbstractFeatureReader.getFeatureReader(AbstractFeatureReader.java:117)
at htsjdk.tribble.AbstractFeatureReader.getFeatureReader(AbstractFeatureReader.java:81)
at htsjdk.variant.vcf.VCFFileReader.(VCFFileReader.java:145)
at picard.vcf.MergeVcfs.doWork(MergeVcfs.java:182)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:309)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)
Caused by: htsjdk.tribble.TribbleException$InvalidHeader: Your input file has a malformed header: We never saw the required CHROM header line (starting with one #) for the input VCF file
at htsjdk.variant.vcf.VCFCodec.readActualHeader(VCFCodec.java:119)
at htsjdk.tribble.AsciiFeatureCodec.readHeader(AsciiFeatureCodec.java:79)
at htsjdk.tribble.AsciiFeatureCodec.readHeader(AsciiFeatureCodec.java:37)
at htsjdk.tribble.TabixFeatureReader.readHeader(TabixFeatureReader.java:95)
... 8 more
Traceback (most recent call last):
File "prepare_cosmic.py", line 246, in
main(args.cosmic_version, args.bcbio_directory, args.overwrite, args.clean)
File "prepare_cosmic.py", line 62, in main
ready_cosmic = combine_cosmic(sorted_inputs, bcbio_ref, out_file)
File "prepare_cosmic.py", line 153, in combine_cosmic
subprocess.check_call(cmd)
File "/home/data/bcbio/anaconda/lib/python3.7/subprocess.py", line 363, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['picard', 'MergeVcfs', 'O=v94/bcbio_ready/hg38/cosmic.vcf.gz', 'D=/home/data/bcbio/genomes/Hsapiens/hg38/seq/hg38.dict', 'I=v94/GRCh38/CosmicCodingMuts-prep.vcf.gz', 'I=v94/GRCh38/CosmicNonCodingVariants-prep.vcf.gz', 'USE_JDK_DEFLATER=true', 'USE_JDK_INFLATER=true', 'CREATE_INDEX=false']' returned non-zero exit status 1.
`
I have also tried COSMIC version 93 and earlier versions such as v90 , but it seems the VCF files are not available anymore for these versions. Furthermore, for version from v94 on, though the vcf files could be downloaded successfully, the following process always halted because of error:
Contig 'chr1' is not defined in the header. (Quick workaround: index the file with tabix.)
I assume format changes occurred to these newer versions of vcf files. Could you please update the bcbio_nextgen.py script so as to avoid such bugs? Thanks ~
The text was updated successfully, but these errors were encountered: