Arto Bendiken edited this page Sep 18, 2015 · 22 revisions

VCF to RDF Mapping

The team worked on software that makes Variant Call Format (VCF) files available as linked data. The goals are to facilitate offline batch conversion of VCF to various RDF formats and to enable online SPARQL queries directly against compressed VCF files.

BioHackathon 2015 Team

Software Produced

Mapping

There are currently two distinct mappings, both of which are planned to be supported in the RDF::VCF software:

Examples

Command-Line Interface (CLI)

The vcf2rdf command-line utility transforms VCF files into RDF (currently output as N-Triples):

vcf2rdf Homo_sapiens.1.vcf.gz Homo_sapiens.2.vcf.gz ...

Input files can be either plain-text VCF or bgzip-compressed VCF (as in the above example).
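To give a rough feel for what a VCF-to-N-Triples conversion involves, here is a minimal sketch in plain Ruby. It is not the mapping that vcf2rdf implements: the `http://example.org/vcf#` namespace, the subject IRI pattern, the helper names (`parse_vcf_line`, `nt_literal`, `record_to_ntriples`) and the sample line (including the `rs123` identifier) are all made up for illustration. Only the eight fixed column names come from the VCF format itself.

```ruby
# Illustrative only: the IRIs below use a made-up example.org namespace
# and are NOT the mapping used by vcf2rdf.
EX = 'http://example.org/vcf#'

# The eight fixed, tab-separated columns of a VCF data line.
VCF_COLUMNS = %w[CHROM POS ID REF ALT QUAL FILTER INFO].freeze

# Split one VCF data line into a Hash keyed by column name.
def parse_vcf_line(line)
  fields = line.chomp.split("\t", -1).first(VCF_COLUMNS.size)
  VCF_COLUMNS.zip(fields).to_h
end

# Quote a string as an N-Triples literal, escaping backslashes and quotes.
def nt_literal(value)
  escaped = value.gsub('\\') { '\\\\' }.gsub('"') { '\\"' }
  %("#{escaped}")
end

# Turn one parsed record into an array of N-Triples lines,
# skipping columns that are absent or hold the VCF missing value ".".
def record_to_ntriples(record)
  subject = "<#{EX}variant/#{record['CHROM']}-#{record['POS']}>"
  record.reject { |_, value| value.nil? || value == '.' }.map do |column, value|
    "#{subject} <#{EX}#{column.downcase}> #{nt_literal(value)} ."
  end
end

# Made-up sample data line (tab-separated).
line = "Y\t2749200\trs123\tA\tG\t50\tPASS\t."
puts record_to_ntriples(parse_vcf_line(line))
```

Every value is emitted as a plain string literal to keep the sketch short; a real mapping would presumably type positions and quality scores and model locations with a vocabulary such as FALDO.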

Application Programming Interface (API)

The RDF::VCF gem can be used like any other RDF.rb reader plugin:

# Load the RDF::VCF library:
require 'rdf/vcf'

# Open a VCF file for reading:
RDF::VCF::Reader.open('Homo_sapiens.vcf.gz') do |reader|

  # Loop over all generated RDF statements:
  reader.each_statement do |statement|

    # Print the RDF statement to the screen:
    $stdout.puts statement.inspect
  end
end

SPARQL Query Interface

SELECT ?s ?quality WHERE {
  ?s faldo:location ?location .
  ?location faldo:reference [ dc:identifier "Y" ] .
  ?location faldo:begin [ faldo:position ?begin ] .
  ?location faldo:end [ faldo:position ?end ] .
  ?s vcf:quality ?quality .
  FILTER(?begin >= 2749180)
  FILTER(?end <= 2755180)
}
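The query above (shown without its PREFIX declarations) selects variants located on reference "Y" whose begin and end positions fall within 2749180 to 2755180, together with their quality scores. As a sketch of that selection logic only, the same filter can be written over in-memory records in plain Ruby. The `Variant` struct, the `variants_in_range` helper and the sample records are hypothetical and are not part of RDF::VCF.

```ruby
# Hypothetical in-memory record type standing in for the RDF data.
Variant = Struct.new(:id, :reference, :begin_pos, :end_pos, :quality)

# Return [id, quality] pairs for variants on `reference` whose location lies
# entirely within range_begin..range_end, mirroring the two FILTER clauses.
def variants_in_range(variants, reference, range_begin, range_end)
  variants
    .select { |v| v.reference == reference }
    .select { |v| v.begin_pos >= range_begin && v.end_pos <= range_end }
    .map { |v| [v.id, v.quality] }
end

# Made-up sample data for illustration.
variants = [
  Variant.new('v1', 'Y', 2_749_200, 2_749_200, 50.0),
  Variant.new('v2', 'Y', 2_760_000, 2_760_000, 30.0), # outside the range
  Variant.new('v3', 'X', 2_750_000, 2_750_000, 99.0)  # different reference
]

variants_in_range(variants, 'Y', 2_749_180, 2_755_180).each do |id, quality|
  puts "#{id}\t#{quality}"
end
```

Both bounds are inclusive, matching the `>=` and `<=` comparisons in the query.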