Genomic Variants search API, working from the VCF format.

bento-platform/gohan

Gohan - A Genomic Variants API

Prerequisites


TL;DR

Typical use-case walkthrough

  # environment
  cp ./etc/example.env .env # modify to your needs

  # kickstart dockerized gohan environment
  make init

  # (optional): if you plan on modifying the api codebase before deploying
  make init-dev

  # gateway & certificates
  mkdir -p gateway/certs/dev

  openssl req -newkey rsa:2048 -nodes -keyout gateway/certs/dev/gohan_privkey1.key -x509 -days 365 -out gateway/certs/dev/gohan_fullchain1.crt
  openssl req -newkey rsa:2048 -nodes -keyout gateway/certs/dev/es_gohan_privkey1.key -x509 -days 365 -out gateway/certs/dev/es_gohan_fullchain1.crt


  # build services
  make build-gateway 
  make build-api 

  # run services
  make run-gateway
  make run-elasticsearch
  make run-drs
  make run-api
  
  
  # initiate genes catalogue:
  curl -k https://gohan.local/genes/ingestion/run
  
  # monitor progress:
  curl -k https://gohan.local/genes/ingestion/requests
  curl -k https://gohan.local/genes/ingestion/stats

  # view catalogue
  curl -k https://gohan.local/genes/overview

  # move vcf.gz files to `$GOHAN_API_VCF_PATH`

  # ingest vcf.gz
  curl -k https://gohan.local/variants/ingestion/run\?fileNames=<filename>\&assemblyId=GRCh37\&filterOutReferences=true\&dataset=00000000-0000-0000-0000-000000000000
  
  # monitor progress:
  curl -k https://gohan.local/variants/ingestion/requests
  curl -k https://gohan.local/variants/ingestion/stats

  # view variants
  curl -k https://gohan.local/variants/overview
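
The ingestion call in the walkthrough packs several query parameters into one URL, which is easy to mistype by hand. As a minimal sketch, the same URL could be assembled in Python. The parameter names and the `gohan.local` hostname are copied from the curl example above; `build_ingestion_url` and the file name `sample.vcf.gz` are hypothetical and only for illustration.

```python
from urllib.parse import urlencode

# Default hostname used throughout this README; adjust to your deployment.
GOHAN_URL = "https://gohan.local"


def build_ingestion_url(file_name, assembly_id, dataset, filter_out_references=True):
    """Return the /variants/ingestion/run URL for one vcf.gz file (hypothetical helper)."""
    params = {
        "fileNames": file_name,
        "assemblyId": assembly_id,
        # Booleans are sent as lower-case strings, as in the curl example.
        "filterOutReferences": str(filter_out_references).lower(),
        "dataset": dataset,
    }
    return f"{GOHAN_URL}/variants/ingestion/run?{urlencode(params)}"


url = build_ingestion_url(
    "sample.vcf.gz",  # placeholder file name
    "GRCh37",
    "00000000-0000-0000-0000-000000000000",
)
print(url)
```

The printed URL can be passed to `curl -k` exactly as in the walkthrough.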

Getting started

Environment :

First, from the project root, create a local file for environment variables with default settings by running

cp ./etc/example.env .env

then make any necessary changes, such as setting the Elasticsearch credentials GOHAN_ES_USERNAME and GOHAN_ES_PASSWORD for production.

Note: due to a known bug, GOHAN_ES_USERNAME must currently remain at its default value.


Initialization

Run

make init

Elasticsearch & Kibana :

Run

make run-elasticsearch 

and (optionally)

make run-kibana

DRS :

Run

make run-drs

Data Access Authorization with OPA (more on this to come) :

Run

make build-authz
make run-authz


Development

Gateway

To create and use development certs from the project root, run

mkdir -p gateway/certs/dev

openssl req -newkey rsa:2048 -nodes -keyout gateway/certs/dev/gohan_privkey1.key -x509 -days 365 -out gateway/certs/dev/gohan_fullchain1.crt
openssl req -newkey rsa:2048 -nodes -keyout gateway/certs/dev/es_gohan_privkey1.key -x509 -days 365 -out gateway/certs/dev/es_gohan_fullchain1.crt

Note: Ensure your CN matches the hostname (gohan.local by default)

These will be incorporated into the Gateway service (using NGINX by default, see gateway/Dockerfile and gateway/nginx.conf for details). Be sure to add the hostname of your choice to your local hosts file: /etc/hosts on Linux, or C:\Windows\System32\drivers\etc\hosts on Windows.

Next, run

make build-gateway
make run-gateway

API

Containerized :

 To run a working instance of the API "out of the box", build the Docker image and spawn the container with a fresh binary build by running

make build-api
make run-api

 and the docker-compose.yaml file will handle the configuration.


Local Development :

 This can be done in multiple ways.

  1. Terminal : From the project root, run
# load variables from local file
set -a
. ./.env
set +a

cd src/api

go run .
  2. IDE (preferably VSCode)
- follow the recommended instructions listed at https://code.visualstudio.com/docs/languages/go

- configure the `.vscode/launch.json` to inject the above mentioned variables as recommended by https://stackoverflow.com/questions/29971572/how-do-i-add-environment-variables-to-launch-json-in-vscode

- click 'Run & Debug' > "Play" 
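
For the IDE route, the variables in `.env` have to be copied into the `env` object of the debug configuration. A minimal sketch of that conversion is shown here, assuming the file contains plain `KEY=VALUE` lines. `parse_env` is a hypothetical helper, the values are made up, and shell features such as `${VAR}` interpolation are not handled.

```python
import json


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip optional surrounding quotes; no variable expansion is attempted.
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


# Made-up values for illustration; real settings come from your own .env file.
sample = """
# elasticsearch credentials
GOHAN_ES_USERNAME=example-user
GOHAN_ES_PASSWORD="example-pass"
"""

launch_env = parse_env(sample)
print(json.dumps({"env": launch_env}, indent=2))
```

The printed `env` object can then be pasted into the relevant configuration in `.vscode/launch.json`.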

Local Release

 To build and test from source:

make build-api-local-binaries

 The binary can then be found at bin/api_${GOOS}_${GOARCH} and executed locally with

# load variables from local file
set -a
. ./.env
set +a

# navigate to binary directory
cd bin/

# execute binary
./api_${GOOS}_${GOARCH}

Endpoints :

/variants

Request

  GET /variants/overview
   params: none


Response

{
    "chromosomes": {
        "<CHROMOSOME>": `number`,
        ...
    },
    "sampleIDs": {
        "<SAMPLEID>": `number`,
        ...
    },
    "variantIDs": {
        "<VARIANTID>": `number`,
        ...
    }
}

Example :

{
    "chromosomes": {
        "21": 90548
    },
    "sampleIDs": {
        "hg00096": 33664,
        "hg00099": 31227,
        "hg00111": 25657
    },
    "variantIDs": {
        ".": 90548
    }
}
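
As a quick sanity check, the overview response can be summarised with a few lines of Python. This sketch uses the example response above verbatim and assumes each number is a count of indexed records for that key. The variable names are illustrative.

```python
import json

# Example response copied from the section above.
overview = json.loads("""
{
    "chromosomes": {"21": 90548},
    "sampleIDs": {"hg00096": 33664, "hg00099": 31227, "hg00111": 25657},
    "variantIDs": {".": 90548}
}
""")

# Total across chromosomes and across samples (assumed to be record counts).
total_by_chromosome = sum(overview["chromosomes"].values())
total_by_sample = sum(overview["sampleIDs"].values())

# Samples ordered from most to fewest records.
samples_by_count = sorted(
    overview["sampleIDs"].items(), key=lambda item: item[1], reverse=True
)

print(total_by_chromosome, total_by_sample)
for sample_id, count in samples_by_count:
    print(sample_id, count)
```

In this example the per-sample counts add up to the chromosome total, which suggests one record per sample and variant. Treat that as an observation about this data set rather than a guarantee.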


Requests

  GET /variants/get/by/variantId
   params:

  • chromosome : string ( 1-23, X, Y, MT )
  • lowerBound : number
  • upperBound : number
  • reference : string, an allele ( "A" | "C" | "G" | "T" | "N" or some combination thereof )
  • alternative : string, an allele
  • alleles : string, an ordered comma-delimited list of alleles (max: 2)
  • ids : string (a comma-delimited list of variant ID alphanumeric codes)
  • size : number (maximum number of results per id)
  • sortByPosition : string (<empty> | asc | desc)
  • includeInfoInResultSet : boolean (true | false)
  • genotype : string ( "HETEROZYGOUS" | "HOMOZYGOUS_REFERENCE" | "HOMOZYGOUS_ALTERNATE" )
  • getSampleIdsOnly : bool (optional) - default: false

  GET /variants/count/by/variantId
   params:

  • chromosome : string ( 1-23, X, Y, MT )
  • lowerBound : number
  • upperBound : number
  • reference : string, an allele
  • alternative : string, an allele
  • alleles : string, an ordered comma-delimited list of alleles (max: 2)
  • ids : string (a comma-delimited list of variant ID alphanumeric codes)
  • genotype : string ( "HETEROZYGOUS" | "HOMOZYGOUS_REFERENCE" | "HOMOZYGOUS_ALTERNATE" )

  GET /variants/get/by/sampleId
   params:

  • chromosome : string ( 1-23, X, Y, MT )
  • lowerBound : number
  • upperBound : number
  • reference : string, an allele
  • alternative : string, an allele
  • alleles : string, an ordered comma-delimited list of alleles (max: 2)
  • ids : string (a comma-delimited list of sample ID alphanumeric codes)
  • size : number (maximum number of results per id)
  • sortByPosition : string (<empty> | asc | desc)
  • includeInfoInResultSet : boolean (true | false)
  • genotype : string ( "HETEROZYGOUS" | "HOMOZYGOUS_REFERENCE" | "HOMOZYGOUS_ALTERNATE" )

  GET /variants/count/by/sampleId
   params:

  • chromosome : string ( 1-23, X, Y, MT )
  • lowerBound : number
  • upperBound : number
  • reference : string, an allele
  • alternative : string, an allele
  • alleles : string, an ordered comma-delimited list of alleles (max: 2)
  • ids : string (a comma-delimited list of sample ID alphanumeric codes)
  • genotype : string ( "HETEROZYGOUS" | "HOMOZYGOUS_REFERENCE" | "HOMOZYGOUS_ALTERNATE" )
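
The four endpoints above share most of their parameters, so one small helper can build any of the query URLs. The following is a sketch only: `variants_url` is a hypothetical function, `gohan.local` is the default hostname from this README, and the parameter names are taken from the lists above.

```python
from urllib.parse import urlencode

GOHAN_URL = "https://gohan.local"  # default hostname from this README


def variants_url(action, id_kind, **params):
    """Build a /variants/{get|count}/by/{variantId|sampleId} URL (hypothetical helper)."""
    if action not in ("get", "count"):
        raise ValueError(f"unknown action: {action}")
    if id_kind not in ("variantId", "sampleId"):
        raise ValueError(f"unknown id kind: {id_kind}")
    query = {}
    for key, value in params.items():
        if value is None:
            continue  # unset parameters are left out of the URL
        if isinstance(value, bool):
            value = str(value).lower()  # true | false
        elif isinstance(value, (list, tuple)):
            value = ",".join(value)  # comma-delimited lists
        query[key] = value
    # Keep commas readable instead of percent-encoding them.
    return f"{GOHAN_URL}/variants/{action}/by/{id_kind}?{urlencode(query, safe=',')}"


count_url = variants_url(
    "count",
    "sampleId",
    chromosome="21",
    ids=["hg00096", "hg00099"],
    genotype="HETEROZYGOUS",
)
print(count_url)
```

Switching `"count"` to `"get"` and adding `size`, `sortByPosition` or `includeInfoInResultSet` produces the corresponding GET query.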

Generalized Response Body Structure

{
    "status":  `number` (200 - 500),
    "message": `string` ("Success" | "Error"),
    "results": [
        {
            "query":  `string`,       // reflective of the type of id queried for, e.g. 'variantId:abc123' or 'sampleId:HG0001'
            "assemblyId": `string` ("GRCh38" | "GRCh37" | "NCBI36" | "Other"),    // reflective of the assembly id queried for
            "count":  `number`,   // this field is only present when performing a COUNT query
            "start":  `number`,   // reflective of the provided lowerBound parameter, 0 if none
            "end":  `number`,     // reflective of the provided upperBound parameter, 0 if none
            "chromosome":  `string`,       // reflective of the chromosome queried for
            "calls": [            // this field is only present when performing a GET query
                {
                   "id": `string`, // variantId
                   "chrom":  `string`,
                   "pos": `number`,
                   "ref": `[]string`,  // list of alleles
                   "alt": `[]string`,  // list of alleles
                   "alleles": `[]string`,  // ordered list of alleles
                   "info": [
                       {
                           "id": `string`,
                           "value": `string`,
                       },
                       ...
                   ],
                   "format":`string`,
                   "qual": `number`,
                   "filter": `string`,
                   "sampleId": `string`,
                   "genotype_type": `string ( "HETEROZYGOUS" | "HOMOZYGOUS_REFERENCE" | "HOMOZYGOUS_ALTERNATE" )`,
                   "assemblyId": `string` ("GRCh38" | "GRCh37" | "NCBI36" | "Other"),
                },
                ...
            ]
        },
    ]
}
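
To illustrate how a client might walk this structure, the sketch below tallies calls by `genotype_type` for each result. The response object is made up to match the documented shape (positions and alleles are illustrative, and several fields are omitted), and `genotype_counts` is a hypothetical helper.

```python
from collections import Counter

# Made-up response in the documented shape; values are illustrative only.
response = {
    "status": 200,
    "message": "Success",
    "results": [
        {
            "query": "sampleId:hg00096",
            "assemblyId": "GRCh37",
            "start": 0,
            "end": 0,
            "chromosome": "21",
            "calls": [
                {"id": ".", "chrom": "21", "pos": 1000, "ref": ["G"], "alt": ["A"],
                 "sampleId": "hg00096", "genotype_type": "HETEROZYGOUS"},
                {"id": ".", "chrom": "21", "pos": 2000, "ref": ["C"], "alt": ["T"],
                 "sampleId": "hg00096", "genotype_type": "HETEROZYGOUS"},
                {"id": ".", "chrom": "21", "pos": 3000, "ref": ["T"], "alt": ["G"],
                 "sampleId": "hg00096", "genotype_type": "HOMOZYGOUS_ALTERNATE"},
            ],
        }
    ],
}


def genotype_counts(body):
    """Map each result's query string to a {genotype_type: number of calls} dict."""
    if body.get("status") != 200:
        raise RuntimeError(f"request failed: {body.get('message')}")
    return {
        # COUNT queries carry no "calls" field, so default to an empty list.
        result["query"]: dict(
            Counter(call["genotype_type"] for call in result.get("calls", []))
        )
        for result in body["results"]
    }


print(genotype_counts(response))
```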

Examples :





Request

  GET /variants/ingestion/run
   params:

  • filename : string (required)

Response

{
    "state":  `string` ("Queuing" | "Running" | "Done" | "Error"),
    "id": `string`,
    "filename": `string`,
    "message": `string`,
}


Request

  GET /variants/ingestion/requests
   params: none


Response

[
  {
    "state":  `string` ("Queuing" | "Running" | "Done" | "Error"),
    "id": `string`,
    "filename": `string`,
    "message": `string`,
    "createdAt": `timestamp string`,
    "updatedAt": `timestamp string`
  },
  ...
]
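
Monitoring an ingestion usually means calling this endpoint repeatedly until every request has finished. A minimal polling sketch follows, assuming the `state` field carries the string values listed above. `wait_for_ingestion` and its parameters are hypothetical, and the HTTP call is injected as `fetch_requests` so the loop can be tried without a running instance.

```python
import time

# States after which a request no longer changes (assumed from the list above).
TERMINAL_STATES = {"Done", "Error"}


def wait_for_ingestion(fetch_requests, poll_seconds=5.0, max_polls=120, sleep=time.sleep):
    """Poll until every ingestion request is Done or Error, then return the list."""
    for _ in range(max_polls):
        requests = fetch_requests()
        # An empty list means nothing has been queued yet, so keep polling.
        if requests and all(r["state"] in TERMINAL_STATES for r in requests):
            return requests
        sleep(poll_seconds)
    raise TimeoutError("ingestion still in progress after the polling limit")


# Simulated sequence of responses, for demonstration only.
_fake_responses = iter([
    [{"id": "a", "state": "Queuing"}],
    [{"id": "a", "state": "Running"}],
    [{"id": "a", "state": "Done"}],
])

final = wait_for_ingestion(lambda: next(_fake_responses), sleep=lambda seconds: None)
print(final[0]["state"])
```

In real use, `fetch_requests` would issue the GET request to `/variants/ingestion/requests` and return the decoded JSON list.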


Deployments :

All in all, run

make run-elasticsearch 
make run-drs
make build-gateway && make run-gateway 
make build-api && make run-api

# and optionally
make run-kibana

For other handy tools, see the Makefile. In addition to the commands already mentioned here, you'll find other build, run, stop and clean-up commands.


Tests :

Once elasticsearch, drs, the api, and the gateway are up, run

make test-api-dev

Dev Container debug

Interactive debugging in VSCode is only possible when using the development image of gohan-api.

Using the "Attach to PID(Bento)" debug config, select the PID associated with the following path:

/gohan-api/src/api/tmp/main