MAGE: Multi-ancestry Analysis of Gene Expression

MAGE comprises RNA-seq data from lymphoblastoid cell lines derived from 731 individuals from the 1000 Genomes Project (1KGP), representing 26 globally distributed populations across five continental groups. These data offer a large, geographically diverse, open-access resource to facilitate studies of the distribution, genetic underpinnings, and evolution of variation in human transcriptomes, and they include data from several ancestry groups that were poorly represented in previous studies.

Data Access

Raw reads

Newly generated RNA sequencing data for the 731 individuals (779 total libraries) are available from the Sequence Read Archive (accession: PRJNA851328).
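
Because there are more libraries (779) than individuals (731), some individuals must be represented by more than one library. As a minimal sketch, assuming a comma-separated table with one row per library, the Python below groups libraries by individual and picks out those with replicates. The column names `sample_id` and `library_id` and the toy IDs are hypothetical placeholders, not the actual MAGE metadata schema; adjust them to match the table you download.

```python
import csv
import io
from collections import defaultdict


def libraries_per_individual(table, sample_col="sample_id", library_col="library_id"):
    """Map each individual to the list of library IDs sequenced for it.

    `table` is any iterable of CSV lines (for example, an open file handle).
    The default column names are hypothetical and will likely need changing.
    """
    libraries = defaultdict(list)
    for row in csv.DictReader(table):
        libraries[row[sample_col]].append(row[library_col])
    return dict(libraries)


def replicated(libraries):
    """Keep only the individuals that have more than one library."""
    return {sample: libs for sample, libs in libraries.items() if len(libs) > 1}


# Toy table with made-up IDs, standing in for a real per-library table.
toy = io.StringIO(
    "sample_id,library_id\n"
    "IND_A,LIB_1\n"
    "IND_A,LIB_2\n"
    "IND_B,LIB_3\n"
)
per_individual = libraries_per_individual(toy)
replicates = replicated(per_individual)
```

With real data, pass an open file handle in place of the toy `io.StringIO` object.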

Processed data

Processed gene expression matrices and QTL mapping results, along with other downstream data, are currently available on Zenodo (MAGEv1.0 Zenodo link) and on Dropbox (MAGEv1.0 Dropbox link).

Briefly, this repo contains the following data:

Sample metadata and sequencing metrics
Gene expression and splicing matrices used for e/sQTL mapping and analyses of global trends of expression/splicing diversity
cis-e/sQTL mapping results, including aFC estimates for cis-eQTLs
Functional annotations of cis-e/sQTLs
Results of colocalization analysis between MAGE e/sQTLs and complex trait GWAS from the PAGE study
Results of analyses of global trends of expression/splicing diversity
Jointly generated top genotype PCs for samples in MAGE and other resources with paired WGS/RNA-seq data (Geuvadis, GTEx, AFGR)

READMEs are provided for all data in the repo.

If you are having trouble accessing these data, please feel free to contact us to explore other options (e.g., Globus).

Variant calls

The high-coverage variant calls used for QTL mapping were previously generated by the New York Genome Center (NYGC) and are available through the 1KGP FTP site.

Code

Code used for data processing and downstream analyses is available in the analysis_pipeline/ directory, along with READMEs describing how to run each script.

Code used to produce the major figures and panels in the manuscript is available in the figure_generation/ directory.

The MAGE manuscript

For more information about the MAGE resource as well as analyses performed using this resource, please see our paper:

Sources of gene expression variation in a globally diverse human cohort
Dylan J. Taylor, Surya B. Chhetri, Michael G. Tassia, Arjun Biddanda, Stephanie M. Yan, Genevieve L. Wojcik, Alexis Battle, Rajiv C. McCoy

Citing MAGE

If you use MAGE data in your own work, please cite the paper linked above.
