A library for full-stack aging clocks design and benchmarking.
The full release version of the package is currently under development. Only the benchmarking module is released and ready for use; please see below.
You can install the whole library with pip:
pip install computage
This provides all necessary instruments for aging clocks benchmarking.
A module in the computage library for benchmarking epigenetic aging clocks. This library is tightly coupled with the computage_bench Hugging Face repository, from which all DNA methylation data (66 GSEs from more than 50 studies) can be retrieved. All details of our benchmarking methodology and results can be found in the paper.
DNA methylation is a chemical modification of DNA molecules that is present in many biological species, including humans. Specifically, methylation most often occurs at cytosine nucleotides in a so-called CpG context (cytosine followed by a guanine). This modification is involved in a variety of cellular events, ranging from nutrient starvation responses to X-chromosome inactivation to transgenerational inheritance. As it turns out, methylation levels at individual CpG sites change systematically during aging. These changes can be captured by various machine learning (ML) models called aging clocks and used to predict an individual’s age. Moreover, it has been hypothesized that aging clocks not only predict chronological age but can also estimate biological age, that is, an overall degree of an individual’s health represented as an increase or decrease of predicted age relative to the general population. However, comparing aging clock performance is no trivial task: there is no gold standard measure of one’s biological age, so MAE, Pearson’s r, and other common accuracy or correlation metrics are not sufficient.
To foster greater advances in the aging clock field, we developed a methodology and a dataset for aging clock benchmarking, ComputAge Bench, which relies on measuring a model's ability to predict increased ages in samples from patients with pre-defined aging-accelerating conditions (AACs) relative to samples from healthy controls (HC). We highly recommend consulting the Methods and Discussion sections of our paper before using this dataset and drawing any conclusions from it.
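The idea of comparing AAC samples against healthy controls can be sketched on synthetic data. This is only an illustration, not the procedure used in the paper: the data are randomly generated, the 3-year shift and noise level are arbitrary assumptions, and the one-sided Mann-Whitney U test is one plausible choice of test among several.

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)

# Synthetic data for illustration only (not taken from ComputAge Bench).
age_hc = rng.uniform(30, 70, size=200)   # chronological ages, healthy controls
age_aac = rng.uniform(30, 70, size=200)  # chronological ages, AAC patients

# A hypothetical clock: predictions track chronological age with noise,
# and AAC samples are predicted about 3 years older on average (assumed).
pred_hc = age_hc + rng.normal(0.0, 4.0, size=200)
pred_aac = age_aac + 3.0 + rng.normal(0.0, 4.0, size=200)

# Age acceleration: predicted age minus chronological age.
accel_hc = pred_hc - age_hc
accel_aac = pred_aac - age_aac

# One-sided test: is age acceleration larger in AAC than in HC?
stat, p_value = mannwhitneyu(accel_aac, accel_hc, alternative="greater")

print(f"median acceleration HC:  {np.median(accel_hc):.2f} years")
print(f"median acceleration AAC: {np.median(accel_aac):.2f} years")
print(f"one-sided Mann-Whitney U p-value: {p_value:.3g}")
```

A clock that scores well here need not have the lowest MAE on chronological age, which is why accuracy metrics alone do not settle the comparison.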
ComputAgeBench epigenetic clock construction overview.
Suppose you have trained a brand-new epigenetic aging clock model using the classic scikit-learn library and saved it as a pickle file. Then the following block of code can be used to benchmark your model. We have also added several published aging clocks for comparison with yours.
from computage import run_benchmark

# First, define the NaN imputation method for `in_library` models.
# For simplicity, we recommend imputation with
# gold standard averages (from the R package `sesame`).
imputation = 'sesame_450k'
models_config = {
    "in_library": {
        'HorvathV1': {'imputation': imputation},
        'Hannum': {'imputation': imputation},
        'PhenoAgeV2': {'imputation': imputation},
    },
    # Here, define the name of your new model and the path
    # to its pickle file (.pkl).
    "new_models": {
        # 'my_new_model_name': {'path': '/path/to/model.pkl'}
    }
}

# Now run the benchmark.
bench = run_benchmark(models_config,
                      experiment_prefix='my_model_test',
                      output_folder='./benchmark'
                      )
#upon completion, the results will be saved in the folder you specified
[...upcoming...]
If you just want to explore our dataset locally, use the following commands to download it.
from huggingface_hub import snapshot_download
snapshot_download(
repo_id='computage/computage_bench',
repo_type="dataset",
local_dir='.')
Once downloaded, the dataset can be opened with pandas (or any other parquet reader).
import pandas as pd
# Choose a study ID, for example `GSE100264`.
df = pd.read_parquet('data/computage_bench_data_GSE100264.parquet').T
# Note: we transpose the data to make it easier to read.
# Don't forget to explore the metadata (which is common to all datasets):
meta = pd.read_csv('computage_bench_meta.tsv', sep='\t', index_col=0)
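Once both tables are loaded, a typical first step is to align the methylation data with the metadata by sample ID. The sketch below uses tiny synthetic frames so that it runs without the download. It assumes that, after the transpose, rows of df are samples and columns are CpG sites, and that the metadata is indexed by sample ID. The sample IDs and the column names Age and Condition are hypothetical, so check the real file for the actual names.

```python
import pandas as pd

# Tiny synthetic stand-ins; the real frames come from the downloaded files.
# Assumed layout: rows are samples, columns are CpG sites.
df = pd.DataFrame(
    {"cg00000029": [0.41, 0.55, 0.48], "cg00000108": [0.90, 0.87, 0.93]},
    index=["GSM0001", "GSM0002", "GSM0003"],
)
# Hypothetical metadata columns; the real file may use different names.
meta = pd.DataFrame(
    {"Age": [34.0, 61.0, 47.0, 52.0], "Condition": ["HC", "AAC", "HC", "HC"]},
    index=["GSM0001", "GSM0002", "GSM0003", "GSM9999"],
)

# Keep only samples present in both tables, in the same order.
shared = df.index.intersection(meta.index)
df_aligned = df.loc[shared]
meta_aligned = meta.loc[shared]

# Fraction of missing values per CpG site (relevant when choosing imputation).
missing_per_cpg = df_aligned.isna().mean()

print(df_aligned.shape, meta_aligned.shape)
print(meta_aligned["Condition"].value_counts())
```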
All results and plots of the ComputAgeBench paper can be reproduced using this notebook. Alternatively, you can clone this repository and run the same notebook locally from the notebooks folder.
[...Table with all clocks...]
If you find this library or the corresponding dataset useful in your research, please cite us using the following plain citation or BibTeX entry.
Kriukov, D., Efimov, E., Kuzmina, E. A., Khrameeva, E. E., & Dylov, D. V. (2024). ComputAgeBench: Epigenetic Aging Clocks Benchmark. bioRxiv, 2024-06.
@article{kriukov2024computagebench,
title={ComputAgeBench: Epigenetic Aging Clocks Benchmark},
author={Kriukov, Dmitrii and Efimov, Evgeniy and Kuzmina, Ekaterina A and Khrameeva, Ekaterina E and Dylov, Dmitry V},
journal={bioRxiv},
pages={2024--06},
year={2024},
publisher={Cold Spring Harbor Laboratory}
}
For any questions or clarifications, please reach out to: dmitrii.kriukov@skoltech.ru
Please feel free to leave questions and suggestions in the issues. For a faster and broader discussion, please join our Telegram chat.
We thank the biolearn team for providing inspiration and many useful tools that were helpful during the initial development stage of this library.