1st place solution for Kaggle Open Problems - Multimodal Single-Cell Integration
Install the solution code.
pip3 install -e .
In addition, download the following data:
- Open Problems - Multimodal Single-Cell Integration data set from Kaggle
- Tab-separated hgnc_complete_set file from https://www.genenames.org/download/archive/
- Reactome Pathways Gene Set from https://reactome.org/download-data
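Before running the preprocessing scripts, it can help to confirm that the downloaded hgnc_complete_set file really is the tab-separated variant. The following is a minimal sketch using only the Python standard library. The helper names `read_tsv_header` and `missing_columns` are hypothetical and not part of this repository, and the column names in the usage note (`hgnc_id`, `symbol`) are assumptions about the HGNC file layout, not something the solution code is known to require.

```python
import csv
from pathlib import Path


def read_tsv_header(path):
    """Return the column names from the first row of a tab-separated file."""
    with Path(path).open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter="\t")
        # An empty file yields an empty header instead of raising StopIteration.
        return next(reader, [])


def missing_columns(path, required):
    """Return the names in `required` that are absent from the file's header."""
    header = set(read_tsv_header(path))
    return [name for name in required if name not in header]
```

For example, `missing_columns("/path/to/hgnc_complete_set", ["hgnc_id", "symbol"])` returns an empty list when both columns are present. If the header comes back as a single long string, the file is probably not tab-separated.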
Compress the Kaggle dataset and generate the additional data used in training.
export DATA_DIR=/path/to/kaggle/dataset/Directory
python3 script/make_compressed_dataset.py --data_dir ${DATA_DIR}
python3 script/make_additional_files.py --data_dir ${DATA_DIR}
python3 script/make_cite_input_mask.py --data_dir ${DATA_DIR} --hgnc_complete_set_path /path/to/hgnc_complete_set --reactome_pathways_path /path/to/reactome_pathways
Train a model for each task type.
python3 scripts/train_mode.py --data_dir ${DATA_DIR} --task_type multi
python3 scripts/train_mode.py --data_dir ${DATA_DIR} --task_type cite
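The two training commands differ only in `--task_type`, so they can be driven from a small wrapper. The sketch below is not part of the repository. `build_training_commands` and `run_all` are hypothetical helpers, the script path and flags are copied verbatim from the commands above, and `sys.executable` stands in for `python3`. It defaults to a dry run that only prints the commands.

```python
import shlex
import subprocess
import sys


def build_training_commands(data_dir, task_types=("multi", "cite")):
    """Build one training command per task type, mirroring the README commands."""
    return [
        [
            sys.executable,
            "scripts/train_mode.py",  # path copied as written in the README
            "--data_dir",
            str(data_dir),
            "--task_type",
            task,
        ]
        for task in task_types
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for command in commands:
        print(shlex.join(command))
        if not dry_run:
            # check=True raises CalledProcessError at the first failing step.
            subprocess.run(command, check=True)
```

Calling `run_all(build_training_commands(os.environ["DATA_DIR"]), dry_run=False)` (after `import os`) from the repository root would run both tasks in sequence, stopping if the first one fails.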