This is the codebase for the paper: SimMTM: A Simple Pre-Training Framework for Masked Time-Series Modeling
The reconstruction process of SimMTM involves the following four modules: masking, representation learning, series-wise similarity learning and point-wise reconstruction.
We can easily generate a set of masked series for each sample by randomly masking a portion of time points along the temporal dimension.
After the encoder and projector layer, we can obtain the point-wise representations and series-wise representations.
To precisely reconstruct the original time series, we attempt to utilize the similarities among series-wise representations for weighted aggregation, namely exploiting the local structure of the time series manifold.
Based on the learned series-wise similarities, we aggregate the point-wise representation of its own masked series and other series to reconstruct the original time series.
1、Prepare Data.
All benchmark datasets can be obtained from Google Drive or Tsinghua Cloud, and arrange the folder as:
SimMTM/
|--SimMTM_Forecast
|-- dataset/
|-- ETT-small/
|-- ETTh1.csv
|-- ETTh2.csv
|-- ETTm1.csv
|-- ETTm2.csv
|-- weather/
|-- weather.csv
|-- ...
|-- ...
|--SimMTM_Class
|-- dataset/
|-- SleepEEG/
|-- train.pt
|-- val.pt
|-- test.pt
|-- FD-B/
|-- ...
|-- EMG/
|-- ...
|-- ...
2、Forecasting
We provide the forecasting experiment coding in ./SimMTM_Forecast
and experiment scripts can be found under the folder ./scripts
. To run the code on ETTh2, just run the following command:
cd ./SimMTM_Forecast
# pre-training
sh ./scripts/pretrain/ETT_script/ETTh2.sh
# fine-tuning
sh ./scripts/finetune/ETT_script/ETTh2.sh
3、Classification
We also provide the classification experiment coding in ./SimMTM_Class
. When we want to pre-train a model on SleepEEG and fine-tune it on Epilepsy, please run:
cd ./SimMTM_Class
python ./code/main.py --training_mode pre_train --pretrain_dataset SleepEEG --target_dataset Epilepsy
4、We also provide some checkpoints and you can tune them directly on target datasets.
SimMTM (marked by red stars) can simultaneously cover high-level and low-level tasks for in- and cross-domain settings and outperforms other baselines significantly, highlighting the advantages of SimMTM in task generality. More results can be found in our paper.
If you find this repo useful, please cite our paper.
@inproceedings{dong2023simmtm,
title={SimMTM: A Simple Pre-Training Framework for Masked Time-Series Modeling},
author={Jiaxiang Dong, Haixu Wu, Haoran Zhang, Li Zhang, Jianmin Wang and Mingsheng Long},
booktitle={Advances in Neural Information Processing Systems},
year={2023}
}
If you have any questions, please contact djx20@mails.tsinghua.edu.cn.
We appreciate the following github repos a lot for their valuable code base or datasets:
https://github.com/thuml/Time-Series-Library
https://github.com/mims-harvard/TFC-pretraining/tree/main
Thanks to vincentsham for reproducing our code.