This repo contains code and models for the LineEX system, (link paper), which extracts data from scientific line charts. We adapt existing vision transformers and pose detection methods and showcase significant performance gains over existing SOTA baselines. We also propose a new loss function and present its effectiveness against existing loss functions.
The LineEX pipeline consists of three modular stages, which can be used independent from each other. They are :
- Keypoint Extraction
- Chart Element Detection and Text Extraction
- Keypoint Grouping, Legend Mapping and Datapoint Scaling
git clone https://github.com/Shiva-sankaran/LineEX.git
cd LineEX
conda env create -f environment.yml
conda activate LineEX
Weights and data will be placed at the correct folders
Set corresponding DATA_flag(True/False) to download a particular data set.
chmod +x download.sh
./download.sh -T False -V False -L True # To download only the test data
UPDATE: Dataset moved to here.
UPDATE: Weights can be found here
Each of the modules can be used separately, or the entire pipeline can be called at once to extract the desired information. Output is stored in the corresponding directory
python pipeline.py --input_path = sample_input/
cd modules/KP_detection
python run.py
cd modules/CE_detection
python run.py
Refer to the paper for more information about the metrics
Overall metrics is essentially the metric for grouping and legend mapping
cd modules/Grouping_legend_mapping
python eval.py
cd modules/KP_detection
python eval.py
cd modules/CE_detection
python run.py
cd modules/KP_detection
python -m torch.distributed.launch --nproc_per_node=3 --node_rank=0 train.py --vit_arch xcit_small_12_p16 --batch_size 42 --input_size 288 384 --hidden_dim 384 --vit_dim 384 --num_workers 24 --vit_weights https://dl.fbaipublicfiles.com/xcit/xcit_small_12_p16_384_dist.pth --alpha 0.99
cd modules/CE_detection
python -m torch.distributed.launch train.py --coco_path path_to_data
Need to change data paths
Shivasankaran, V. P., Muhammad Yusuf Hassan, and Mayank Singh. "LineEX: Data Extraction from Scientific Line Charts." 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). IEEE, 2023.