Evaluating existing SCT models #34

Open
plbenveniste opened this issue Oct 14, 2024 · 4 comments
plbenveniste (Collaborator)
This issue reports the work done to evaluate the existing models.

The existing models are the following:

  • sct_deepseg_lesion
  • sct_deepseg -t seg_sc_ms_lesion_stir_psir
  • sct_deepseg -t seg_ms_lesion_mp2rage
@plbenveniste plbenveniste self-assigned this Oct 14, 2024
plbenveniste (Collaborator) commented Oct 14, 2024

I created the file evaluation/test_sct_models.py to evaluate the predictions of the 3 SCT lesion segmentation models.

It computes the Dice score, lesion PPV, lesion sensitivity and lesion F1 score.

It is currently running on the test set using:

python evaluation/test_sct_models.py --msd-data-path ~/net/ms-lesion-agnostic/msd_data/dataset_2024-07-24_seed42_lesionOnly.json --output-path ~/net/ms-lesion-agnostic/evaluating_existing_models/evaluation_output
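For reference, the four metrics can be computed from binary masks roughly as follows. This is a minimal sketch, not the actual code of evaluation/test_sct_models.py; it assumes a lesion counts as detected when it overlaps the other mask by at least one voxel.

```python
# Hedged sketch of the evaluated metrics: voxel-wise Dice plus
# lesion-wise PPV / sensitivity / F1 between two binary masks.
# Assumption: one-voxel overlap is enough to call a lesion detected.
import numpy as np
from scipy import ndimage

def dice_score(pred, gt):
    """Voxel-wise Dice: 2*|P∩G| / (|P| + |G|)."""
    inter = np.logical_and(pred, gt).sum()
    denom = pred.sum() + gt.sum()
    return 2.0 * inter / denom if denom > 0 else 0.0

def lesion_wise_metrics(pred, gt):
    """Lesion-wise PPV, sensitivity and F1 via connected components."""
    gt_lab, n_gt = ndimage.label(gt)
    pred_lab, n_pred = ndimage.label(pred)
    # GT lesions touched by the prediction -> sensitivity
    tp_gt = sum(1 for i in range(1, n_gt + 1) if pred[gt_lab == i].any())
    # predicted lesions touching the GT -> PPV
    tp_pred = sum(1 for i in range(1, n_pred + 1) if gt[pred_lab == i].any())
    sens = tp_gt / n_gt if n_gt else 0.0
    ppv = tp_pred / n_pred if n_pred else 0.0
    f1 = 2 * ppv * sens / (ppv + sens) if (ppv + sens) else 0.0
    return ppv, sens, f1
```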

plbenveniste (Collaborator) commented
Because the initial code was taking too long to run (around 90 hours), I decided to split it into 3 files:

python evaluation/test_sct_deepseg_lesion.py --msd-data-path ~/net/ms-lesion-agnostic/msd_data/dataset_2024-07-24_seed42_lesionOnly.json --output-path ~/net/ms-lesion-agnostic/evaluating_existing_models/evaluation_sct_deepseg_lesion
python evaluation/test_sct_deepseg_psir_stir.py --msd-data-path ~/net/ms-lesion-agnostic/msd_data/dataset_2024-07-24_seed42_lesionOnly.json --output-path ~/net/ms-lesion-agnostic/evaluating_existing_models/evaluation_sct_deepseg_psir-stir
python evaluation/test_sct_deepseg_mp2rage.py --msd-data-path ~/net/ms-lesion-agnostic/msd_data/dataset_2024-07-24_seed42_lesionOnly.json --output-path ~/net/ms-lesion-agnostic/evaluating_existing_models/evaluation_sct_deepseg_mp2rage

plbenveniste (Collaborator) commented Oct 15, 2024

For the sct_deepseg_lesion model

I then plotted the desired curves using:

python evaluation/plot_performance.py --pred-dir-path ~/net/ms-lesion-agnostic/evaluating_existing_models/evaluation_sct_deepseg_lesion/ --data-json-path ~/net/ms-lesion-agnostic/msd_data/dataset_2024-07-24_seed42_lesionOnly.json  --split test

Output:

Dice score per contrast (mean ± std)
PSIR (n=60): 0.0068 ± 0.0098
STIR (n=11): 0.3676 ± 0.2831
T2star (n=83): 0.5117 ± 0.2076
T2w (n=358): 0.3206 ± 0.2679
UNIT1 (n=57): 0.0070 ± 0.0084

[plot: dice_scores_contrast]

Here is the output for the other metrics
PPV score per contrast (mean ± std)
PSIR (n=60): 0.0222 ± 0.1354
STIR (n=11): 0.4864 ± 0.4037
T2star (n=83): 0.6010 ± 0.2895
T2w (n=358): 0.6079 ± 0.4153
UNIT1 (n=57): 0.0097 ± 0.0526

F1 score per contrast (mean ± std)
PSIR (n=60): 0.0077 ± 0.0441
STIR (n=11): 0.4037 ± 0.3222
T2star (n=83): 0.6396 ± 0.2281
T2w (n=358): 0.5059 ± 0.3690
UNIT1 (n=57): 0.0088 ± 0.0464

Sensitivity score per contrast (mean ± std)
PSIR (n=60): 0.0395 ± 0.1839
STIR (n=11): 0.4500 ± 0.3738
T2star (n=83): 0.8102 ± 0.2478
T2w (n=358): 0.5221 ± 0.4007
UNIT1 (n=57): 0.0085 ± 0.0458

[plots: f1_scores_contrast, ppv_scores_contrast, sensitivity_scores_contrast]
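The per-contrast "mean ± std" lines above can be produced by a simple group-by over (contrast, score) pairs. A hedged sketch of what plot_performance.py presumably does for its text output (the function name and the population-std convention, ddof=0 as in numpy's default, are my assumptions):

```python
# Sketch: aggregate per-image scores into per-contrast "mean ± std" lines.
from collections import defaultdict
import statistics

def summarize_per_contrast(scores):
    """scores: iterable of (contrast, value) -> sorted summary lines."""
    by_contrast = defaultdict(list)
    for contrast, value in scores:
        by_contrast[contrast].append(value)
    lines = []
    for contrast in sorted(by_contrast):
        vals = by_contrast[contrast]
        # population std (ddof=0), matching numpy's np.std default
        lines.append(f"{contrast} (n={len(vals)}): "
                     f"{statistics.mean(vals):.4f} ± {statistics.pstdev(vals):.4f}")
    return lines
```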

For the MP2RAGE model

python evaluation/plot_performance.py --pred-dir-path ~/net/ms-lesion-agnostic/evaluating_existing_models/evaluation_sct_deepseg_mp2rage/ --data-json-path ~/net/ms-lesion-agnostic/msd_data/dataset_2024-07-24_seed42_lesionOnly.json  --split test

Output:

Dice score per contrast (mean ± std)
PSIR (n=60): 0.2135 ± 0.1760
STIR (n=11): 0.0110 ± 0.0126
T2star (n=83): 0.0074 ± 0.0223
T2w (n=358): 0.0067 ± 0.0127
UNIT1 (n=57): 0.4549 ± 0.1944

[plot: dice_scores_contrast]

Output for the other metrics:
PPV score per contrast (mean ± std)
PSIR (n=60): 0.3733 ± 0.2918
STIR (n=11): 0.0000 ± 0.0000
T2star (n=83): 0.0000 ± 0.0000
T2w (n=358): 0.1425 ± 0.3500
UNIT1 (n=57): 0.3298 ± 0.1770

F1 score per contrast (mean ± std)
PSIR (n=60): 0.3943 ± 0.2621
STIR (n=11): 0.0000 ± 0.0000
T2star (n=83): 0.0000 ± 0.0000
T2w (n=358): 0.0000 ± 0.0000
UNIT1 (n=57): 0.4422 ± 0.1937

Sensitivity score per contrast (mean ± std)
PSIR (n=60): 0.5506 ± 0.3480
STIR (n=11): 0.0000 ± 0.0000
T2star (n=83): 0.0000 ± 0.0000
T2w (n=358): 0.0000 ± 0.0000
UNIT1 (n=57): 0.8224 ± 0.2470

[plots: f1_scores_contrast, ppv_scores_contrast, sensitivity_scores_contrast]

For the PSIR and STIR model

python evaluation/plot_performance.py --pred-dir-path ~/net/ms-lesion-agnostic/evaluating_existing_models/evaluation_sct_deepseg_psir-stir/ --data-json-path ~/net/ms-lesion-agnostic/msd_data/dataset_2024-07-24_seed42_lesionOnly.json  --split test

Output:

Dice score per contrast (mean ± std)
PSIR (n=60): 0.5701 ± 0.2660
STIR (n=11): 0.5984 ± 0.2237
T2star (n=83): 0.1312 ± 0.1538
T2w (n=358): 0.2213 ± 0.2134
UNIT1 (n=57): 0.0023 ± 0.0016

[plot: dice_scores_contrast]

For the other metrics:
PPV score per contrast (mean ± std)
PSIR (n=60): 0.6672 ± 0.3478
STIR (n=11): 0.6605 ± 0.3430
T2star (n=83): 0.1235 ± 0.1475
T2w (n=358): 0.4306 ± 0.4165
UNIT1 (n=57): 0.0000 ± 0.0000

F1 score per contrast (mean ± std)
PSIR (n=60): 0.6381 ± 0.3240
STIR (n=11): 0.6494 ± 0.2915
T2star (n=83): 0.1815 ± 0.1940
T2w (n=358): 0.3392 ± 0.3560
UNIT1 (n=57): 0.0000 ± 0.0000

Sensitivity score per contrast (mean ± std)
PSIR (n=60): 0.7138 ± 0.3415
STIR (n=11): 0.7462 ± 0.3294
T2star (n=83): 0.4796 ± 0.4512
T2w (n=358): 0.5556 ± 0.4181
UNIT1 (n=57): 0.0000 ± 0.0000

[plots: f1_scores_contrast, ppv_scores_contrast, sensitivity_scores_contrast]

plbenveniste (Collaborator) commented Oct 17, 2024

I then evaluated the SCT models for segmenting spinal lesions on the external testing set (ms-basel-2018 and ms-basel-2020).

For sct_deepseg_lesion

I ran the following command:

python evaluation/test_sct_lesion_external_dataset.py --input-folder ~/net/ms-lesion-agnostic/data --output-path ~/net/ms-lesion-agnostic/evaluating_existing_models/external_evaluation/

Output:

Dice score per contrast (mean ± std)
PD (n=31): 0.0046 ± 0.0114
T1w (n=22): 0.0673 ± 0.2120
T2w (n=24): 0.3272 ± 0.3372

[plot: dice_scores_contrast]

Here is the output for the other metrics
PPV score per contrast (mean ± std)
PD (n=31): 0.0613 ± 0.2076
T1w (n=22): 0.1136 ± 0.3060
T2w (n=24): 0.3993 ± 0.3877

F1 score per contrast (mean ± std)
PD (n=31): 0.0189 ± 0.0651
T1w (n=22): 0.0657 ± 0.2186
T2w (n=24): 0.4000 ± 0.3717

Sensitivity score per contrast (mean ± std)
PD (n=31): 0.0130 ± 0.0461
T1w (n=22): 0.2849 ± 0.4499

[plots: f1_scores_contrast, ppv_scores_contrast, sensitivity_scores_contrast]

For sct_deepseg mp2rage

I ran the following command:

python evaluation/test_sct_mp2rage_external_dataset.py --input-folder ~/net/ms-lesion-agnostic/data --output-path ~/net/ms-lesion-agnostic/evaluating_existing_models/external_evaluation/

Output:

Dice score per contrast (mean ± std)
PD (n=31): 0.0034 ± 0.0118
T1w (n=22): 0.0559 ± 0.2116
T2w (n=24): 0.2864 ± 0.4308

[plot: dice_scores_contrast]

Here is the output for the other metrics
PPV score per contrast (mean ± std)
PD (n=31): 0.0000 ± 0.0000
T1w (n=22): 0.0455 ± 0.2132
T2w (n=24): 0.2500 ± 0.4423

F1 score per contrast (mean ± std)
PD (n=31): 0.0000 ± 0.0000
T1w (n=22): 0.0455 ± 0.2132
T2w (n=24): 0.2500 ± 0.4423

Sensitivity score per contrast (mean ± std)
PD (n=31): 0.0000 ± 0.0000
T1w (n=22): 0.2727 ± 0.4558

[plots: f1_scores_contrast, ppv_scores_contrast, sensitivity_scores_contrast]

For sct_deepseg psir-stir

I ran the following command:

python evaluation/test_sct_psir-stir_external_dataset.py --input-folder ~/net/ms-lesion-agnostic/data --output-path ~/net/ms-lesion-agnostic/evaluating_existing_models/external_evaluation/

Output:

Dice score per contrast (mean ± std)
PD (n=31): 0.0036 ± 0.0119
T1w (n=22): 0.2774 ± 0.4529
T2w (n=24): 0.2510 ± 0.3996

[plot: dice_scores_contrast]

Here is the output for the other metrics
PPV score per contrast (mean ± std)
PD (n=31): 0.0000 ± 0.0000
T1w (n=22): 0.2727 ± 0.4558
T2w (n=24): 0.2792 ± 0.4128

F1 score per contrast (mean ± std)
PD (n=31): 0.0000 ± 0.0000
T1w (n=22): 0.2727 ± 0.4558
T2w (n=24): 0.2812 ± 0.4154

Sensitivity score per contrast (mean ± std)
PD (n=31): 0.0000 ± 0.0000
T1w (n=22): 0.2727 ± 0.4558

[plots: f1_scores_contrast, ppv_scores_contrast, sensitivity_scores_contrast]
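Each of these external-dataset scripts has to match predictions to the manual lesion masks before computing metrics. A hypothetical pairing helper, for illustration only (the filename suffixes and folder layout are assumptions, not the repository's actual conventions):

```python
# Hypothetical sketch: pair each prediction NIfTI with its ground-truth
# lesion mask by shared subject/image prefix. Suffixes are assumptions.
from pathlib import Path

def pair_pred_gt(pred_dir, gt_dir,
                 pred_suffix="_pred.nii.gz",
                 gt_suffix="_lesion-manual.nii.gz"):
    """Return (prediction, ground-truth) path pairs that share a prefix."""
    pairs = []
    for pred in sorted(Path(pred_dir).glob(f"*{pred_suffix}")):
        prefix = pred.name[: -len(pred_suffix)]
        gt = Path(gt_dir) / f"{prefix}{gt_suffix}"
        if gt.exists():  # skip predictions without a manual mask
            pairs.append((pred, gt))
    return pairs
```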
