Evaluating existing SCT models #34

Open
plbenveniste opened this issue Oct 14, 2024 · 4 comments
plbenveniste (Collaborator)
This issue reports the work done to evaluate the existing models.

The existing models are the following:

  • sct_deepseg_lesion
  • sct_deepseg -t seg_sc_ms_lesion_stir_psir
  • sct_deepseg -t seg_ms_lesion_mp2rage
@plbenveniste plbenveniste self-assigned this Oct 14, 2024
plbenveniste (Collaborator) commented Oct 14, 2024

I created the file evaluation/test_sct_models.py to evaluate the predictions of the 3 SCT lesion segmentation models.

It computes the Dice score, lesion PPV, lesion sensitivity and lesion F1 score.

It is currently running on the test set using:

python evaluation/test_sct_models.py --msd-data-path ~/net/ms-lesion-agnostic/msd_data/dataset_2024-07-24_seed42_lesionOnly.json --output-path ~/net/ms-lesion-agnostic/evaluating_existing_models/evaluation_output
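For reference, the four metrics can be computed from binary masks roughly as follows. This is a minimal sketch, not the actual code of evaluation/test_sct_models.py; it assumes a lesion counts as detected when it overlaps the other mask by at least one voxel.

```python
# Hedged sketch of the evaluated metrics: voxel-wise Dice plus
# lesion-wise PPV / sensitivity / F1 between two binary masks.
# Assumption: one-voxel overlap is enough to call a lesion detected.
import numpy as np
from scipy import ndimage

def dice_score(pred, gt):
    """Voxel-wise Dice: 2*|P∩G| / (|P| + |G|)."""
    inter = np.logical_and(pred, gt).sum()
    denom = pred.sum() + gt.sum()
    return 2.0 * inter / denom if denom > 0 else 0.0

def lesion_wise_metrics(pred, gt):
    """Lesion-wise PPV, sensitivity and F1 via connected components."""
    gt_lab, n_gt = ndimage.label(gt)
    pred_lab, n_pred = ndimage.label(pred)
    # GT lesions touched by the prediction -> sensitivity
    tp_gt = sum(1 for i in range(1, n_gt + 1) if pred[gt_lab == i].any())
    # predicted lesions touching the GT -> PPV
    tp_pred = sum(1 for i in range(1, n_pred + 1) if gt[pred_lab == i].any())
    sens = tp_gt / n_gt if n_gt else 0.0
    ppv = tp_pred / n_pred if n_pred else 0.0
    f1 = 2 * ppv * sens / (ppv + sens) if (ppv + sens) else 0.0
    return ppv, sens, f1
```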

plbenveniste (Collaborator) commented
Because the initial code was taking too long to run (around 90 hours), I decided to split it into 3 files:

python evaluation/test_sct_deepseg_lesion.py --msd-data-path ~/net/ms-lesion-agnostic/msd_data/dataset_2024-07-24_seed42_lesionOnly.json --output-path ~/net/ms-lesion-agnostic/evaluating_existing_models/evaluation_sct_deepseg_lesion
python evaluation/test_sct_deepseg_psir_stir.py --msd-data-path ~/net/ms-lesion-agnostic/msd_data/dataset_2024-07-24_seed42_lesionOnly.json --output-path ~/net/ms-lesion-agnostic/evaluating_existing_models/evaluation_sct_deepseg_psir-stir
python evaluation/test_sct_deepseg_mp2rage.py --msd-data-path ~/net/ms-lesion-agnostic/msd_data/dataset_2024-07-24_seed42_lesionOnly.json --output-path ~/net/ms-lesion-agnostic/evaluating_existing_models/evaluation_sct_deepseg_mp2rage

plbenveniste (Collaborator) commented Oct 15, 2024

For the sct_deepseg_lesion model

I then plotted the desired curves using:

python evaluation/plot_performance.py --pred-dir-path ~/net/ms-lesion-agnostic/evaluating_existing_models/evaluation_sct_deepseg_lesion/ --data-json-path ~/net/ms-lesion-agnostic/msd_data/dataset_2024-07-24_seed42_lesionOnly.json  --split test

Output:

Dice score per contrast (mean ± std)
PSIR (n=60): 0.0068 ± 0.0098
STIR (n=11): 0.3676 ± 0.2831
T2star (n=83): 0.5117 ± 0.2076
T2w (n=358): 0.3206 ± 0.2679
UNIT1 (n=57): 0.0070 ± 0.0084

[plot: dice_scores_contrast]

Here is the output for the other metrics
PPV score per contrast (mean ± std)
PSIR (n=60): 0.0222 ± 0.1354
STIR (n=11): 0.4864 ± 0.4037
T2star (n=83): 0.6010 ± 0.2895
T2w (n=358): 0.6079 ± 0.4153
UNIT1 (n=57): 0.0097 ± 0.0526

F1 score per contrast (mean ± std)
PSIR (n=60): 0.0077 ± 0.0441
STIR (n=11): 0.4037 ± 0.3222
T2star (n=83): 0.6396 ± 0.2281
T2w (n=358): 0.5059 ± 0.3690
UNIT1 (n=57): 0.0088 ± 0.0464

Sensitivity score per contrast (mean ± std)
PSIR (n=60): 0.0395 ± 0.1839
STIR (n=11): 0.4500 ± 0.3738
T2star (n=83): 0.8102 ± 0.2478
T2w (n=358): 0.5221 ± 0.4007
UNIT1 (n=57): 0.0085 ± 0.0458

[plots: f1_scores_contrast, ppv_scores_contrast, sensitivity_scores_contrast]
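The per-contrast "mean ± std" lines above can be produced by a simple group-by over (contrast, score) pairs. A hedged sketch of what plot_performance.py presumably does for its text output (the function name and the population-std convention, ddof=0 as in numpy's default, are my assumptions):

```python
# Sketch: aggregate per-image scores into per-contrast "mean ± std" lines.
from collections import defaultdict
import statistics

def summarize_per_contrast(scores):
    """scores: iterable of (contrast, value) -> sorted summary lines."""
    by_contrast = defaultdict(list)
    for contrast, value in scores:
        by_contrast[contrast].append(value)
    lines = []
    for contrast in sorted(by_contrast):
        vals = by_contrast[contrast]
        # population std (ddof=0), matching numpy's np.std default
        lines.append(f"{contrast} (n={len(vals)}): "
                     f"{statistics.mean(vals):.4f} ± {statistics.pstdev(vals):.4f}")
    return lines
```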

For the MP2RAGE model

python evaluation/plot_performance.py --pred-dir-path ~/net/ms-lesion-agnostic/evaluating_existing_models/evaluation_sct_deepseg_mp2rage/ --data-json-path ~/net/ms-lesion-agnostic/msd_data/dataset_2024-07-24_seed42_lesionOnly.json  --split test

Output:

Dice score per contrast (mean ± std)
PSIR (n=60): 0.2135 ± 0.1760
STIR (n=11): 0.0110 ± 0.0126
T2star (n=83): 0.0074 ± 0.0223
T2w (n=358): 0.0067 ± 0.0127
UNIT1 (n=57): 0.4549 ± 0.1944

[plot: dice_scores_contrast]

Output for the other metrics:
PPV score per contrast (mean ± std)
PSIR (n=60): 0.3733 ± 0.2918
STIR (n=11): 0.0000 ± 0.0000
T2star (n=83): 0.0000 ± 0.0000
T2w (n=358): 0.1425 ± 0.3500
UNIT1 (n=57): 0.3298 ± 0.1770

F1 score per contrast (mean ± std)
PSIR (n=60): 0.3943 ± 0.2621
STIR (n=11): 0.0000 ± 0.0000
T2star (n=83): 0.0000 ± 0.0000
T2w (n=358): 0.0000 ± 0.0000
UNIT1 (n=57): 0.4422 ± 0.1937

Sensitivity score per contrast (mean ± std)
PSIR (n=60): 0.5506 ± 0.3480
STIR (n=11): 0.0000 ± 0.0000
T2star (n=83): 0.0000 ± 0.0000
T2w (n=358): 0.0000 ± 0.0000
UNIT1 (n=57): 0.8224 ± 0.2470

[plots: f1_scores_contrast, ppv_scores_contrast, sensitivity_scores_contrast]

For the PSIR and STIR model

python evaluation/plot_performance.py --pred-dir-path ~/net/ms-lesion-agnostic/evaluating_existing_models/evaluation_sct_deepseg_psir-stir/ --data-json-path ~/net/ms-lesion-agnostic/msd_data/dataset_2024-07-24_seed42_lesionOnly.json  --split test

Output:

Dice score per contrast (mean ± std)
PSIR (n=60): 0.5701 ± 0.2660
STIR (n=11): 0.5984 ± 0.2237
T2star (n=83): 0.1312 ± 0.1538
T2w (n=358): 0.2213 ± 0.2134
UNIT1 (n=57): 0.0023 ± 0.0016

[plot: dice_scores_contrast]

For the other metrics:
PPV score per contrast (mean ± std)
PSIR (n=60): 0.6672 ± 0.3478
STIR (n=11): 0.6605 ± 0.3430
T2star (n=83): 0.1235 ± 0.1475
T2w (n=358): 0.4306 ± 0.4165
UNIT1 (n=57): 0.0000 ± 0.0000

F1 score per contrast (mean ± std)
PSIR (n=60): 0.6381 ± 0.3240
STIR (n=11): 0.6494 ± 0.2915
T2star (n=83): 0.1815 ± 0.1940
T2w (n=358): 0.3392 ± 0.3560
UNIT1 (n=57): 0.0000 ± 0.0000

Sensitivity score per contrast (mean ± std)
PSIR (n=60): 0.7138 ± 0.3415
STIR (n=11): 0.7462 ± 0.3294
T2star (n=83): 0.4796 ± 0.4512
T2w (n=358): 0.5556 ± 0.4181
UNIT1 (n=57): 0.0000 ± 0.0000

[plots: f1_scores_contrast, ppv_scores_contrast, sensitivity_scores_contrast]

plbenveniste (Collaborator) commented Oct 17, 2024

I then evaluated the SCT models for segmenting spinal lesions on the external testing set (ms-basel-2018 and ms-basel-2020).

For sct_deepseg_lesion

I ran the following command:

python evaluation/test_sct_lesion_external_dataset.py --input-folder ~/net/ms-lesion-agnostic/data --output-path ~/net/ms-lesion-agnostic/evaluating_existing_models/external_evaluation/

Output:

Dice score per contrast (mean ± std)
PD (n=31): 0.0046 ± 0.0114
T1w (n=22): 0.0673 ± 0.2120
T2w (n=24): 0.3272 ± 0.3372

[plot: dice_scores_contrast]

Here is the output for the other metrics
PPV score per contrast (mean ± std)
PD (n=31): 0.0613 ± 0.2076
T1w (n=22): 0.1136 ± 0.3060
T2w (n=24): 0.3993 ± 0.3877

F1 score per contrast (mean ± std)
PD (n=31): 0.0189 ± 0.0651
T1w (n=22): 0.0657 ± 0.2186
T2w (n=24): 0.4000 ± 0.3717

Sensitivity score per contrast (mean ± std)
PD (n=31): 0.0130 ± 0.0461
T1w (n=22): 0.2849 ± 0.4499

[plots: f1_scores_contrast, ppv_scores_contrast, sensitivity_scores_contrast]

For sct_deepseg mp2rage

I ran the following command:

python evaluation/test_sct_mp2rage_external_dataset.py --input-folder ~/net/ms-lesion-agnostic/data --output-path ~/net/ms-lesion-agnostic/evaluating_existing_models/external_evaluation/

Output:

Dice score per contrast (mean ± std)
PD (n=31): 0.0034 ± 0.0118
T1w (n=22): 0.0559 ± 0.2116
T2w (n=24): 0.2864 ± 0.4308

[plot: dice_scores_contrast]

Here is the output for the other metrics
PPV score per contrast (mean ± std)
PD (n=31): 0.0000 ± 0.0000
T1w (n=22): 0.0455 ± 0.2132
T2w (n=24): 0.2500 ± 0.4423

F1 score per contrast (mean ± std)
PD (n=31): 0.0000 ± 0.0000
T1w (n=22): 0.0455 ± 0.2132
T2w (n=24): 0.2500 ± 0.4423

Sensitivity score per contrast (mean ± std)
PD (n=31): 0.0000 ± 0.0000
T1w (n=22): 0.2727 ± 0.4558

[plots: f1_scores_contrast, ppv_scores_contrast, sensitivity_scores_contrast]

For sct_deepseg psir-stir

I ran the following command:

python evaluation/test_sct_psir-stir_external_dataset.py --input-folder ~/net/ms-lesion-agnostic/data --output-path ~/net/ms-lesion-agnostic/evaluating_existing_models/external_evaluation/

Output:

Dice score per contrast (mean ± std)
PD (n=31): 0.0036 ± 0.0119
T1w (n=22): 0.2774 ± 0.4529
T2w (n=24): 0.2510 ± 0.3996

[plot: dice_scores_contrast]

Here is the output for the other metrics
PPV score per contrast (mean ± std)
PD (n=31): 0.0000 ± 0.0000
T1w (n=22): 0.2727 ± 0.4558
T2w (n=24): 0.2792 ± 0.4128

F1 score per contrast (mean ± std)
PD (n=31): 0.0000 ± 0.0000
T1w (n=22): 0.2727 ± 0.4558
T2w (n=24): 0.2812 ± 0.4154

Sensitivity score per contrast (mean ± std)
PD (n=31): 0.0000 ± 0.0000
T1w (n=22): 0.2727 ± 0.4558

[plots: f1_scores_contrast, ppv_scores_contrast, sensitivity_scores_contrast]
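Each of these external-dataset scripts has to match predictions to the manual lesion masks before computing metrics. A hypothetical pairing helper, for illustration only (the filename suffixes and folder layout are assumptions, not the repository's actual conventions):

```python
# Hypothetical sketch: pair each prediction NIfTI with its ground-truth
# lesion mask by shared subject/image prefix. Suffixes are assumptions.
from pathlib import Path

def pair_pred_gt(pred_dir, gt_dir,
                 pred_suffix="_pred.nii.gz",
                 gt_suffix="_lesion-manual.nii.gz"):
    """Return (prediction, ground-truth) path pairs that share a prefix."""
    pairs = []
    for pred in sorted(Path(pred_dir).glob(f"*{pred_suffix}")):
        prefix = pred.name[: -len(pred_suffix)]
        gt = Path(gt_dir) / f"{prefix}{gt_suffix}"
        if gt.exists():  # skip predictions without a manual mask
            pairs.append((pred, gt))
    return pairs
```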
