Diffusion models have significantly impacted image generation. To lower the technical entry barrier for the medical community, we have introduced mediffusion, a user-friendly diffusion package that can be tailored to medical problems in fewer than 20 lines of code. We have built on several codebases, including guided diffusion and LDM, and hardened them for medical use cases. We plan to update this package regularly. In the spirit of open science, if you use this package, please consider sharing a demo notebook of your work.
Happy Coding!
If you haven't installed Conda yet, you can download it from here. After installing, create a new Conda environment by running:
conda create --name mediffusion python=3.10
Activate the environment:
conda activate mediffusion
Install PyTorch specifically for CUDA 11.8 by running:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
You can install the latest version from PyPI using:
pip install mediffusion
This will install all the necessary packages.
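As a quick optional sanity check (this snippet is illustrative and not part of the package itself), you can confirm that mediffusion imports correctly and that PyTorch can see your GPUs:

import torch
import mediffusion  # noqa: F401

print("CUDA available:", torch.cuda.is_available())
print("GPU count:", torch.cuda.device_count())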
Before starting training, it is recommended that you set up a few global constants and environment variables:
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"
os.environ['WANDB_API_KEY'] = "WANDB-API-KEY"
TOTAL_IMAGE_SEEN = 40e6
BATCH_SIZE = 36
NUM_DEVICES = 2 # number of devices in CUDA_VISIBLE_DEVICES
TRAIN_ITERATIONS = int(TOTAL_IMAGE_SEEN / (BATCH_SIZE * NUM_DEVICES))
To prepare the data, create a dataset in which each element is a dictionary. The dictionary must have the key "img" and may contain additional keys such as "cls" and "concat", depending on the type of conditioning. One way to build such a dataset is with MONAI. Below is a sample code snippet:
import monai as mn
import torch

train_data_dicts = [
    {"img": "./image1.dcm", "cls": 2},
    {"img": "./image2.dcm", "cls": 0}
]
valid_data_dicts = [
    {"img": "./image9.dcm", "cls": 1}
]

transforms = mn.transforms.Compose([
    mn.transforms.LoadImageD(keys="img"),
    mn.transforms.SelectItemsD(keys=["img", "cls"]),
    mn.transforms.ScaleIntensityD(keys=["img"], minv=-1, maxv=1),
    mn.transforms.ToTensorD(keys=["img", "cls"], dtype=torch.float, track_meta=False),
])

train_ds = mn.data.Dataset(data=train_data_dicts, transform=transforms)
valid_ds = mn.data.Dataset(data=valid_data_dicts, transform=transforms)

# Sampler that draws TOTAL_IMAGE_SEEN samples with replacement from the training set
train_sampler = torch.utils.data.RandomSampler(train_ds, replacement=True, num_samples=TOTAL_IMAGE_SEEN)
At the end of this step, you should have `train_ds`, `valid_ds`, and `train_sampler`.
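Optionally, you can sanity-check the data pipeline before training. The snippet below is only an illustrative check; it fetches one transformed sample and verifies that the image has been scaled to [-1, 1] and that the class label is a float tensor:

sample = train_ds[0]  # applies the MONAI transforms to the first training item
print(sample["img"].shape, sample["img"].min().item(), sample["img"].max().item())
print(sample["cls"])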
Below is a table that provides descriptions for each element in the configuration file:
| Section | Field | Description |
|---|---|---|
| diffusion | timesteps | Number of timesteps in the diffusion process |
| | schedule_name | Name of the noise schedule (e.g., "cosine") |
| | enforce_zero_terminal_snr | Whether to enforce zero terminal SNR (True/False) |
| | schedule_params | Parameters of the diffusion schedule |
| | -- beta_start | Starting value for beta in the schedule |
| | -- beta_end | Ending value for beta in the schedule |
| | -- cosine_s | Parameter for the cosine schedule |
| | timestep_respacing | Can be a list of respacings; for example, with 200 timesteps, [10, 20] means taking 10 samples from the first 100 timesteps and 20 from the next 100 |
| | mean_type | Type of mean model (e.g., "VELOCITY") |
| | var_type | Type of variance model (e.g., "LEARNED_RANGE") |
| | loss_type | Type of loss to use (e.g., "MSE") |
| optimizer | lr | Learning rate |
| | type | Type of optimizer to use |
| validation | classifier_cond_scale | Classifier guidance scale used for validation logging |
| | protocol | Inference protocol for logging validation results |
| | log_original | Whether to log the original validation data (True/False) |
| | log_concat | Whether to log the concatenated images (True/False) |
| | log_cls_indices | Whether to log the entire cls vector (default value of -1) or only specific indices of the cls vector (provide a list of the desired cls indices) |
| model | input_size | Input size of the model; an integer for square/cubic images or a list of integers for specific axes, e.g., [64, 64, 32] |
| | dims | Number of dimensions: 2 or 3 for 2D and 3D images |
| | attention_resolutions | List of resolutions at which attention layers are applied |
| | channel_mult | List of channel multipliers for each layer |
| | dropout | Dropout rate |
| | in_channels | Number of input channels (image channels + concat channels) |
| | out_channels | Number of output channels (image channels, or image channels * 2 if learning the variance) |
| | model_channels | Number of convolution channels in the model |
| | num_head_channels | Number of attention head channels |
| | num_heads | Number of attention heads |
| | num_heads_upsample | Number of attention heads after upsampling |
| | num_res_blocks | List of the number of residual blocks for each layer |
| | resblock_updown | Whether to use residual blocks for down/upsampling (True/False) |
| | use_checkpoint | Whether to use checkpointing (True/False) |
| | use_new_attention_order | Whether to use the new attention ordering (True/False) |
| | use_scale_shift_norm | Whether to use scale-shift normalization (True/False) |
| | scale_skip_connection | Whether to scale skip connections (True/False) |
| | num_classes | Number of classes for conditioning |
| | concat_channels | Number of concatenation channels for conditioning (e.g., for super-resolution or inpainting) |
| | guidance_drop_prob | Drop probability used during training for classifier-free guidance |
For sample configurations, please check out the `sample_configs` directory.

Note: If a field is left out of the config file, its default value is inferred from `mediffusion/default_config/default.yaml`.
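If you prefer to create the YAML programmatically instead of copying a sample, the hypothetical sketch below writes a minimal skeleton using the section and field names from the table above. The values are placeholders only; start from `sample_configs` and `default.yaml` for a working setup.

import yaml  # requires PyYAML

# Illustrative skeleton only: the values below are placeholders and must be adapted
# to your data; any field you omit falls back to default.yaml.
config = {
    "diffusion": {
        "timesteps": 1000,
        "schedule_name": "cosine",
        "mean_type": "VELOCITY",
        "var_type": "LEARNED_RANGE",
        "loss_type": "MSE",
    },
    "optimizer": {"lr": 1e-4},
    "model": {
        "input_size": 256,
        "dims": 2,
        "in_channels": 1,
        "out_channels": 2,  # image channels * 2 because the variance is learned
        "num_classes": 3,
    },
}

with open("./config.yaml", "w") as f:
    yaml.dump(config, f, sort_keys=False)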
You can instantiate the model using the configuration file and dataset as follows:
from mediffusion import DiffusionModule
model = DiffusionModule(
    "./config.yaml",
    train_ds=train_ds,
    val_ds=valid_ds,
    dl_workers=2,
    train_sampler=train_sampler,
    batch_size=BATCH_SIZE,          # train batch size
    val_batch_size=BATCH_SIZE // 2  # validation batch size (recommended size is half of batch_size)
)
You can set up the trainer using the `Trainer` class:
from mediffusion import Trainer
trainer = Trainer(
    max_steps=TRAIN_ITERATIONS,
    val_check_interval=5000,
    root_directory="./outputs",  # where to save the weights and logs
    precision="16-mixed",        # mixed precision training
    devices=-1,                  # use all the devices in CUDA_VISIBLE_DEVICES
    nodes=1,
    wandb_project="Your_Project_Name",
    logger_instance="Your_Logger_Instance",
)
Finally, to train your model, you simply call:
trainer.fit(model)
First, import the `DiffusionModule` class and load the pre-trained model checkpoint. The model is then moved to the CUDA device and set to inference mode. You may also enable half precision for better performance:
from mediffusion import DiffusionModule
model = DiffusionModule("./config.yaml")
model.load_ckpt("./outputs/pl/last.ckpt", ema=True)
model.cuda().half()
model.eval()
Prepare the noise and the model keyword arguments. Here, `"cls"` specifies the class condition and is set to 0:
import torch

noise = torch.randn(1, 1, 256, 256).cuda().half()  # match the model's device and dtype
model_kwargs = {"cls": torch.tensor([0]).cuda().half()}
Note: You can use other keys such as `concat` and/or `cls_embed`. To find out more, see the `tutorials` directory.
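For example, if your model was configured with `concat_channels` (e.g., for super-resolution or inpainting), it will also expect a `concat` tensor in `model_kwargs`. The snippet below is only an illustrative sketch; the exact shape and preprocessing depend on your configuration and training data:

# Illustrative only: conditioning on a concatenated image (e.g., a degraded or
# low-resolution input). Its shape must match concat_channels and the model's
# spatial input size from your config.
cond_img = torch.randn(1, 1, 256, 256).cuda().half()  # placeholder conditioning image
model_kwargs = {
    "cls": torch.tensor([0]).cuda().half(),
    "concat": cond_img,
}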
To make a prediction, use the `predict` method of the `DiffusionModule` class:
img = model.predict(
    noise,
    model_kwargs=model_kwargs,
    classifier_cond_scale=4,
    inference_protocol="DDIM100"
)
- `noise`: The input noise tensor
- `model_kwargs`: A dictionary of additional model inputs (e.g., class conditions)
- `classifier_cond_scale`: The scale used for classifier-free guidance during inference
- `inference_protocol`: The inference protocol to use (e.g., `"DDIM100"`)
The returned `img` is the generated output of the model, with shape `C:H:W(:D)`. To save the image, you need to transpose it first because of the different axis conventions.
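As a concrete illustration, the sketch below saves a single-channel 2D output as a PNG. It assumes `img` is a torch tensor of shape `C:H:W` with values roughly in [-1, 1] (matching the training-time intensity scaling); adapt it for 3D or multi-channel outputs:

import numpy as np
from PIL import Image

arr = img.squeeze().float().cpu().numpy()                        # (C, H, W) -> (H, W)
arr = ((arr + 1.0) / 2.0 * 255.0).clip(0, 255).astype(np.uint8)  # [-1, 1] -> [0, 255]
Image.fromarray(arr).save("sample.png")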
Note: The model currently supports the following solvers: `DDPM`, `DDIM`, `IDDIM` (for inverse diffusion), and `PLMS`. For example, `"PLMS100"` means using the `PLMS` solver for 100 steps.
For more hands-on tutorials on how to use this package effectively, please check the `tutorials` folder in the GitHub repository. The tutorials provide step-by-step instructions, Colab notebooks, and explanations to help you get started with the software.
| File Name | Description | Notebook Link |
|---|---|---|
| 01_2d_ddpm | Getting started with training a simple 2D class-conditioned DDPM | π |
| 02_2d_inpainting | Image inpainting with a 2D diffusion model (RePaint method) | π |
The following features and improvements are currently on our development roadmap:
- Cross-attention
- DPM-Solver
- VAE for LDM
We are actively working on these features and they will be available in future releases.
If you encounter any issues while using this package, we encourage you to open an issue in the GitHub repository. Your feedback helps us to improve the software and resolve any bugs or limitations.
Contributions to the codebase are always welcome. If you have a feature request, bug fix, or any other contribution, feel free to submit a pull request.
If you're interested in actively participating in the development of this package, please send us a Direct Message (DM). We're always open to collaboration and would be delighted to have you on board.
If you find this work useful, please consider citing the parent project:
@article{KHOSRAVI2023107832,
  title = {Few-shot biomedical image segmentation using diffusion models: Beyond image generation},
  journal = {Computer Methods and Programs in Biomedicine},
  volume = {242},
  pages = {107832},
  year = {2023},
  issn = {0169-2607},
  doi = {https://doi.org/10.1016/j.cmpb.2023.107832},
  url = {https://www.sciencedirect.com/science/article/pii/S0169260723004984},
  author = {Bardia Khosravi and Pouria Rouzrokh and John P. Mickley and Shahriar Faghani and Kellen Mulford and Linjun Yang and A. Noelle Larson and Benjamin M. Howe and Bradley J. Erickson and Michael J. Taunton and Cody C. Wyles},
}