This repository contains an implementation of the UNet architecture for image segmentation tasks, specifically targeting binary segmentation. The code is designed to train a UNet model, evaluate its performance, and make predictions on new images.
## Table of Contents
- Overview
- UNet Architecture
- Dataset
- Repository Structure
- Setup and Installation
- Training the Model
- Inference and Making Predictions
- Results
- References
## Overview
UNet is a convolutional neural network architecture primarily used for biomedical image segmentation. It was first introduced by Olaf Ronneberger et al. in their 2015 paper "U-Net: Convolutional Networks for Biomedical Image Segmentation". The key feature of UNet is its U-shaped architecture, consisting of a contracting path (encoder) and an expansive path (decoder), which makes it highly effective for precise localization and segmentation tasks.
UNet is widely used in various fields, including:
- Medical Imaging: Segmentation of organs, tumors, and other structures in CT, MRI, and ultrasound images.
- Satellite Image Analysis: Land cover classification, road detection, and urban planning.
- Autonomous Vehicles: Identifying objects and boundaries on the road for navigation.
- Agriculture: Crop and soil segmentation from aerial or satellite images.
## UNet Architecture
Below is a visual representation of the UNet architecture:
The architecture consists of three parts (a minimal code sketch follows this list):
- Contracting Path (Encoder): A sequence of convolutional layers followed by max pooling to downsample the input image, capturing the context.
- Bottleneck: The bottom of the U, where the feature maps are the smallest in spatial dimensions but have the deepest representation.
- Expansive Path (Decoder): A sequence of transposed convolutions that upsample the feature maps and concatenate them with corresponding feature maps from the contracting path, allowing precise localization.
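For orientation, here is a minimal PyTorch sketch of this structure: a reduced UNet with two encoder stages, a bottleneck, and two decoder stages joined by skip connections. Class names such as `DoubleConv` and `MiniUNet` are illustrative and are not necessarily the ones used in `models.py`.

```python
import torch
import torch.nn as nn


class DoubleConv(nn.Module):
    """Two 3x3 convolutions, each followed by batch norm and ReLU."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)


class MiniUNet(nn.Module):
    """Reduced UNet: two encoder stages, a bottleneck, two decoder stages.

    Input height and width must be divisible by 4 (one factor of 2 per pooling step).
    """
    def __init__(self, in_ch=3, out_ch=1):
        super().__init__()
        self.enc1 = DoubleConv(in_ch, 64)
        self.enc2 = DoubleConv(64, 128)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = DoubleConv(128, 256)
        self.up2 = nn.ConvTranspose2d(256, 128, kernel_size=2, stride=2)
        self.dec2 = DoubleConv(256, 128)   # 256 = 128 (upsampled) + 128 (skip)
        self.up1 = nn.ConvTranspose2d(128, 64, kernel_size=2, stride=2)
        self.dec1 = DoubleConv(128, 64)
        self.head = nn.Conv2d(64, out_ch, kernel_size=1)

    def forward(self, x):
        s1 = self.enc1(x)                  # skip connection 1
        s2 = self.enc2(self.pool(s1))      # skip connection 2
        b = self.bottleneck(self.pool(s2))
        d2 = self.dec2(torch.cat([self.up2(b), s2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), s1], dim=1))
        return self.head(d1)               # raw logits; apply sigmoid for a binary mask
```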
## Dataset
This project uses the Carvana Image Masking Challenge dataset, hosted on Kaggle. The dataset consists of high-resolution images of cars along with corresponding binary masks that outline each car's silhouette.
- Images: The dataset contains 5,000 images of cars taken from various angles.
- Masks: Each image has an associated binary mask that highlights the car in the image. The masks are used as ground truth for training the segmentation model.
- Challenge: The goal is to accurately predict the car mask for each image, essentially segmenting the car from the background.
- Sign up or log in to Kaggle.
- Visit the Carvana Image Masking Challenge Dataset page.
- Download the dataset and extract it into the `data/` directory of this repository, maintaining the structure:
  ```
  data/
  ├── train_images/
  ├── train_masks/
  ├── val_images/
  └── val_masks/
  ```
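To illustrate how image/mask pairs in this layout can be loaded, here is a hedged sketch of a PyTorch `Dataset`. The mask naming convention, the fixed resize, and the lack of augmentation are simplifying assumptions; the actual loader in this repository may well differ.

```python
import os

import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset


class CarvanaDataset(Dataset):
    """Loads (image, binary mask) pairs from parallel directories.

    Assumes each mask shares its image's base name, e.g.
    train_images/abc.jpg <-> train_masks/abc_mask.gif (naming is an assumption).
    """

    def __init__(self, image_dir, mask_dir):
        self.image_dir = image_dir
        self.mask_dir = mask_dir
        self.images = sorted(os.listdir(image_dir))

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        img_name = self.images[idx]
        img_path = os.path.join(self.image_dir, img_name)
        mask_path = os.path.join(self.mask_dir, img_name.replace('.jpg', '_mask.gif'))

        # Resize to a fixed (W, H); the size is illustrative, not the repository's setting.
        image = Image.open(img_path).convert('RGB').resize((240, 160))
        mask = Image.open(mask_path).convert('L').resize((240, 160), Image.NEAREST)

        image = np.array(image, dtype=np.float32) / 255.0
        mask = np.array(mask, dtype=np.float32)
        mask[mask > 0] = 1.0  # binarize the ground-truth mask

        # HWC -> CHW tensors, as expected by PyTorch convolution layers
        image = torch.from_numpy(image).permute(2, 0, 1)
        mask = torch.from_numpy(mask).unsqueeze(0)
        return image, mask
```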
## Repository Structure
Here's a brief overview of the files in this repository:
- `models.py`: Contains the implementation of the UNet architecture.
- `train.py`: Script to train the UNet model on a given dataset.
- `utils.py`: Utility functions for saving/loading model checkpoints, calculating accuracy, and saving prediction images.
- `config.py`: Configuration file containing hyperparameters, file paths, and other settings.
- `inference.py` (to be created): Script for running inference on new images using a trained UNet model.
## Setup and Installation
- **Clone the Repository:**

  ```bash
  git clone https://github.com/matin-ghorbani/Carvana-Segmentation-UNet
  cd Carvana-Segmentation-UNet
  ```

- **Install the Required Packages:** Ensure you have Python 3.8+ and PyTorch installed. Install the dependencies using pip:

  ```bash
  pip install -r requirements.txt
  ```
## Training the Model
To train the model, run the `train.py` script. Make sure the dataset is correctly placed in the `data/` directory as mentioned above.

```bash
python train.py
```
During training, the script will:
- Load the training and validation datasets.
- Train the UNet model for the specified number of epochs.
- Save the trained model checkpoints.
- Evaluate the model's performance on the validation set.
- Save sample predictions as images.
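The actual logic lives in `train.py`; purely as an illustration of the core loop behind these steps, a minimal binary-segmentation training pass might look like the sketch below. The loss choice, helper names, and hyperparameters are assumptions rather than this repository's exact settings, and `MiniUNet`/`CarvanaDataset` refer to the sketches earlier in this README.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader


def train_one_epoch(model, loader, optimizer, device):
    """Runs one pass over the training set with BCE-with-logits loss."""
    criterion = nn.BCEWithLogitsLoss()  # assumption: a Dice or combined loss would also work
    model.train()
    running_loss = 0.0
    for images, masks in loader:
        images, masks = images.to(device), masks.to(device)
        optimizer.zero_grad()
        logits = model(images)          # raw scores, one channel for binary segmentation
        loss = criterion(logits, masks)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    return running_loss / len(loader)


if __name__ == '__main__':
    # Illustrative wiring using the sketch classes defined earlier in this README.
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    model = MiniUNet().to(device)
    loader = DataLoader(CarvanaDataset('data/train_images', 'data/train_masks'),
                        batch_size=16, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    for epoch in range(3):
        print(f'epoch {epoch}: loss = {train_one_epoch(model, loader, optimizer, device):.4f}')
```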
You can adjust the training parameters (e.g., learning rate, batch size, number of epochs) in the `config.py` file.
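A `config.py` of this kind typically exposes a handful of constants like the ones below; the names and values here are illustrative, not the repository's actual defaults.

```python
# config.py -- illustrative names and values, not the repository's actual defaults
import torch

DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'
LEARNING_RATE = 1e-4
BATCH_SIZE = 16
NUM_EPOCHS = 3
IMAGE_HEIGHT = 160              # images and masks are resized before training
IMAGE_WIDTH = 240
TRAIN_IMG_DIR = 'data/train_images'
TRAIN_MASK_DIR = 'data/train_masks'
VAL_IMG_DIR = 'data/val_images'
VAL_MASK_DIR = 'data/val_masks'
CHECKPOINT_PATH = 'checkpoint.pth.tar'
```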
## Inference and Making Predictions
To run inference on a new image and overlay the prediction on the original image, use the `inference.py` script:

```bash
python inference.py --model path/to/checkpoint.pth.tar --img path/to/your/image.jpg --save
```
You can download my pretrained weights from here.
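For reference, the sketch below shows what such a script might do: load a checkpoint, binarize the sigmoid output at 0.5, and blend the predicted region back onto the image. The flag names match the command above; the checkpoint key, the threshold, the resize dimensions, and the red overlay are assumptions, and `MiniUNet` (from the sketch earlier) stands in for the UNet defined in `models.py`.

```python
import argparse

import numpy as np
import torch
from PIL import Image


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--model', required=True, help='path to a .pth.tar checkpoint')
    parser.add_argument('--img', required=True, help='path to the input image')
    parser.add_argument('--save', action='store_true', help='save the overlay to disk')
    args = parser.parse_args()

    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    model = MiniUNet().to(device)                    # placeholder; the real script uses the UNet from models.py
    checkpoint = torch.load(args.model, map_location=device)
    model.load_state_dict(checkpoint['state_dict'])  # 'state_dict' key is an assumption
    model.eval()

    image = Image.open(args.img).convert('RGB').resize((240, 160))   # illustrative size
    x = torch.from_numpy(np.array(image, dtype=np.float32) / 255.0)
    x = x.permute(2, 0, 1).unsqueeze(0).to(device)   # HWC -> NCHW

    with torch.no_grad():
        mask = (torch.sigmoid(model(x)) > 0.5).float().squeeze().cpu().numpy()

    # Tint the predicted car region red and blend it with the (resized) original image
    overlay = np.array(image, dtype=np.float32)
    overlay[mask > 0.5] = 0.5 * overlay[mask > 0.5] + 0.5 * np.array([255.0, 0.0, 0.0])
    if args.save:
        Image.fromarray(overlay.astype(np.uint8)).save('prediction_overlay.png')


if __name__ == '__main__':
    main()
```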
## Results
I got these results after only 3 epochs:

| Test Image 1 | Test Image 2 | Test Image 3 |
|---|---|---|
| Prediction 1 | Prediction 2 | Prediction 3 |
- Training loss: 0.0772
- Testing accuracy: 0.9756
- Testing dice score: 0.9455
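For context, the Dice score above measures the overlap between predicted and ground-truth masks as 2|A ∩ B| / (|A| + |B|). A small sketch of how it can be computed for binary masks (the 0.5 threshold and epsilon are conventional choices, not necessarily the ones used in `utils.py`):

```python
import torch


def dice_score(logits, targets, eps=1e-8):
    """Dice coefficient for binary masks: 2|A ∩ B| / (|A| + |B|)."""
    preds = (torch.sigmoid(logits) > 0.5).float()
    intersection = (preds * targets).sum()
    return (2.0 * intersection + eps) / (preds.sum() + targets.sum() + eps)
```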
## References
- UNet Paper: U-Net: Convolutional Networks for Biomedical Image Segmentation (Ronneberger et al., 2015)
- Carvana Image Masking Challenge: Kaggle Competition