
# 🚀 CLIP Zero-Shot Object Detection


Detect objects in images without training!

Welcome to the CLIP Zero-Shot Object Detection project! This repository demonstrates how to perform zero-shot object detection by integrating OpenAI's CLIP (Contrastive Language-Image Pretraining) model with a Faster R-CNN for region proposal generation.


| Source Code | Website |
|-------------|---------|
| [github.com/deepmancer/clip-object-detection](https://github.com/deepmancer/clip-object-detection) | [deepmancer.github.io/clip-object-detection](https://deepmancer.github.io/clip-object-detection) |

## 🎯 Quick Start

Set up and run the pipeline in three simple steps:

1. **Clone the repository:**

   ```bash
   git clone https://github.com/deepmancer/clip-object-detection.git
   cd clip-object-detection
   ```

2. **Install dependencies:**

   ```bash
   pip install -r requirements.txt
   ```

3. **Run the notebook:**

   ```bash
   jupyter notebook clip_object_detection.ipynb
   ```

## 🤔 What is CLIP?

CLIP (Contrastive Language–Image Pretraining) is trained on 400 million image-text pairs. It embeds images and text into a shared space where the cosine similarity between embeddings reflects their semantic relationship.

*CLIP model architecture (figure from the [CLIP paper](https://arxiv.org/abs/2103.00020)).*
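To make the shared embedding space concrete, here is a minimal sketch of scoring one image against a handful of candidate labels with the official `clip` package (the image path and label list are placeholders, not part of this repository):

```python
import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Placeholder inputs: swap in your own image and candidate labels.
image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)
texts = clip.tokenize(["a dog", "a cat", "a car"]).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(texts)

# L2-normalize so the dot product equals cosine similarity.
image_features /= image_features.norm(dim=-1, keepdim=True)
text_features /= text_features.norm(dim=-1, keepdim=True)

similarity = (image_features @ text_features.T).softmax(dim=-1)
print(similarity)  # probability-like scores over the candidate labels
```

The label with the highest score is CLIP's best guess for what the image shows, with no task-specific training involved.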

## 🔍 Methodology

Our approach combines CLIP and Faster R-CNN for zero-shot object detection:

  1. 📦 Region Proposal: Use Faster R-CNN to identify potential object locations.
  2. 🎯 CLIP Embeddings: Encode image regions and text descriptions into a shared embedding space.
  3. 🔍 Similarity Matching: Compute cosine similarity between text and image embeddings to identify matches.
  4. ✨ Results: Highlight detected objects with their confidence scores.
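The sketch below illustrates how these steps can fit together. It is a simplified approximation rather than the notebook's exact implementation: it uses the final predicted boxes of torchvision's pretrained Faster R-CNN as region proposals (instead of tapping the RPN directly), and the image path, text queries, and similarity threshold are placeholder assumptions.

```python
import clip
import torch
import torchvision
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"

# Region proposer: a pretrained Faster R-CNN from torchvision.
detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
detector.eval().to(device)
clip_model, preprocess = clip.load("ViT-B/32", device=device)

image = Image.open("example.jpg").convert("RGB")        # placeholder image
queries = ["a photo of a dog", "a photo of a bicycle"]  # placeholder queries

with torch.no_grad():
    # 1. Region proposal: take the detector's predicted boxes as candidates.
    tensor = torchvision.transforms.functional.to_tensor(image).to(device)
    boxes = detector([tensor])[0]["boxes"]

    # 2. CLIP embeddings: encode the text queries once, L2-normalized.
    text_features = clip_model.encode_text(clip.tokenize(queries).to(device))
    text_features /= text_features.norm(dim=-1, keepdim=True)

    # 3. Similarity matching: score each cropped region against every query.
    for box in boxes:
        x0, y0, x1, y1 = box.int().tolist()
        if x1 <= x0 or y1 <= y0:
            continue  # skip degenerate boxes
        crop = preprocess(image.crop((x0, y0, x1, y1))).unsqueeze(0).to(device)
        region_features = clip_model.encode_image(crop)
        region_features /= region_features.norm(dim=-1, keepdim=True)
        scores = (region_features @ text_features.T).squeeze(0)

        # 4. Results: report matches above an assumed similarity threshold.
        best = scores.argmax().item()
        best_score = scores[best].item()
        if best_score > 0.25:
            print(f"{queries[best]}: {best_score:.2f} at ({x0}, {y0}, {x1}, {y1})")
```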

## 📊 Example Results

### Input Image

*The original input image.*

### Region Proposals

*Candidate regions proposed by Faster R-CNN's RPN.*

### Detected Objects

*Objects detected by CLIP based on the textual queries.*


## 📦 Requirements

All dependencies are listed in `requirements.txt` and installed in the Quick Start above. The core stack is Python with PyTorch, OpenAI's CLIP, and Jupyter Notebook.

## 📝 License

This project is licensed under the MIT License. Feel free to use, modify, and distribute the code.


## ⭐ Support the Project

If this project inspires or assists your work, please consider giving it a ⭐ on GitHub! Your support motivates us to continue improving and expanding this repository.