This repository focuses on three specific levels of searching for objects within videos.

khoi03/Object-Searching-in-Videos

Object-Searching-in-Videos

Introduction

In this task, I will focus on three different levels of searching for objects within videos:

  • Level 1: Find objects of a given class, with no additional properties: a truck.

    Level 1

  • Level 2: Find an object with a color property: the red truck.

    Level 2

  • Level 3: Find this person

    Level 3

Finally, I locate every frame that contains the target object X, draw a bounding box around it, and export those frames as JPG files.

The structure of the output folders is as follows:
  • Video 1
    • Object X
      • Frame 15.jpg
      • Frame 32.jpg
      • Frame 120.jpg
  • Video 2
    • Object X
  • Video 3
    • Object X
      • Frame 215.jpg
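
As a minimal sketch of the layout above, the helper below builds the path for one exported frame. The function name `frame_path` and the `output` root folder are hypothetical and are not taken from `demo.py`; only the `Video / Object / Frame N.jpg` nesting comes from the structure shown.

```python
from pathlib import Path


def frame_path(output_root, video_name, object_name, frame_index):
    """Build '<root>/<video>/<object>/Frame <index>.jpg' (hypothetical helper)."""
    return Path(output_root) / video_name / object_name / f"Frame {frame_index}.jpg"


# Example: where frame 15 of "Video 1" would be written for "Object X".
path = frame_path("output", "Video 1", "Object X", 15)

# Before saving, the folder would need to exist, e.g.:
#   path.parent.mkdir(parents=True, exist_ok=True)
#   cv2.imwrite(str(path), annotated_frame)  # assumes OpenCV is used for export
```

A video in which the object never appears (Video 2 above) would simply end up with an empty object folder.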

Table of contents:

  1. How to run this repository

  2. Approach

    1. Level 1

    2. Level 2

    3. Level 3

  3. Results

    1. Level 1

    2. Level 2

    3. Level 3

1. How to run this repository

I recommend creating an Anaconda environment:

conda create --name [environment-name] python=3.9

Then, install Python requirements:

pip install -r requirements.txt

Finally, to reproduce the results, first download the provided example videos here. Then, with the [environment-name] environment activated, run the following from the project root:

python demo.py

2. Approach

i. Level 1

At this level, I use a YOLOv8 model to detect all objects in the video, then keep and draw bounding boxes only around the objects classified as truck.
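
The class filtering step can be sketched as follows. This is an illustration rather than the code in `demo.py`: `keep_class` and `detections_from_result` are hypothetical helpers, and the second one assumes the Ultralytics result layout (`result.boxes.cls`, `.conf`, `.xyxy`) and the `model.names` id-to-name mapping.

```python
def keep_class(detections, class_names, wanted="truck"):
    """Keep only detections whose class name equals `wanted`.

    detections:  iterable of (class_id, confidence, (x1, y1, x2, y2)) tuples.
    class_names: mapping from class id to class name (like `model.names`).
    """
    return [d for d in detections if class_names[int(d[0])] == wanted]


def detections_from_result(result):
    """Convert one Ultralytics result into plain tuples (assumed attribute layout)."""
    boxes = result.boxes
    return [
        (int(cls_id), float(conf), tuple(float(v) for v in xyxy))
        for cls_id, conf, xyxy in zip(
            boxes.cls.tolist(), boxes.conf.tolist(), boxes.xyxy.tolist()
        )
    ]


# Toy example with a subset of COCO class ids, no model required.
names = {0: "person", 2: "car", 7: "truck"}
detections = [
    (7, 0.91, (10.0, 20.0, 200.0, 180.0)),
    (2, 0.85, (220.0, 40.0, 300.0, 120.0)),
    (7, 0.40, (5.0, 5.0, 50.0, 60.0)),
]
trucks = keep_class(detections, names)
```

With a real model, `detections` would come from something like `detections_from_result(model(frame)[0])`, and the kept boxes would then be drawn on the frame.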

ii. Level 2

This level starts by repeating the Level 1 procedure, using the YOLOv8 model to extract truck objects. For all three levels I use the large YOLOv8 segmentation model (yolov8l-seg.pt). I chose it both because the larger model predicts more accurately and because it can generate masks for the detected objects, for example:

  • Identified Object:

    object

  • Object's mask:

    mask

The background inside a bounding box may contain elements with a color similar to the object's. To keep those pixels from affecting the result, I extract the detected object using its mask before applying a color detection algorithm:

  • Extracted object:

    extract
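
A minimal sketch of the mask-based extraction, assuming the frame is a NumPy array in OpenCV's BGR layout and the mask is a 2D array with the same height and width. `extract_object` is a hypothetical name, not a function from this repository.

```python
import numpy as np


def extract_object(image_bgr, object_mask):
    """Return a copy of the image with every pixel outside the mask set to black."""
    keep = object_mask.astype(bool)
    # keep[..., None] broadcasts the 2D mask over the three color channels.
    return np.where(keep[..., None], image_bgr, 0).astype(image_bgr.dtype)


# Toy 2x2 image: only the two pixels on the diagonal belong to the object.
image = np.full((2, 2, 3), 200, dtype=np.uint8)
mask = np.array([[1, 0], [0, 1]])
extracted = extract_object(image, mask)
```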

To decide whether a pixel of the object falls within the red color range, I check that its blue and green channel values are in the range (0, 50) and its red channel value is in the range (120, 255). This yields the following red mask:

  • Red mask:

    cmask

Finally, I decide whether the detected truck is red by computing the ratio of red pixels to the object's total pixel count and comparing it against a set threshold.
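
The red check described above can be sketched with NumPy as follows. The channel ranges come from the text; treating the range bounds as inclusive, the BGR channel order, the function names, and the `0.3` ratio threshold are all assumptions made for illustration, since the actual threshold is not stated here.

```python
import numpy as np


def red_ratio(image_bgr, object_mask):
    """Fraction of the object's pixels that fall in the red range."""
    obj = object_mask.astype(bool)
    total = int(obj.sum())
    if total == 0:
        return 0.0
    blue, green, red = image_bgr[..., 0], image_bgr[..., 1], image_bgr[..., 2]
    # Red range from the text: blue and green in [0, 50], red in [120, 255].
    red_mask = (blue <= 50) & (green <= 50) & (red >= 120)
    return float((red_mask & obj).sum()) / total


def is_red_object(image_bgr, object_mask, min_ratio=0.3):
    """Compare the red ratio against a threshold (0.3 is a placeholder value)."""
    return red_ratio(image_bgr, object_mask) >= min_ratio


# Toy 2x2 example: three object pixels, two of them red.
image = np.zeros((2, 2, 3), dtype=np.uint8)
image[0, 0] = (10, 20, 200)    # red, inside the object
image[0, 1] = (10, 20, 200)    # red, inside the object
image[1, 0] = (200, 200, 200)  # light gray, inside the object
image[1, 1] = (0, 0, 255)      # red, but outside the object mask
mask = np.array([[1, 1], [1, 0]])
ratio = red_ratio(image, mask)
```

Counting only pixels inside the object mask is what keeps a red background from inflating the ratio.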

iii. Level 3

At this final stage, I combine the YOLOv8 model with LoFTR (Detector-Free Local Feature Matching with Transformers); you can find the paper here.

  • The first step follows the same procedure as Level 1, but targets the person class.
  • The next step is to measure the similarity between the target person (the input for this task) and each detected person. LoFTR finds matching keypoints between the given image and the detected person, and provides a confidence score for each matched pair. The following example illustrates this:

LoFTR_example

  • I then check whether the number of matches with a confidence score greater than 0.5 reaches a set threshold (65 in my code). Once the person is matched, I use the YOLOv8 model to track the ID of the detected person. If the model loses track of the person, the process starts over.
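
The decision and restart logic in the steps above can be sketched in plain Python. The 0.5 confidence cut-off and the threshold of 65 come from the text; the function names, the use of `>=` for "reaches the threshold", and the `(track_id, crop)` frame layout are assumptions for illustration. The confidence scores would come from a LoFTR implementation, which is left out here.

```python
def is_same_person(confidences, min_confidence=0.5, min_matches=65):
    """True if enough keypoint matches have a confidence above the cut-off."""
    confident = sum(1 for c in confidences if c > min_confidence)
    return confident >= min_matches


def follow_target(frames, is_match):
    """Yield (frame_index, track_id) for frames where the target is followed.

    frames:   iterable of lists of (track_id, crop) pairs, one list per frame.
    is_match: callable deciding whether a crop shows the target person,
              e.g. by running LoFTR and then `is_same_person` on the scores.
    """
    target_id = None
    for index, people in enumerate(frames):
        ids = [track_id for track_id, _ in people]
        if target_id is not None and target_id not in ids:
            target_id = None  # track lost: start the matching over
        if target_id is None:
            for track_id, crop in people:
                if is_match(crop):
                    target_id = track_id
                    break
        if target_id is not None:
            yield index, target_id


# Toy run: crops are plain strings and "b" stands for the target person.
frames = [[(1, "a"), (2, "b")], [(2, "b")], [(1, "a")], [(3, "b")]]
followed = list(follow_target(frames, lambda crop: crop == "b"))
```

In this sketch the matching only runs while no ID is being followed, so the feature matcher is skipped on frames where the tracker still holds the target.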

3. Results

This section gives an overview of the results on the provided examples, which you can download from here. The result frames for each video at each level are available via the following link.

i. Level 1

  • Video 1:

    • Frame 103: Frame 103
  • Video 2:

    • Frame 42: Frame 42

ii. Level 2

  • Video 1:

    • Frame 266: Frame 266
  • Video 2:

    • Frame 205: Frame 205

iii. Level 3

For the final level, you can view the full video result via this link.

  • Video 3:
    • Frame 1115: Frame 1115
    • Frame 1898: Frame 1898
