Object-Detection-with-Voice-Feedback-YOLO-v8-and-gTTS

Object detection is a field of computer vision that detects instances of semantic objects in images and videos (in our case, by drawing bounding boxes around them). We then convert the predicted text labels into voice responses that give the basic position of each object in the person's or camera's view.

A Very High-Level Overview

Training Data: The model is trained on the Common Objects in Context (COCO) dataset. You can explore its labeled images on the dataset's website.
Model: The model is the You Only Look Once (YOLO) algorithm, which runs on a variation of a complex Convolutional Neural Network architecture called Darknet. We use the YOLO v8 model. The Python cv2 package can set up Darknet from the configuration in the yolov8.cfg file. YOLO v8 has already been trained on COCO by others, so we use a pre-trained model whose weights are stored in a 200+ MB file.
Input Data: We feed static images to this trained model.
API: The class prediction for each object detected in an image is a string, e.g., “cat”. We also obtain the coordinates of the object in the image and prepend its position (“top”/“mid”/“bottom” and “left”/“center”/“right”) to the class prediction “cat”. We then send the text description to the Google Text-to-Speech API using the gTTS package.
Output: We get voice feedback of the form “bottom left cat”, meaning a cat was detected in the bottom-left of the camera view. The speech is generated by the Google Text-to-Speech API, via the gTTS package, from the text description of the object.
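
The flow above can be sketched with OpenCV's dnn module. This is a minimal sketch under stated assumptions, not the repository's script.py: the function names `detect` and `decode_row`, the 416x416 input size, the 0.5 confidence threshold, and the Darknet-style output layout are all assumptions. Non-maximum suppression is left out for brevity, so the same object may be reported more than once.

```python
import os


def decode_row(row, W, H, labels, min_conf=0.5):
    """Turn one raw output row into (label, confidence, bx, by), or None.

    Assumes the Darknet-style layout
    [cx, cy, w, h, objectness, class scores...], with the box center
    normalized to the 0..1 range (an assumption, not confirmed by the source).
    """
    scores = list(row[5:])
    class_id = max(range(len(scores)), key=scores.__getitem__)
    confidence = float(scores[class_id])
    if confidence < min_conf:
        return None
    # Scale the normalized box center to pixel coordinates.
    bx, by = int(row[0] * W), int(row[1] * H)
    return labels[class_id], confidence, bx, by


def detect(image_path, yolo_dir, min_conf=0.5):
    """Run the pre-trained network on one static image."""
    import cv2  # imported here so decode_row works without OpenCV installed

    with open(os.path.join(yolo_dir, "coco.names")) as f:
        labels = [line.strip() for line in f if line.strip()]

    # Build the network from the .cfg description and the downloaded weights.
    net = cv2.dnn.readNetFromDarknet(
        os.path.join(yolo_dir, "yolov8.cfg"),
        os.path.join(yolo_dir, "yolov8.weights"),
    )

    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(image_path)
    H, W = image.shape[:2]

    # Scale pixels to 0..1, resize to the assumed input size, swap BGR to RGB.
    blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (416, 416),
                                 swapRB=True, crop=False)
    net.setInput(blob)
    outputs = net.forward(net.getUnconnectedOutLayersNames())

    found = []
    for output in outputs:
        for row in output:
            hit = decode_row(row, W, H, labels, min_conf)
            if hit is not None:
                found.append(hit)
    return found
```

Each returned tuple carries the class label plus the box center, which is what the position step needs.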

Voice Feedback

We can compare bx and by (the center of a detected object's bounding box) with W and H (the width and height of the image) to determine the position of the object, and send that position as a text string to gTTS.
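
One way to turn these coordinates into a spoken phrase is shown below. This is a minimal sketch: splitting the image into equal thirds, the function names `describe_position`, `describe`, and `speak`, and the output file name are assumptions for illustration, not necessarily what script.py does.

```python
def describe_position(bx, by, W, H):
    """Map a box center (bx, by) to a phrase such as "bottom left".

    The image is split into a 3x3 grid of equal thirds (an assumed rule).
    """
    if bx <= W / 3:
        horizontal = "left"
    elif bx <= 2 * W / 3:
        horizontal = "center"
    else:
        horizontal = "right"

    # Image coordinates grow downward, so a small by means "top".
    if by <= H / 3:
        vertical = "top"
    elif by <= 2 * H / 3:
        vertical = "mid"
    else:
        vertical = "bottom"

    return f"{vertical} {horizontal}"


def describe(label, bx, by, W, H):
    """Prepend the position to the class label, e.g. "bottom left cat"."""
    return f"{describe_position(bx, by, W, H)} {label}"


def speak(descriptions, out_path="feedback.mp3"):
    """Send the descriptions to Google Text-to-Speech and save an MP3."""
    from gtts import gTTS  # imported here; needs the gTTS package and network access

    tts = gTTS(text=", ".join(descriptions), lang="en")
    tts.save(out_path)
    return out_path
```

For a 640x480 image, `describe("cat", 100, 400, 640, 480)` gives "bottom left cat", which can then be passed to `speak` in a list.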

Note:

You need to download the pre-trained YOLO weights before you can run the code.

Usage

To use this repository, follow these steps:

Clone the repository:

```
git clone https://github.com/yourusername/Object-Detection-with-Voice-Feedback-YOLO-v8-and-gTTS.git
cd Object-Detection-with-Voice-Feedback-YOLO-v8-and-gTTS
```

Download the YOLO v8 weights and place them in the directory:

```
wget https://pjreddie.com/media/files/yolov8.weights -P path/to/weights/directory
```

Install the required packages:
```
pip3 install -r requirements.txt
```

Run the object detection script:

```
python3 script.py -i path/to/image.jpg -y path/to/yolo_directory
```

Ensure that your yolo_directory contains the yolov8.cfg, coco.names, and yolov8.weights files. Modify the script as necessary to fit your specific setup and requirements.
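
If you adapt the script, a small pre-flight check can catch a missing file before the model is loaded. The sketch below is hypothetical: the `-i` and `-y` flags come from the command above, but the long option names `--image` and `--yolo` and the helpers `parse_args` and `missing_files` are assumptions, not the actual contents of script.py.

```python
import argparse
import os

# Files the yolo_directory is expected to contain.
REQUIRED_FILES = ("yolov8.cfg", "coco.names", "yolov8.weights")


def parse_args(argv=None):
    """Parse the two command-line options used in the run command."""
    parser = argparse.ArgumentParser(
        description="Object detection with voice feedback")
    parser.add_argument("-i", "--image", required=True,
                        help="path to the input image")
    parser.add_argument("-y", "--yolo", required=True,
                        help="path to the directory with the YOLO files")
    return parser.parse_args(argv)


def missing_files(yolo_dir):
    """Return the required files that are not present in yolo_dir."""
    return [name for name in REQUIRED_FILES
            if not os.path.isfile(os.path.join(yolo_dir, name))]


if __name__ == "__main__":
    args = parse_args()
    missing = missing_files(args.yolo)
    if missing:
        raise SystemExit(f"Missing from {args.yolo}: {', '.join(missing)}")
    print("All required YOLO files found.")
```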
