Object detection is a computer vision task that involves both localizing one or more objects within an image and classifies each object in the image. It is a challenging computer vision task that requires both successful object localization in order to locate and draw a bounding box around each object in an image and object classification to predict the correct class of object that was localized.
The You Only Look Once (YOLO), a family of models is a series of end-to-end deep learning models designed for fast object detection. The approach involves a single deep convolutional neural network that splits the input into a grid of cells and each cell directly predicts a bounding box and object classification. The result is a large number of candidate bounding boxes that are consolidated into a final prediction by a post-processing step.