Yolo is an end-to-end, real-time object detection system that uses a single convolutional neural network to predict bounding boxes and class probabilities directly from full images. It uses a deeper Darknet-53 backbone network and multi-scale predictions to achieve state-of-the-art accuracy while running faster than other algorithms. Yolo is trained on a merged ImageNet and COCO dataset and predicts bounding boxes using predefined anchor boxes and associated class probabilities at three different scales to localize and classify objects in images with just one pass through the network.