Object detection is one of the central problems in computer vision. In this course project we studied the YOLO (You Only Look Once) object detection framework and trained a scaled-down version of the network with 8 convolutional + max-pool layers followed by 1 fully connected layer. We also improved on YOLO’s reported performance in cluttered scenes by increasing its grid resolution, which helped detect smaller objects with greater accuracy.
We also concluded that having more than 2 bounding boxes per cell was detrimental to performance, probably because it led to more situations where the wrong box was selected. A YOLO network pre-trained on ImageNet was also tested on the KITTI object detection benchmark.
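To make the effect of grid resolution and boxes per cell concrete, here is a minimal sketch that assumes the original YOLO output layout: an S x S grid in which each cell predicts B boxes (x, y, w, h, confidence) plus C class scores. The function names, the 20-class setting, and the two example object centres are illustrative assumptions, not the configuration used in this project.

```python
def output_size(S, B, C):
    """Length of the prediction vector: S*S cells, each holding
    B boxes of 5 values (x, y, w, h, confidence) plus C class scores."""
    return S * S * (B * 5 + C)


def responsible_cell(cx, cy, S):
    """Return the (row, col) of the grid cell containing an object
    centre given in normalised image coordinates (0 <= cx, cy <= 1)."""
    col = min(int(cx * S), S - 1)  # clamp so cx == 1.0 stays in range
    row = min(int(cy * S), S - 1)
    return row, col


# Two hypothetical small objects whose centres lie close together.
centres = [(0.45, 0.50), (0.52, 0.55)]

for S in (7, 14):
    cells = [responsible_cell(cx, cy, S) for cx, cy in centres]
    shared = len(set(cells)) < len(cells)
    print(f"S={S}: cells={cells}, share a cell: {shared}, "
          f"output size (B=2, C=20): {output_size(S, 2, 20)}")
```

With these example centres, both objects fall into the same cell on a 7 x 7 grid but into separate cells on a 14 x 14 grid, which is one plausible reason a finer grid helps in cluttered scenes. The cost is a larger output layer: the prediction vector grows with S squared, and each extra box per cell adds 5 values to every cell.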

A presentation summarizing the work can be accessed here: [Slides]
Report available here: [Report]

Some results obtained: Mona Lisa with a cat; planes in the sky; horse and jockey.