Face Detection Project

This project applied linear classification to Histogram of Oriented Gradients (HoG) features in order to detect faces in images. A HoG feature aggregates the gradient orientations within local regions of an image, which gives a rough outline of the objects present. To detect as many faces as possible, a sliding window was used to visit many locations within each image, and the process was repeated at many scales of the image. The steps of this process were:

Acquiring HoG Features

The first step was to compute HoG features from positive face images and from patches of negative non-face images. To improve the performance of the pipeline, I resized the negative images to random scales and then cropped random patches from them to compute HoG features. These HoG features were then used to train a linear SVM classifier. Examples of the images used can be seen below:

Positive Examples

Negative Examples
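
The random-scale, random-patch sampling of negative images described above can be sketched as follows. This is a minimal illustration rather than the code used in the project: the function name, the scale range of 0.5 to 1.0, and the patch size passed in are all assumptions, and the sketch only chooses where to crop, leaving the resizing and the HoG computation to whatever image library is in use.

```python
import random


def sample_negative_patches(image_w, image_h, patch_size, num_patches, rng,
                            min_scale=0.5, max_scale=1.0):
    """Pick random (scale, x, y) crops for one negative image.

    Each tuple means: resize the image by `scale`, then crop the square of
    side `patch_size` whose top-left corner is (x, y) in the resized image.
    The scale range is a hypothetical choice, not taken from the project.
    """
    patches = []
    # Below this scale the resized image could not hold a single patch.
    smallest_usable = patch_size / min(image_w, image_h)
    low = max(min_scale, smallest_usable)
    if low > max_scale:
        return patches  # image is too small to yield any patch
    for _ in range(num_patches):
        scale = rng.uniform(low, max_scale)
        # Guard against rounding that would leave less than one patch.
        scaled_w = max(patch_size, int(image_w * scale))
        scaled_h = max(patch_size, int(image_h * scale))
        x = rng.randint(0, scaled_w - patch_size)
        y = rng.randint(0, scaled_h - patch_size)
        patches.append((scale, x, y))
    return patches


rng = random.Random(0)
crops = sample_negative_patches(200, 150, patch_size=36, num_patches=5, rng=rng)
```

Sampling the scale before the location means that the same fixed-size patch covers a different amount of the original scene each time, which gives the classifier negatives at several levels of detail.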

SVM Classification

Training the SVM on the positive and negative HoG features produced a HoG template for a face, shown below, which on close observation resembles a face. The accuracy of the SVM was usually in the range of 0.998 to 1, with a true positive rate of about 0.401 and a true negative rate of about 0.598. These two rates sum to roughly the accuracy, so they appear to be expressed as fractions of all training examples rather than as per-class rates.
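
To make the relationship between these numbers concrete, here is a small sketch of how such a report could be computed from the classifier's scores. It assumes that the true positive and true negative rates are fractions of all examples, which is an inference from the fact that 0.401 and 0.598 add up to about the reported accuracy. The function name and the toy scores are hypothetical.

```python
def training_report(scores, labels):
    """Summarize a linear classifier's decisions.

    scores: decision values (w . x + b), one per example.
    labels: +1 for face, -1 for non-face.
    All rates are fractions of the total number of examples (an assumption
    about how the figures in the text were computed).
    """
    n = len(labels)
    tp = sum(1 for s, y in zip(scores, labels) if s >= 0 and y > 0)
    tn = sum(1 for s, y in zip(scores, labels) if s < 0 and y < 0)
    fp = sum(1 for s, y in zip(scores, labels) if s >= 0 and y < 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < 0 and y > 0)
    return {
        "accuracy": (tp + tn) / n,
        "true_positive": tp / n,
        "true_negative": tn / n,
        "false_positive": fp / n,
        "false_negative": fn / n,
    }


# Toy data: three faces (one of them misclassified) and two non-faces.
report = training_report([2.1, 0.4, -0.3, -1.5, -0.8], [1, 1, 1, -1, -1])
```

Under this convention the accuracy is always the sum of the true positive and true negative fractions, so the split mostly reflects how many positive and negative examples were in the training set.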

Sliding Window Detection

Finally, the trained SVM was run on patches extracted from a set of test images at various scales. For each scale of an image, I iterated through all possible locations where a patch the size of my learned template could fit and fed the HoG feature from that patch into my classifier. If the resulting confidence was above a particular threshold, I calculated the coordinates of the bounding box corresponding to that part of the image and added it to a list of potential bounding boxes. The number of candidate bounding boxes was then reduced using non-maximum suppression.
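
The detection loop described above can be sketched as follows. This is an illustration under stated assumptions, not the project's code: `score_fn` is a hypothetical stand-in for computing the HoG feature of a window and evaluating the SVM on it, the step size is a free parameter, and the suppression step shown is a common greedy variant based on intersection over union, which may differ from the one actually used.

```python
def sliding_window_boxes(scaled_w, scaled_h, scale, window, step, score_fn, threshold):
    """Score every window position in one rescaled image.

    Returns (x_min, y_min, x_max, y_max, confidence) tuples in the
    coordinates of the original, unscaled image.
    """
    boxes = []
    for y in range(0, scaled_h - window + 1, step):
        for x in range(0, scaled_w - window + 1, step):
            confidence = score_fn(x, y)  # hypothetical HoG + SVM scorer
            if confidence > threshold:
                # Dividing by the scale maps the window back to the original image.
                boxes.append((x / scale, y / scale,
                              (x + window) / scale, (y + window) / scale,
                              confidence))
    return boxes


def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max, ...) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, max_overlap=0.3):
    """Greedily keep the most confident boxes, dropping heavy overlaps."""
    kept = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(iou(box, other) <= max_overlap for other in kept):
            kept.append(box)
    return kept


candidates = [(0, 0, 10, 10, 0.9), (1, 1, 11, 11, 0.8), (50, 50, 60, 60, 0.7)]
detections = non_max_suppression(candidates)
```

In the toy example the second box overlaps the first heavily and has lower confidence, so only the first and third boxes survive. Boxes gathered from every scale would be pooled into one list before suppression, so that the same face found at two scales is reported once.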

Various parameters had to be fine-tuned to get a good result:

First, running the detector at only one scale resulted in a precision of only about 0.329 with an SVM threshold of 0.2. When I used 6 scales, produced by repeatedly downscaling the images by a factor of 0.7, the precision rose to 0.809.
I tried increasing the SVM threshold to 0.7 in order to yield fewer incorrect bounding boxes, but this hindered rather than helped: the average precision dropped to 0.768.
I then tried a less drastic scaling factor of 0.9 and sampled 17 scales per image. This gave higher precisions of 0.857 and 0.834, depending on whether the SVM confidence threshold was set to 0.8 or 0.65. The best results came from using the 17 scales with a confidence threshold of 0.5, where the highest precisions I got were 0.876 and 0.861.
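
The two scale schedules compared above can be written out with a short helper. This sketch assumes that the first scale is the original image size (1.0) and that each further scale multiplies the previous one by the factor. The function name is hypothetical.

```python
def pyramid_scales(factor, num_scales):
    """Scales obtained by repeatedly multiplying by `factor`, starting at 1.0."""
    return [factor ** i for i in range(num_scales)]


coarse = pyramid_scales(0.7, 6)   # 1.0, 0.7, 0.49, ... down to about 0.168
fine = pyramid_scales(0.9, 17)    # 1.0, 0.9, 0.81, ... down to about 0.185
```

Under these assumptions both schedules end at a similar smallest scale, a little under one fifth of the original size. The 17-scale schedule therefore covers roughly the same range of face sizes but samples it much more densely, which is one plausible reason it gave higher precision.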

Here are the results of my best runs:

Example output results:

Ayan Das

Project 5 / Face Detection with a Sliding Window
