Project 5 / Face Detection with a Sliding Window

This project applied linear classification to Histogram of Oriented Gradients (HoG) features in order to detect faces in images. A HoG feature aggregates the gradient orientations within local regions of an image, which gives a rough outline of the objects present. To detect as many faces as possible, a sliding window was used to visit many locations within each image, and the process was repeated at many scales of the image. The steps of this process were:

  1. Producing HoG features for various pictures of faces as well as patches from many non-face images
  2. Training a linear classifier for the HoG features with a support vector machine
  3. Processing patches of test images at various scales with a sliding window detector and running them through the classifier to decide whether or not to label them as faces
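To make the idea of aggregating gradient orientations concrete, here is a minimal Python sketch of the histogram for a single HoG cell. It is an illustration only and not the feature extractor used in this project: the function name `cell_orientation_histogram`, the 9 unsigned orientation bins, and the central-difference gradients are all assumptions, and a full HoG descriptor would additionally tile the image into many cells and normalize across blocks of cells.

```python
import math


def cell_orientation_histogram(cell, num_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations.

    `cell` is a 2D list of grayscale intensities. Border pixels are skipped
    so that central differences are always defined.
    """
    height, width = len(cell), len(cell[0])
    hist = [0.0] * num_bins
    bin_width = 180.0 / num_bins  # unsigned orientations span [0, 180)
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]  # horizontal gradient
            gy = cell[y + 1][x] - cell[y - 1][x]  # vertical gradient
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # The trailing modulo guards against an angle that rounds to 180.
            hist[int(angle // bin_width) % num_bins] += magnitude
    return hist
```

A cell whose intensity increases only from left to right puts all of its weight in the first bin (orientation near 0 degrees), while a top-to-bottom ramp fills the bin around 90 degrees. Strong edges therefore dominate the descriptor, which is why a HoG feature looks like a rough outline.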

Acquiring HoG Features

The first step was producing HoG features from positive face images and from patches of negative non-face images. To improve the performance of the pipeline, I sampled the negative images at random scales and then extracted random patches from which to compute HoG features. These HoG features were then used to train a linear SVM classifier. Examples of the images used can be seen below:

Positive examples

Negative examples
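The negative sampling described in this section could look roughly like the following Python sketch. It is not the project's actual code: the helper names, the scale range of 0.5 to 1.0, and the nearest-neighbor resize are assumptions chosen to keep the example self-contained, and images are represented as plain 2D lists of grayscale values.

```python
import random


def resize_nearest(image, new_h, new_w):
    """Nearest-neighbor resize of a 2D list to new_h x new_w."""
    h, w = len(image), len(image[0])
    return [[image[r * h // new_h][c * w // new_w] for c in range(new_w)]
            for r in range(new_h)]


def sample_negative_patches(image, patch_size, count, rng,
                            min_scale=0.5, max_scale=1.0):
    """Draw `count` square patches, each from a randomly rescaled copy."""
    h, w = len(image), len(image[0])
    patches = []
    for _ in range(count):
        scale = rng.uniform(min_scale, max_scale)
        # Never shrink below the patch size, so a crop always fits.
        new_h = max(patch_size, round(h * scale))
        new_w = max(patch_size, round(w * scale))
        scaled = resize_nearest(image, new_h, new_w)
        top = rng.randint(0, new_h - patch_size)
        left = rng.randint(0, new_w - patch_size)
        patches.append([row[left:left + patch_size]
                        for row in scaled[top:top + patch_size]])
    return patches
```

Each returned patch has the same size as the face template, so its HoG feature has the same length as the positive features and both can be used to train one classifier. Passing a seeded `random.Random` instance as `rng` makes the sampling reproducible.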

SVM Classification

Training the SVM on the positive and negative HoG features produced a HoG template for a face, shown below. The accuracy of the SVM was usually in the range of 0.998 to 1, with a true positive rate of about 0.401 and a true negative rate of about 0.598. These two rates sum to roughly the accuracy, which suggests they are expressed as fractions of all examples rather than per class. On close inspection, the template resembles a face:
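As a rough illustration of this step, the Python sketch below trains a linear SVM by full-batch subgradient descent on the regularized hinge loss and then reports accuracy together with true positive and true negative counts divided by the total number of examples. Everything here is an assumption made for illustration: the project most likely relied on a library solver, the function names and hyperparameters (`lam`, `lr`, `epochs`) are hypothetical, and reading the reported rates as fractions of all examples is only an inference from the fact that they sum to about the accuracy.

```python
def train_linear_svm(features, labels, lam=0.01, lr=0.1, epochs=100):
    """Minimize lam/2 * ||w||^2 + mean hinge loss; labels are +1 or -1."""
    n, d = len(features), len(features[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]  # gradient of the regularizer
        grad_b = 0.0
        for x, y in zip(features, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1.0:  # only margin violators contribute
                for j in range(d):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def evaluate(w, b, features, labels):
    """Accuracy plus true positive/negative counts as fractions of all examples."""
    n = len(labels)
    scores = [sum(wi * xi for wi, xi in zip(w, x)) + b for x in features]
    tp = sum(1 for s, y in zip(scores, labels) if y > 0 and s > 0)
    tn = sum(1 for s, y in zip(scores, labels) if y < 0 and s <= 0)
    return {"accuracy": (tp + tn) / n,
            "tp_fraction": tp / n,
            "tn_fraction": tn / n}
```

Under this reading, a true positive fraction of about 0.401 would mean that roughly 40% of the examples were faces and nearly all of them were classified correctly. The learned weight vector `w` has one entry per HoG dimension, which is what allows it to be visualized as a template.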

Sliding Window Detection

Finally, the trained SVM was run on patches extracted from a set of test images at various scales. For each scale, I iterated through all possible locations where a patch the size of my learned template could fit and fed the HoG feature from that patch into my classifier. If the resulting confidence was above a particular threshold, I calculated the coordinates of the bounding box corresponding to that part of the image and added it to a list of potential bounding boxes. The number of candidate bounding boxes was then reduced using non-maximum suppression.
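The two geometric parts of this step, mapping windows back to the original image and suppressing overlapping detections, can be sketched in Python as follows. This is a simplified illustration under stated assumptions and not the project's code: windows are stepped in pixels rather than in HoG cells, the suppression is a common greedy variant based on intersection over union (IoU), and the function names and the threshold of 0.3 are hypothetical.

```python
def sliding_window_boxes(scaled_h, scaled_w, template, step, scale):
    """Enumerate windows on a rescaled image and map each one back to
    (x_min, y_min, x_max, y_max) in original-image pixels."""
    boxes = []
    for top in range(0, scaled_h - template + 1, step):
        for left in range(0, scaled_w - template + 1, step):
            boxes.append((round(left / scale), round(top / scale),
                          round((left + template) / scale),
                          round((top + template) / scale)))
    return boxes


def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(detections, iou_threshold=0.3):
    """Greedy NMS over (confidence, box) pairs: keep the most confident box,
    then drop any later box that overlaps an already kept one too much."""
    kept = []
    for conf, box in sorted(detections, key=lambda d: d[0], reverse=True):
        if all(iou(box, kept_box) <= iou_threshold for _, kept_box in kept):
            kept.append((conf, box))
    return kept
```

In a full detector, each window would be scored by the classifier and only those above the confidence threshold would be passed to `non_max_suppression`. Dividing by `scale` is what lets a fixed-size template detect faces larger than the template: a window found on an image shrunk to half size corresponds to a box twice as large in the original.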

Various parameters had to be fine-tuned to get a good result:

Here are the results of my best runs:

Example output results: