Project 2: Local Feature Matching

The objective of this project is to implement the Harris corner detector to generate feature points, which pass through a SIFT like-feature that generates a local descriptor corresponding to that feature point. An 16x16 image patch centered at the feature point is extracted, and is further subdivided into 16 smaller 4x4 image patches. Each of these 4x4 patch contributes to the local feature descriptor in the following way: The gradient directions at each of these points is calcuated and separated into 8 bins with each containing angles in increasing multiples of 45. These 8 bins would then serve as a 'descriptor' of that 4x4 patch to which it belongs. Since there are 16 of such patches, we have a total of 128 bins (i.e. we are returning a 128x1 vector). These descriptors will then be matched using the ratio test. The main files implemented for this project is listed below.

  1. get_interest_points.m : This is the implementation of the Harris corner detector. Feature points are returned if their response value is larger than a stimulated threshold.
  2. get_features.m : This is the implementation of a SIFT like feature descriptor, which accurately captures the key characteristics of a feature point that is ideally invariant to photometric and geometric differences. 3x3 or 5x5 patches are used to return the index correpsonding to the local maxima.
  3. get_features_w_tuning.m : This was an extra credit attempt to see the effect of increasing or decreasing bin sizes. It can be checked that accuracies dropped after using 6 bins (mutltiples of 60 deg) and 36 bins (mutltiples of 10 deg) respectively.
  4. match_features.m : This is the code used implement the ratio test on the feature descriptors.
  5. proj2.m : This is the code used to run the project.

Code for Harris corner detection

The code for our implementation is given below. The key feature is that instead of forming the 2x2 matrix corresponding to each (x,y) and calculating their eigenvalues, we instead compute the Harris response values as a matrix, with each entry corresponding to the response value for a given (x,y). This speeds up the computation tremendously. It can also be seen that a either a 3x3 or 5x5 image patch can be used to check for local maxima. The choice of which will affect the accuracies generated at the end. As a extra credit attempt, I've also tried to implement the Adaptive Non-Maximum Suppression (ANMS) after finding the local maxima. However, it should be noted that my implementation differs from the original implementation published in the paper. But nevertheless, using it can sometimes lead to higher accuracies, as will be shown. Also note that using ANMS is optional, as it was commented out in my submission.


  function [x, y, confidence, scale, orientation] = get_interest_points(image, feature_width)

  %assume image is grayscale
  s=feature_width/2;
  [Ix,Iy]=imgradientxy(image);

  sigma=2; k=0.04;
  g = fspecial('gaussian',max(1,fix(6*sigma)), sigma);
  Ix2 = conv2(Ix.^2, g, 'same');
  Iy2 = conv2(Iy.^2, g, 'same');
  Ixy = conv2(Ix.*Iy, g, 'same');

  R = (Ix2.*Iy2 - Ixy.^2) - k*(Ix2 + Iy2).^2;

  [n,m]=size(R);
  count =0;
  for j=1:n
      for i=1:m
          if (R(j,i)>0.005&&i>=s+1&&j>=s+1&&i<=m-s+1&&j<=n-s+1) %make sure not at boundary
              %local max using 5*5 neighborhood using either corner
              %scores or pixel values

              %window=R(j-2:j+2,i-2:i+2);
              %window=R(j-1:j+1,i-1:i+1);

              %window=image(j-2:j+2,i-2:i+2);
              window=image(j-1:j+1,i-1:i+1);

              w_max=max(max(window));

              %if (R(j,i)>=w_max)
              if (image(j,i)>=w_max)

                  count=count+1;
                  x(count,1)=i;
                  y(count,1)=j;
              end
          end
      end
  end

     %{
     %adaptive non max suppression
     %associates each (x,y) with a distance r
     %r is the distance to the 500th
     %sort by decreasing r
     %return the first 500 i.e. thresh=500 (may change)
     thresh=2000;
     t=size(x);
     temp=zeros(t(1),3);
     temp(:,1)=x; temp(:,2)=y;
     for i=1:t(1)
          dist=zeros(t(1),1);
          for j=1:t(1)
              dist(j)=sqrt((x(i)-x(j))^2+(y(i)-y(j))^2);
          end
          dist=sort(dist);
          temp(i,3)=dist(thresh);
     end
     [values,order]=sort(temp(:,3));
     temp=temp(order,:);
     x=temp(1:thresh,1);
     y=temp(1:thresh,2);
  %}
  end

Results

There are many parameters in our code that can be altered. For example, the Harris response value (R), the local maxima patch (3x3 or 5x5), the definition of local maxima (either maximum R or maximum value in the image matrix), and the no. of descriptors (n) returned by adaptive non-maximal supression (ANMS). Also, we can opt to use or not use ANMS. Some of the results are listed below.

  1. Mt Rushmore, 5x5, 98% accuracy, R=0.005, n=2000, R.
  2. Mt Rushmore, 5x5, 97% accuracy, R=0.005, non-suppressed, R.
  3. Notre Dame, 3x3, 84% accuracy, R=0.005, non-suppressed, Image.
  4. Notre Dame, 3x3, 88% accuracy, R=0.005, n=2000, Image.
  5. Notre Dame, 3x3, 89% accuracy, R=0.003, n=2000, Image.
  6. Notre Dame, 3x3, 64% accuracy, R=0.04, non-suppressed, R.
  7. Notre Dame, 3x3, 57% accuracy, R=0.04, n=500, R.
  8. Episcopal Gaudi, 3x3, 2% accuracy, R=0.04, non-suppressed, R.
  9. Episcopal Gaudi, 3x3, 5% accuracy, R=0.005, non-suppressed, R.
  10. Episcopal Gaudi, 5x5, 4% accuracy, R=0.005, non-suppressed, R.
  11. Episcopal Gaudi, 5x5, 2% accuracy, R=0.01, non-suppressed, R.
  12. Episcopal Gaudi, 3x3, 3% accuracy, R=0.01, non-suppressed, R.

The image matches for Notre Dame (64%, 84%, and 88%), Mt Rushmore (97% and 98%), and Episcopal Gaudi (4% and 5%) are shown below. The results can be found in the Results folder. From the list of results above, it seems that using R values as the basis of comparison for finding local maxima works better for Mt Rushmore and Episcopal Gaudi, while using the image values works better for Notre Dame.

Extra Credit

I attempted both parameter tuning and adaptive non-maximal suppression for the extra credit portion of this project. The effect of various parameters on the matching accuracy is listed in the table above. I also tried varying the number of bins for each 4x4 image patch, as described in "get_features_w_tuning.m".