The objective of this project is to implement the Harris corner detector to generate feature points, which pass through a SIFT like-feature that generates a local descriptor corresponding to that feature point. An 16x16 image patch centered at the feature point is extracted, and is further subdivided into 16 smaller 4x4 image patches. Each of these 4x4 patch contributes to the local feature descriptor in the following way: The gradient directions at each of these points is calcuated and separated into 8 bins with each containing angles in increasing multiples of 45. These 8 bins would then serve as a 'descriptor' of that 4x4 patch to which it belongs. Since there are 16 of such patches, we have a total of 128 bins (i.e. we are returning a 128x1 vector). These descriptors will then be matched using the ratio test. The main files implemented for this project is listed below.
function [x, y, confidence, scale, orientation] = get_interest_points(image, feature_width)
%assume image is grayscale
s=feature_width/2;
[Ix,Iy]=imgradientxy(image);
sigma=2; k=0.04;
g = fspecial('gaussian',max(1,fix(6*sigma)), sigma);
Ix2 = conv2(Ix.^2, g, 'same');
Iy2 = conv2(Iy.^2, g, 'same');
Ixy = conv2(Ix.*Iy, g, 'same');
R = (Ix2.*Iy2 - Ixy.^2) - k*(Ix2 + Iy2).^2;
[n,m]=size(R);
count =0;
for j=1:n
for i=1:m
if (R(j,i)>0.005&&i>=s+1&&j>=s+1&&i<=m-s+1&&j<=n-s+1) %make sure not at boundary
%local max using 5*5 neighborhood using either corner
%scores or pixel values
%window=R(j-2:j+2,i-2:i+2);
%window=R(j-1:j+1,i-1:i+1);
%window=image(j-2:j+2,i-2:i+2);
window=image(j-1:j+1,i-1:i+1);
w_max=max(max(window));
%if (R(j,i)>=w_max)
if (image(j,i)>=w_max)
count=count+1;
x(count,1)=i;
y(count,1)=j;
end
end
end
end
%{
%adaptive non max suppression
%associates each (x,y) with a distance r
%r is the distance to the 500th
%sort by decreasing r
%return the first 500 i.e. thresh=500 (may change)
thresh=2000;
t=size(x);
temp=zeros(t(1),3);
temp(:,1)=x; temp(:,2)=y;
for i=1:t(1)
dist=zeros(t(1),1);
for j=1:t(1)
dist(j)=sqrt((x(i)-x(j))^2+(y(i)-y(j))^2);
end
dist=sort(dist);
temp(i,3)=dist(thresh);
end
[values,order]=sort(temp(:,3));
temp=temp(order,:);
x=temp(1:thresh,1);
y=temp(1:thresh,2);
%}
end
There are many parameters in our code that can be altered. For example, the Harris response value (R), the local maxima patch (3x3 or 5x5), the definition of local maxima (either maximum R or maximum value in the image matrix), and the no. of descriptors (n) returned by adaptive non-maximal supression (ANMS). Also, we can opt to use or not use ANMS. Some of the results are listed below.
The image matches for Notre Dame (64%, 84%, and 88%), Mt Rushmore (97% and 98%), and Episcopal Gaudi (4% and 5%) are shown below. The results can be found in the Results folder. From the list of results above, it seems that using R values as the basis of comparison for finding local maxima works better for Mt Rushmore and Episcopal Gaudi, while using the image values works better for Notre Dame.
|
|
|
|
|
|
I attempted both parameter tuning and adaptive non-maximal suppression for the extra credit portion of this project. The effect of various parameters on the matching accuracy is listed in the table above. I also tried varying the number of bins for each 4x4 image patch, as described in "get_features_w_tuning.m".