|
This writeup is organized as follows:
I. Tiny Images with K-nearest neighbors
II. Tiny Images with Linear SVM Classifier
III. Bag of SIFT features with K-nearest neighbors
IV. Bag of SIFT features with Linear SVM classifier
V. Bag of Spatial SIFT features with K-nearest neighbors
VI. Bag of Spatial SIFT features with Linear SVM classifier
VII. Bag of Soft Spatial SIFT features with K-nearest neighbors
VIII. Bag of Soft Spatial SIFT features with Linear SVM classifier
IX. GIST features with K-nearest neighbors
X. GIST features with Linear SVM classifier
XI. GIST + Bag of Soft Spatial SIFT features with K-nearest neighbors
XII. GIST + Bag of Soft Spatial SIFT features with Linear SVM classifier
I used the following algorithm for scene recognition with tiny-image features and k-nearest neighbors for classification.
1. Resize each image to a fixed resolution (16x16 here) and build a feature vector from the intensity value at each pixel. Compute this feature for every image in the training dataset.
2. For each test image, compute the same feature, find its K nearest training features under the Euclidean distance metric, and assign the majority label among those K neighbors (K=1 here).
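The classification step above can be sketched as follows. This is a minimal illustration in Python rather than the code used for the project; the function name `knn_predict` and the toy 4-pixel features are assumptions made for the example.

```python
import math
from collections import Counter

def knn_predict(train_feats, train_labels, test_feat, k=1):
    """Return the majority label among the k training features nearest to test_feat."""
    # Euclidean distance from the test feature to every training feature
    dists = [math.dist(f, test_feat) for f in train_feats]
    # Indices of the k smallest distances
    nearest = sorted(range(len(dists)), key=dists.__getitem__)[:k]
    # Majority vote over the labels of those neighbors
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy "tiny images" flattened to 4 intensity values each (hypothetical data)
train_feats = [[0.0, 0.1, 0.0, 0.2], [0.9, 1.0, 0.8, 0.9], [0.1, 0.0, 0.1, 0.1]]
train_labels = ["forest", "street", "forest"]
prediction = knn_predict(train_feats, train_labels, [0.8, 0.9, 1.0, 0.7], k=1)
```

With K=1 the vote reduces to copying the label of the single nearest training image, which is the setting reported in this section.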
I computed the accuracy and the confusion matrix using a resolution of 16x16 and K=1 in K-nearest neighbors.
The following is an analysis of the parameters involved in this pipeline. I varied the resolution over 8x8, 16x16, 32x32, 64x64 and 128x128, and the value of K from 1 to 30.
Observations:
1. As I increased the resolution, the time taken to obtain the features increased significantly, but accuracy did not improve; as the plot shows, it actually decreased.
2. As I increased the value of K, there was no significant improvement in performance either, while the time taken to compute the predicted label for each image increased.
The image below shows accuracy versus K (from 1 to 30) for resolutions of 8x8, 16x16, 32x32, 64x64 and 128x128.
|
I used the following algorithm for scene recognition with tiny-image features and a linear SVM model for classification.
1. Resize each image to a fixed resolution (16x16 here) and build a feature vector from the intensity value at each pixel. Compute this feature for every image in the training dataset.
2. Using the training data, train one 1-vs-all linear SVM per category; these are used later at test time.
3. For each test image, compute the same feature, evaluate the 1-vs-all linear SVM of every category on it, and choose the label with the maximum confidence (lambda = 10 gave better performance here).
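The test-time step above can be sketched as follows. This is a minimal Python illustration that assumes the per-category weights and biases have already been trained; the name `one_vs_all_predict` and the numbers in `classifiers` are hypothetical and are not taken from the project.

```python
def one_vs_all_predict(classifiers, feat):
    """classifiers maps category -> (weights, bias); return the category with the highest score."""
    def score(weights, bias):
        # Linear SVM confidence: w . x + b
        return sum(w * x for w, x in zip(weights, feat)) + bias
    return max(classifiers, key=lambda category: score(*classifiers[category]))

# Hypothetical, already-trained 1-vs-all classifiers over 2-dimensional features
classifiers = {
    "kitchen": ([1.0, -1.0], 0.0),
    "forest": ([-1.0, 1.0], 0.1),
    "street": ([0.5, 0.5], -1.0),
}
label = one_vs_all_predict(classifiers, [0.2, 0.9])
```

Each classifier is scored independently, so a test image always receives exactly one label even when every score is negative.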
I computed the accuracy and the confusion matrix using a resolution of 16x16 and lambda = 10, since this value gave better SVM performance.
The following is an analysis of the parameters involved in this pipeline. I varied the resolution over 8x8, 16x16, 32x32, 64x64 and 128x128, and lambda over 0.00001, 0.0001, 0.001, 0.01, 0.1, 1 and 10.
Observations:
1. As I increased the resolution, the time taken to obtain the features increased significantly. In certain cases the performance improved, as you can see from the plot below.
2. As I increased the value of lambda, the time taken to compute the predicted label for each image decreased; performance improved in a few cases and dropped in a few others, with no clear pattern.
The image below shows accuracy versus lambda (0.00001, 0.0001, 0.001, 0.01, 0.1, 1 and 10) for resolutions of 8x8, 16x16, 32x32, 64x64 and 128x128.
|
I used the following algorithm for scene recognition with bag of SIFT features and k-nearest neighbors for classification.
1. Extract SIFT features from each training image (parameters: step = 8, size = 16, 'fast'). Using all the SIFT features from every image (no sampling here), build a vocabulary of vocab_size clusters (vocab_size = 500) with k-means. This is a one-time operation; the vocabulary is saved for later runs.
2. Extract SIFT features from each training and testing image (parameters: step = 4, size = 16, 'fast') and build a histogram of SIFT features by counting, for each cluster center, how many of the image's SIFT features are closest to it. The histogram is built from a sample of the image's features (about 500 SIFT features per image here).
3. Compute this histogram for each training and testing image; these histograms are the features used in the next steps.
4. For each test image, compute the feature as described above, find its K nearest training features under the Euclidean distance metric, and assign the majority label among those K neighbors (K=8 here).
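The histogram construction in step 2 can be sketched as follows. This is a minimal Python illustration with toy 2-dimensional descriptors in place of 128-dimensional SIFT descriptors; the function name is an assumption, and the vocabulary is taken as given rather than learned with k-means.

```python
import math

def bag_of_words_histogram(descriptors, vocab):
    """Count, for each vocabulary center, how many descriptors are closest to it."""
    hist = [0] * len(vocab)
    for desc in descriptors:
        # Index of the nearest cluster center under Euclidean distance
        nearest = min(range(len(vocab)), key=lambda j: math.dist(desc, vocab[j]))
        hist[nearest] += 1
    return hist

# Toy vocabulary of 3 centers and 4 descriptors (hypothetical data)
vocab = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
descriptors = [[0.1, 0.0], [0.9, 1.2], [1.1, 0.8], [4.0, 6.0]]
hist = bag_of_words_histogram(descriptors, vocab)
```

The sketch returns raw counts, as described in the text. Whether the histograms were normalized before classification is not stated in this writeup.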
I computed the accuracy and the confusion matrix using the bag of SIFT features model with the following parameters for get_bags_of_sifts: step = 4, size = 16, 'fast', a vocabulary of size 500, and a sample of about 500 feature vectors per image for building the histogram. K=8 was used in K-nearest neighbors.
The following is an analysis of the parameters involved in this pipeline. I varied the vocabulary size over 50, 100, 200, 500 and 1000, always with the same SIFT parameters: step = 4, size = 16, 'fast'. I used all the features from the training and testing images to build their histograms, and varied K from 1 to 30 in k-nearest neighbors.
Observations:
1. As I increased the vocabulary size, the time taken increased slightly, but the accuracy also improved, as you can see from the plot below.
2. As I increased the value of K in k-nearest neighbors, the time taken to compute the predicted label for each image decreased; there was not much improvement in performance, although it did help in a few cases.
The image below shows accuracy versus vocabulary size (50, 100, 200, 500 and 1000) for K from 1 to 30 in k-nearest neighbors, using all the features from the training and testing images to build their histograms.
|
I used the following algorithm for scene recognition with bag of SIFT features and a linear SVM for classification.
1. Extract SIFT features from each training image (parameters: step = 8, size = 16, 'fast'). Using all the SIFT features from every image (no sampling here), build a vocabulary of vocab_size clusters (vocab_size = 200) with k-means. This is a one-time operation; the vocabulary is saved for later runs.
2. Extract SIFT features from each training and testing image (parameters: step = 4, size = 16, 'fast') and build a histogram of SIFT features by counting, for each cluster center, how many of the image's SIFT features are closest to it. The histogram is built from a sample of the image's features (about 500 SIFT features per image here).
3. Compute this histogram for each training and testing image; these histograms are the features used in the next steps.
4. Using the features of the training data, train one 1-vs-all linear SVM per category; these are used later at test time (lambda = 1 here).
5. For each test image, compute the feature as described above, evaluate the 1-vs-all linear SVM of every category on it, and choose the label with the maximum confidence.
I computed the accuracy and the confusion matrix using the bag of SIFT features model with the following parameters for get_bags_of_sifts: step = 4, size = 16, 'fast', a vocabulary of size 500, and a sample of about 500 features per image for building the histogram. lambda = 1 was used in the linear SVM.
The following is an analysis of the parameters involved in this pipeline. I varied the vocabulary size over 50, 100, 200, 500 and 1000, always with the same SIFT parameters: step = 4, size = 16, 'fast'. I used all the features from the training and testing images to build their histograms, and varied lambda in the SVM over 0.00001, 0.0001, 0.001, 0.01, 0.1, 1 and 10.
Observations:
1. As I increased the vocabulary size, the time taken increased slightly, but the accuracy also improved, as you can see from the plot below.
2. As I increased the value of lambda, the time taken to compute the predicted label for each image decreased; performance improved in a few cases and dropped in a few others, with no clear pattern.
The image below shows accuracy versus vocabulary size (50, 100, 200, 500 and 1000) for lambda values of 0.00001, 0.0001, 0.001, 0.01, 0.1, 1 and 10 in the linear SVM, using all the features from the training and testing images to build their histograms.
|
I used the following algorithm for scene recognition with bag of spatial SIFT features and k-nearest neighbors for classification.
1. Extract SIFT features from each training image (parameters: step = 4, size = 16, 'fast'). Using all the SIFT features from every image (no sampling here), build a vocabulary of vocab_size clusters with k-means. This is a one-time operation; the vocabulary is saved for later runs.
2. Extract SIFT features from each training and testing image (parameters: step = 4, size = 16, 'fast') and build a histogram of SIFT features by counting, for each cluster center, how many of the image's SIFT features are closest to it. Here all the feature vectors from each image were used to build its histogram, for good performance.
3. level-1: Repeat step 2 after dividing the image into a 2x2 grid, extracting one histogram per sub-part of the image. This yields 4 additional histograms.
4. level-2: Repeat step 2 again after dividing the original image into a 4x4 grid, extracting one histogram per sub-part. This yields 16 histograms at this level.
5. There are two ways to form the final feature. level-1: concatenate the level-0 and level-1 histograms and use them as features
(or) level-2: concatenate the level-0, level-1 and level-2 histograms and use them as features.
6. Compute this concatenated histogram for each training and testing image; these are the features used in the next steps.
7. For each test image, compute the feature as described above, find its K nearest training features under the Euclidean distance metric, and assign the majority label among those K neighbors.
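The level-0, level-1 and level-2 histograms described above can be sketched as follows. This is a minimal Python illustration under stated assumptions: each descriptor comes with an (x, y) position in the image, the histograms are raw counts concatenated without any per-level weighting (the writeup does not mention weights), and the function names are made up for the example.

```python
import math

def nearest_word(desc, vocab):
    """Index of the vocabulary center closest to desc."""
    return min(range(len(vocab)), key=lambda j: math.dist(desc, vocab[j]))

def spatial_pyramid(descriptors, positions, width, height, vocab, max_level=2):
    """Concatenate per-cell histograms for grids of 1x1, 2x2, ..., 2^L x 2^L cells."""
    words = [nearest_word(d, vocab) for d in descriptors]
    feature = []
    for level in range(max_level + 1):
        cells = 2 ** level  # grid is cells x cells at this level
        hists = [[0] * len(vocab) for _ in range(cells * cells)]
        for word, (x, y) in zip(words, positions):
            # Grid cell containing the descriptor; clamp points on the far border
            cx = min(int(x * cells / width), cells - 1)
            cy = min(int(y * cells / height), cells - 1)
            hists[cy * cells + cx][word] += 1
        for h in hists:
            feature.extend(h)
    return feature

# Toy example: 2 visual words, 2 descriptors in opposite corners of a 100x100 image
vocab = [[0.0], [1.0]]
feature = spatial_pyramid([[0.1], [0.9]], [(10, 10), (90, 90)], 100, 100, vocab)
```

With level=2 the feature has vocab_size x (1 + 4 + 16) entries, so a 500-word vocabulary gives 10500 dimensions per image.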
I computed the accuracy and the confusion matrix using the bags_sift_spatial_features model with the following parameters for get_bags_of_sifts: step = 4, size = 16, 'fast', all features sampled, level = 2, and a vocabulary of size 500. K=25 was used in K-nearest neighbors. The results with these parameters follow.
The following is an analysis of the parameters involved in this pipeline. I varied the vocabulary size over 50, 100, 200, 500 and 1000, always with the same SIFT parameters: step = 4, size = 16, 'fast', level = 2. I used all the features from the training and testing images to build their histograms, and varied K from 1 to 30 in k-nearest neighbors.
Observations:
1. As I increased the vocabulary size, the time taken increased slightly, but here there was no clear improvement in performance, as you can see from the plot below.
2. As I increased the value of K in k-nearest neighbors, the time taken to compute the predicted label for each image decreased; there was not much improvement in performance, although it did help in a few cases.
The image below shows accuracy versus vocabulary size (50, 100, 200, 500 and 1000) for K from 1 to 30 in k-nearest neighbors, using all the features from the training and testing images to build their histograms.
|
I used the following algorithm for scene recognition with bag of spatial SIFT features and a linear SVM for classification.
1. Extract SIFT features from each training image (parameters: step = 8, size = 16, 'fast'). Using all the SIFT features from every image (no sampling here), build a vocabulary of vocab_size clusters with k-means. This is a one-time operation; the vocabulary is saved for later runs.
2. Extract SIFT features from each training and testing image (parameters: step = 4, size = 16, 'fast') and build a histogram of SIFT features by counting, for each cluster center, how many of the image's SIFT features are closest to it. Here all the feature vectors from each image were used to build its histogram, for better performance.
3. Compute this histogram for each training and testing image; these histograms are the features used in the next steps.
4. Using the features of the training data, train one 1-vs-all linear SVM per category; these are used later at test time.
5. For each test image, compute the feature as described above, evaluate the 1-vs-all linear SVM of every category on it, and choose the label with the maximum confidence.
I computed the accuracy and the confusion matrix using the bags_sift_spatial_features model with the following parameters for get_spatial_bags_of_sifts: step = 4, size = 16, 'fast', all features sampled, level = 2, and a vocabulary of size 1000. lambda = 0.1 was used in the linear SVM. The results with these parameters follow.
The following is an analysis of the parameters involved in this pipeline. I varied the vocabulary size over 50, 100, 200, 500 and 1000, always with the same SIFT parameters: step = 4, size = 16, 'fast', level = 2. I used all the features from the training and testing images to build their histograms, and varied lambda in the SVM over 0.00001, 0.0001, 0.001, 0.01, 0.1, 1 and 10.
Observations:
1. As I increased the vocabulary size, the time taken increased slightly, but the accuracy also improved, as you can see from the plot below.
2. As I increased the value of lambda, the time taken to compute the predicted label for each image decreased; performance improved in a few cases and dropped in a few others, with no clear pattern.
The image below shows accuracy versus vocabulary size (50, 100, 200, 500 and 1000) for lambda values of 0.00001, 0.0001, 0.001, 0.01, 0.1, 1 and 10 in the linear SVM, using all the features from the training and testing images to build their histograms.
|
I used the following algorithm for scene recognition with bag of soft spatial SIFT features and k-nearest neighbors for classification.
1. Extract SIFT features from each training image (parameters: step = 4, size = 16, 'fast'). Using all the SIFT features from every image (no sampling here), build a vocabulary of vocab_size clusters with k-means. This is a one-time operation; the vocabulary is saved for later runs.
2. Extract SIFT features from each training and testing image (parameters: step = 4, size = 16, 'fast') and build a histogram of SIFT features in which each SIFT feature votes for each cluster center with a weight inversely proportional to its distance from that center. Here all the feature vectors from each image were used to build its histogram, for good performance.
3. level-1: Repeat step 2 after dividing the image into a 2x2 grid, extracting one histogram per sub-part of the image. This yields 4 additional histograms.
4. level-2: Repeat step 2 again after dividing the original image into a 4x4 grid, extracting one histogram per sub-part. This yields 16 histograms at this level.
5. There are two ways to form the final feature: concatenate the level-0 and level-1 histograms and use them as features
(or) concatenate the level-0, level-1 and level-2 histograms and use them as features.
6. Compute this concatenated histogram for each training and testing image; these are the features used in the next steps.
7. For each test image, compute the feature as described above, find its K nearest training features under the Euclidean distance metric, and assign the majority label among those K neighbors.
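The soft voting in step 2 can be sketched as follows. This is a minimal Python illustration using the vote = 10000/distance weighting reported for this section. It assumes that every descriptor votes for every cluster center, which is one reading of the description above; the `eps` guard against a zero distance and the function name are additions made for the example.

```python
import math

def soft_histogram(descriptors, vocab, scale=10000.0, eps=1e-12):
    """Each descriptor adds scale / distance to the bin of every vocabulary center."""
    hist = [0.0] * len(vocab)
    for desc in descriptors:
        for j, center in enumerate(vocab):
            # eps avoids division by zero when a descriptor coincides with a center
            hist[j] += scale / max(math.dist(desc, center), eps)
    return hist

# Toy example: one descriptor at distance 2 from the first center and 8 from the second
soft = soft_histogram([[2.0]], [[0.0], [10.0]])
```

Compared with hard assignment, a descriptor lying between two centers contributes to both bins instead of only the nearest one.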
I computed the accuracy and the confusion matrix using the bags_sift_spatial_soft_features model with the following parameters for get_spatial_bags_of_sifts_soft: step = 4, size = 16, 'fast', all features sampled, vote = 10000/distance, level = 2, and a vocabulary of size 1000. K=8 was used in K-nearest neighbors. The results with these parameters follow.
The following is an analysis of the parameters involved in this pipeline. I varied the vocabulary size over 50, 100, 200, 500 and 1000, always with the same SIFT parameters: step = 4, size = 16, 'fast', level = 2, vote = 10000/distance. I used all the features from the training and testing images to build their histograms, and varied K from 1 to 30 in k-nearest neighbors.
Observations:
1. As I increased the vocabulary size, the time taken increased slightly, but here there was no clear improvement in performance, as you can see from the plot below.
2. As I increased the value of K in k-nearest neighbors, the time taken to compute the predicted label for each image decreased; there was not much improvement in performance, although it did help in a few cases.
The image below shows accuracy versus vocabulary size (50, 100, 200, 500 and 1000) for K from 1 to 30 in k-nearest neighbors, using all the features from the training and testing images to build their histograms.
|
I used the following algorithm for scene recognition with bag of soft spatial SIFT features and a linear SVM for classification.
1. Extract SIFT features from each training image (parameters: step = 8, size = 16, 'fast'). Using all the SIFT features from every image (no sampling here), build a vocabulary of vocab_size clusters with k-means. This is a one-time operation; the vocabulary is saved for later runs.
2. Extract SIFT features from each training and testing image (parameters: step = 4, size = 16, 'fast') and build a histogram of SIFT features in which each SIFT feature votes for each cluster center with a weight inversely proportional to its distance from that center. Here all the feature vectors from each image were used to build its histogram, for good performance.
3. Compute this histogram for each training and testing image; these histograms are the features used in the next steps.
4. Using the features of the training data, train one 1-vs-all linear SVM per category; these are used later at test time.
5. For each test image, compute the feature as described above, evaluate the 1-vs-all linear SVM of every category on it, and choose the label with the maximum confidence.
I computed the accuracy and the confusion matrix using the bags_sift_spatial_soft_features model with the following parameters for get_spatial_bags_of_sifts_soft: step = 4, size = 16, 'fast', all features sampled, level = 2, vote = 10000/distance, and a vocabulary of size 1000. lambda = 0.00001 was used in the linear SVM. The results with these parameters follow.
The following is an analysis of the parameters involved in this pipeline. I varied the vocabulary size over 50, 100, 200, 500 and 1000, always with the same SIFT parameters: step = 4, size = 16, 'fast', level = 2, vote = 10000/distance. I used all the features from the training and testing images to build their histograms, and varied lambda in the SVM over 0.00001, 0.0001, 0.001, 0.01, 0.1, 1 and 10.
Observations:
1. As I increased the vocabulary size, the time taken increased slightly, but the accuracy also improved, as you can see from the plot below.
2. As I increased the value of lambda, the time taken to compute the predicted label for each image decreased; performance improved in a few cases and dropped in a few others, with no clear pattern.
The image below shows accuracy versus vocabulary size (50, 100, 200, 500 and 1000) for lambda values of 0.00001, 0.0001, 0.001, 0.01, 0.1, 1 and 10 in the linear SVM, using all the features from the training and testing images to build their histograms.
|
I used the following algorithm for scene recognition with GIST features and k-nearest neighbors for classification.
1. Extract the GIST descriptor of each image and use it directly as the feature.
2. Compute this feature for each training and testing image; these are the features used in the next steps.
3. For each test image, compute the feature as described above, find its K nearest training features under the Euclidean distance metric, and assign the majority label among those K neighbors (K=11 here).
I computed the accuracy and the confusion matrix using the GIST features with the following parameters in get_gist_features: param.imageSize = [256 256], param.orientationsPerScale = [8 8 8 8], param.numberBlocks = 4, param.fc_prefilt = 4, and K=11. The results with these parameters follow.
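The accuracy and confusion matrix reported throughout this writeup can be computed from the true and predicted labels along the following lines. This is a minimal Python sketch, not the project's evaluation code; the function names and the toy labels are assumptions made for the example.

```python
def confusion_matrix(true_labels, predicted_labels, categories):
    """matrix[i][j] counts test images of category i that were predicted as category j."""
    index = {c: i for i, c in enumerate(categories)}
    matrix = [[0] * len(categories) for _ in categories]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix

def accuracy(matrix):
    """Fraction of test images on the diagonal of the confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0

# Toy labels for two categories (hypothetical data)
categories = ["forest", "street"]
true_labels = ["forest", "forest", "street", "street"]
predicted_labels = ["forest", "street", "street", "street"]
cm = confusion_matrix(true_labels, predicted_labels, categories)
acc = accuracy(cm)
```

Off-diagonal entries show which pairs of scene categories a given feature and classifier combination tends to confuse.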
I used the following algorithm for scene recognition with GIST features and a linear SVM for classification.
1. Extract the GIST descriptor of each image and use it directly as the feature.
2. Compute this feature for each training and testing image; these are the features used in the next steps.
3. For each test image, compute the feature as described above, evaluate the 1-vs-all linear SVM of every category on it, and choose the label with the maximum confidence.
I computed the accuracy and the confusion matrix using the GIST features model with the following parameters in get_gist_features: param.imageSize = [256 256], param.orientationsPerScale = [8 8 8 8], param.numberBlocks = 4, param.fc_prefilt = 4, and lambda = 0.0001. The results with these parameters follow.
I used the following algorithm for scene recognition with GIST + bag of soft spatial SIFT features and k-nearest neighbors for classification.
1. Extract SIFT features from each training image (parameters: step = 4, size = 16, 'fast'). Using all the SIFT features from every image (no sampling here), build a vocabulary of vocab_size clusters with k-means. This is a one-time operation; the vocabulary is saved for later runs.
2. Extract SIFT features from each training and testing image (parameters: step = 4, size = 16, 'fast') and build a histogram of SIFT features in which each SIFT feature votes for each cluster center with a weight inversely proportional to its distance from that center. Here all the feature vectors from each image were used to build its histogram, for good performance.
3. level-1: Repeat step 2 after dividing the image into a 2x2 grid, extracting one histogram per sub-part of the image. This yields 4 additional histograms.
4. level-2: Repeat step 2 again after dividing the original image into a 4x4 grid, extracting one histogram per sub-part. This yields 16 histograms at this level.
5. There are two ways to form the pyramid feature: concatenate the level-0 and level-1 histograms and use them as features
(or) concatenate the level-0, level-1 and level-2 histograms and use them as features.
6. Compute this concatenated histogram for each training and testing image and concatenate it with the image's GIST features. These are the features used in the next steps.
7. For each test image, compute the feature as described above, find its K nearest training features under the Euclidean distance metric, and assign the majority label among those K neighbors (K=11 here).
I computed the accuracy and the confusion matrix using the bags_sift_spatial_soft_gist_features model with the following parameters for get_spatial_bags_of_sifts_soft: step = 4, size = 16, 'fast', all features sampled, vote = 10000/distance, level = 2, and a vocabulary of size 1000. K=11 was used in K-nearest neighbors. The GIST parameters used in get_gist_features were: param.imageSize = [256 256], param.orientationsPerScale = [8 8 8 8], param.numberBlocks = 4, param.fc_prefilt = 4. The results with these parameters follow.
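To give a sense of the size of the combined feature, the arithmetic can be sketched as follows. This assumes the pyramid stores one histogram of vocab_size bins per grid cell at levels 0 to 2, and that the GIST descriptor stores one value per filter per block (the usual layout for these parameters); both function names are made up for the example.

```python
def pyramid_dims(vocab_size, max_level):
    """One histogram of vocab_size bins per cell, with 4**level cells at each level."""
    return vocab_size * sum(4 ** level for level in range(max_level + 1))

def gist_dims(orientations_per_scale, number_blocks):
    """One value per oriented filter per block of a number_blocks x number_blocks grid."""
    return sum(orientations_per_scale) * number_blocks ** 2

sift_part = pyramid_dims(vocab_size=1000, max_level=2)  # 1000 * (1 + 4 + 16)
gist_part = gist_dims([8, 8, 8, 8], number_blocks=4)    # 32 filters * 16 blocks
total_dims = sift_part + gist_part
```

Under these assumptions the SIFT pyramid contributes 21000 dimensions and GIST contributes 512, so the pyramid dominates the concatenated vector by dimension count.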
I used the following algorithm for scene recognition with GIST + bag of soft spatial SIFT features and a linear SVM for classification.
1. Extract SIFT features from each training image (parameters: step = 8, size = 16, 'fast'). Using all the SIFT features from every image (no sampling here), build a vocabulary of vocab_size clusters with k-means. This is a one-time operation; the vocabulary is saved for later runs.
2. Extract SIFT features from each training and testing image (parameters: step = 4, size = 16, 'fast') and build a histogram of SIFT features in which each SIFT feature votes for each cluster center with a weight inversely proportional to its distance from that center. Here all the feature vectors from each image were used to build its histogram, for good performance.
3. Compute this histogram for each training and testing image and concatenate it with the image's GIST features. These are the features used in the next steps.
4. Using the features of the training data, train one 1-vs-all linear SVM per category; these are used later at test time.
5. For each test image, compute the feature as described above, evaluate the 1-vs-all linear SVM of every category on it, and choose the label with the maximum confidence.
I computed the accuracy and the confusion matrix using the bags_sift_spatial_soft_gist_features model with the following parameters for get_spatial_bags_of_sifts_soft: step = 4, size = 16, 'fast', all features sampled, level = 2, vote = 10000/distance, and a vocabulary of size 1000. lambda = 0.00001 was used in the linear SVM. The GIST parameters used in get_gist_features were: param.imageSize = [256 256], param.orientationsPerScale = [8 8 8 8], param.numberBlocks = 4, param.fc_prefilt = 4. The results with these parameters follow.
All the work implemented for the project has been presented and discussed above. Feel free to contact me with any further questions.
Contact details:
Murali Raghu Babu Balusu
GaTech id: 903241955
Email: b.murali@gatech.edu
Phone: (470)-338-1473
Thank you!