In this article, the authors propose a novel approach to efficient image retrieval using visual vocabulary construction. The proposed method is designed to reduce the computational complexity of image retrieval by constructing a compact image descriptor that captures the essential features of an image. The authors extract local feature descriptors from each image and use a trained k-means model to group them into 64 clusters, whose centres form the visual vocabulary. They then calculate a VLAD (Vector of Locally Aggregated Descriptors) matrix for each reference image, which is used to match a query image to the closest reference image. The proposed method demonstrates improved performance in a city-scale urban area compared to previous works.
Visual Vocabulary Construction
The authors start by explaining that visual vocabulary construction is an essential step in efficient image retrieval. They describe it as a process of compressing high-dimensional feature descriptors into a lower-dimensional matrix, which is referred to as a compact image descriptor. The goal is to reduce the computational complexity of image retrieval while maintaining its accuracy.
Feature Extraction
The authors explain that there are various feature detection algorithms available, such as ORB, SIFT, and Dense RootSIFT. They choose ORB for this study because it is efficient and accurate. They extract multiple feature descriptors from each image using the ORB algorithm and assign each descriptor to one of 64 clusters using a trained k-means model, that is, to the cluster whose centre is nearest to the descriptor.
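The assignment step can be illustrated with a minimal sketch. It assumes the descriptors have already been extracted and converted to lists of numbers, and that the cluster centres come from a k-means model trained beforehand; the function names, the two-dimensional toy data, and the use of three clusters instead of 64 are illustrative assumptions, not the authors' implementation.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_to_clusters(descriptors, centroids):
    """Return, for each descriptor, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda k: squared_distance(d, centroids[k]))
        for d in descriptors
    ]


# Toy data (hypothetical): 3 centroids and 4 descriptors in 2 dimensions.
centroids = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]]
descriptors = [[1.0, 1.0], [9.0, 2.0], [0.5, 8.0], [11.0, -1.0]]

labels = assign_to_clusters(descriptors, centroids)
print(labels)
```

In practice the descriptors would have many more dimensions (ORB produces 32-byte binary descriptors), and the same nearest-centre rule would be applied against all 64 centres.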
Calculating VLAD Matrix
The authors explain that each reference image has a VLAD matrix that represents the distribution of features in that image. For each cluster, they sum the residuals of the descriptors assigned to it, where a residual is the difference between a descriptor and its cluster centre. The resulting matrix, with one row per cluster, is then normalized to have zero mean and unit variance.
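The following sketch shows one way this computation could look, under the assumption that each descriptor already carries a cluster label. The function names and toy values are hypothetical. The normalization follows the description above (zero mean, unit variance over all entries); note that many VLAD implementations use L2 normalization instead.

```python
import math


def vlad_residuals(descriptors, labels, centroids):
    """Sum, per cluster, the residuals (descriptor minus assigned centroid)."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    for descriptor, label in zip(descriptors, labels):
        centroid = centroids[label]
        for j in range(dim):
            sums[label][j] += descriptor[j] - centroid[j]
    return sums


def standardize(matrix):
    """Rescale all entries of the matrix to zero mean and unit variance."""
    values = [v for row in matrix for v in row]
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0.0:
        # All entries are identical; return zeros rather than divide by zero.
        return [[0.0] * len(row) for row in matrix]
    return [[(v - mean) / std for v in row] for row in matrix]


# Toy data (hypothetical): 2 centroids, 3 descriptors, labels assumed known.
centroids = [[0.0, 0.0], [10.0, 0.0]]
descriptors = [[1.0, 1.0], [2.0, -1.0], [9.0, 2.0]]
labels = [0, 0, 1]

residuals = vlad_residuals(descriptors, labels, centroids)
normalized = standardize(residuals)
print(residuals)   # one row of summed residuals per cluster
print(normalized)
```

With 64 clusters and d-dimensional descriptors, the result is a 64 x d matrix regardless of how many features the image contains, which is what makes the descriptor compact.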
Mapping Query Image to Vocabulary
The authors explain that they match the query image to the closest reference image using the VLAD matrix. They compute the VLAD matrix of the query image with the same visual vocabulary and compare it with the VLAD matrix of each reference image using cosine similarity. The reference image with the highest cosine similarity (equivalently, the smallest cosine distance) is selected as the closest match.
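As a rough illustration of this matching step, the sketch below flattens each VLAD matrix into a vector, scores every reference against the query with cosine similarity, and returns the best-scoring reference. The reference names, matrix values, and function names are hypothetical, and a linear scan over all references is an assumption made for simplicity.

```python
import math


def flatten(matrix):
    """Concatenate the rows of a matrix into a single vector."""
    return [v for row in matrix for v in row]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(query_vlad, reference_vlads):
    """Return the name of the most similar reference and all scores."""
    query = flatten(query_vlad)
    scores = {
        name: cosine_similarity(query, flatten(matrix))
        for name, matrix in reference_vlads.items()
    }
    return max(scores, key=scores.get), scores


# Toy VLAD matrices (hypothetical): 2 clusters x 2 dimensions each.
reference_vlads = {
    "ref_a": [[1.0, 0.0], [0.0, 1.0]],
    "ref_b": [[-1.0, 0.0], [0.0, -1.0]],
    "ref_c": [[1.0, 1.0], [0.0, 0.0]],
}
query_vlad = [[2.0, 0.0], [0.0, 1.5]]

best, scores = best_match(query_vlad, reference_vlads)
print(best, scores)
```

Because cosine similarity ignores vector length, the query above matches "ref_a" even though its entries are scaled differently; picking the highest similarity is the same as picking the smallest cosine distance.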
Improved Performance
The authors demonstrate improved retrieval performance in a city-scale urban area compared to previous works. According to the authors, the compact descriptor reduces the computational complexity of image retrieval while maintaining its accuracy.
In conclusion, the article presents a novel approach to efficient image retrieval using visual vocabulary construction. By representing each image with a compact VLAD descriptor built from ORB features and a 64-cluster vocabulary, the proposed method reduces the computational complexity of image retrieval while maintaining its accuracy, and the authors report improved performance in a city-scale urban area compared to previous works.
Computer Science, Computer Vision and Pattern Recognition