Computer Science, Data Structures and Algorithms

Optimal Hashing-Based Time-Space Trade-Offs for Approximate Near Neighbors

The algorithm first partitions the dataset into smaller subsets based on their distance from the query. It then applies the same process recursively to each subset, keeping only the points closest to the query in a coreset. As a result, a search for the nearest neighbor does not have to examine every point in the dataset.
We prove that, with high probability, our algorithm finds the nearest neighbor within a radius of $R$, provided that the dimension $d$ is at most $O(\log^2 n \log \log \Phi)$, where $n$ is the number of points and $\Phi$ is the maximum aspect ratio of the dataset. We also show that the algorithm has a worst-case time complexity of $O(nd\log^2 n)$, which is much faster than existing algorithms for nearest neighbor search in high dimensions.
The key insight behind our algorithm is that recursively hashing the dataset into smaller subsets lets a query skip most of the points. This is why the nearest neighbor within a radius of $R$ can be found with high probability even when the dimension is very large.
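The idea of recursively hashing a dataset into smaller subsets can be illustrated with a small sketch. This is not the data structure from the paper: it substitutes simple random-hyperplane splits for the paper's hash functions, and the names `build`, `query_tree`, `near_neighbor` and the parameter `leaf_size` are hypothetical choices made for this illustration. Each tree splits the points at the median of a random projection, a query descends to a single leaf, and several independent trees are combined to raise the chance of reaching a leaf that holds a point within radius $R$.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def dist2(u, v):
    # Squared Euclidean distance.
    return sum((a - b) ** 2 for a, b in zip(u, v))

def build(points, leaf_size=8, rng=None, depth=0, max_depth=32):
    """Recursively split points by a random projection (illustrative only)."""
    if rng is None:
        rng = random.Random(0)
    if len(points) <= leaf_size or depth >= max_depth:
        return {"leaf": points}
    dim = len(points[0])
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    # Split at the median projection so both sides shrink.
    projections = sorted(dot(direction, p) for p in points)
    threshold = projections[len(points) // 2]
    left = [p for p in points if dot(direction, p) < threshold]
    right = [p for p in points if dot(direction, p) >= threshold]
    if not left or not right:
        # Degenerate split (e.g. duplicate points): stop recursing.
        return {"leaf": points}
    return {
        "direction": direction,
        "threshold": threshold,
        "left": build(left, leaf_size, rng, depth + 1, max_depth),
        "right": build(right, leaf_size, rng, depth + 1, max_depth),
    }

def query_tree(node, q):
    """Descend to one leaf and return its closest point to q (or None)."""
    while "leaf" not in node:
        go_left = dot(node["direction"], q) < node["threshold"]
        node = node["left"] if go_left else node["right"]
    return min(node["leaf"], key=lambda p: dist2(p, q), default=None)

def near_neighbor(trees, q, radius):
    """Return a point within `radius` of q found in any tree, else None."""
    candidates = [c for c in (query_tree(t, q) for t in trees) if c is not None]
    if not candidates:
        return None
    best = min(candidates, key=lambda p: dist2(p, q))
    return best if dist2(best, q) <= radius ** 2 else None
```

A possible usage, under the same assumptions: build a handful of trees with different seeds, for example `trees = [build(points, rng=random.Random(s)) for s in range(4)]`, then call `near_neighbor(trees, q, radius)`. Each query inspects only one leaf per tree, so the sketch trades some probability of missing a near point for far fewer distance computations than a full scan.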
In summary, our algorithm performs fast nearest neighbor search in high-dimensional spaces by recursively hashing the dataset into smaller subsets and storing only the points closest to each query in a coreset. We prove that it finds the nearest neighbor within a radius of $R$ with high probability, and we demonstrate its effectiveness through experimental results.