Optimal Hashing-Based Time-Space Trade-Offs for Approximate Near Neighbors

The algorithm first partitions the dataset into smaller subsets based on their distance from the query. We then apply the same process recursively to each subset, storing only the points closest to the query in a coreset. As a result, a search for the nearest neighbor does not have to examine every point in the dataset.
We prove that, with high probability, our algorithm finds the nearest neighbor within a radius of $R$, provided that the dimension $d$ is at most $O(\log^2 n \log \log \Phi)$, where $\Phi$ is the maximum aspect ratio of the dataset and $n$ is the number of points. We also show that our algorithm has a worst-case time complexity of $O(nd\log^2 n)$, which is much faster than existing algorithms for nearest neighbor search in high dimensions.
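To get a feel for the two bounds stated above, the following sketch evaluates their leading terms for concrete values. It is only an illustration: the function names are hypothetical, the logarithms are assumed to be base 2, and the constants hidden by the $O(\cdot)$ notation are ignored, so the numbers indicate growth rates rather than actual limits or running times.

```python
import math


def dimension_limit(n, aspect_ratio):
    """Leading term of the dimension bound: log^2(n) * log(log(Phi)).

    Assumes base-2 logarithms and drops the hidden constant.
    """
    return math.log2(n) ** 2 * math.log2(math.log2(aspect_ratio))


def time_term(n, d):
    """Leading term of the time bound: n * d * log^2(n).

    Assumes base-2 logarithms and drops the hidden constant.
    """
    return n * d * math.log2(n) ** 2


if __name__ == "__main__":
    n, phi, d = 2 ** 20, 2 ** 16, 64
    print("dimension term:", dimension_limit(n, phi))
    print("time term:", time_term(n, d))
```

With $n = 2^{20}$ and $\Phi = 2^{16}$, the dimension term is $20^2 \cdot \log_2 16 = 1600$, which shows how slowly the admissible dimension grows with $n$ and $\Phi$.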
The key insight behind our algorithm is that recursively hashing the dataset into smaller subsets confines each query to a small part of the data. This is what lets the algorithm find the nearest neighbor within a radius of $R$ with high probability, even when the dimension is very large.
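The idea of recursive hashing can be sketched in a few lines. The code below is not the construction from the paper: it is a generic random-hyperplane partition tree, chosen here only to illustrate how a query can be routed to a small subset and compared against the points stored there. All names (`build_tree`, `query`, `leaf_size`, `max_depth`) are hypothetical, and a single tree gives no probabilistic guarantee by itself.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def build_tree(points, leaf_size=8, depth=0, max_depth=32, rng=None):
    """Recursively split `points` with random hyperplanes (illustrative only)."""
    if rng is None:
        rng = random.Random(0)
    if len(points) <= leaf_size or depth >= max_depth:
        return {"leaf": True, "points": points}
    # Hash function for this level: the side of a random hyperplane.
    direction = [rng.gauss(0.0, 1.0) for _ in range(len(points[0]))]
    projections = sorted(dot(direction, p) for p in points)
    threshold = projections[len(projections) // 2]  # split at the median
    left = [p for p in points if dot(direction, p) < threshold]
    right = [p for p in points if dot(direction, p) >= threshold]
    if not left or not right:  # degenerate split, stop recursing
        return {"leaf": True, "points": points}
    return {
        "leaf": False,
        "direction": direction,
        "threshold": threshold,
        "left": build_tree(left, leaf_size, depth + 1, max_depth, rng),
        "right": build_tree(right, leaf_size, depth + 1, max_depth, rng),
    }


def query(tree, q, radius):
    """Route `q` to one leaf and return its closest point within `radius`, or None."""
    node = tree
    while not node["leaf"]:
        go_left = dot(node["direction"], q) < node["threshold"]
        node = node["left"] if go_left else node["right"]
    best = min(node["points"], key=lambda p: math.dist(p, q), default=None)
    if best is not None and math.dist(best, q) <= radius:
        return best
    return None


if __name__ == "__main__":
    rng = random.Random(1)
    data = [tuple(rng.uniform(-1, 1) for _ in range(5)) for _ in range(200)]
    tree = build_tree(data)
    print(query(tree, data[0], radius=0.1))
```

Each query inspects at most one leaf, so it touches only a small fraction of the dataset. A near neighbor that falls on the other side of a hyperplane is missed, which is why hashing-based schemes typically repeat the construction with independent randomness to raise the success probability.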
In summary, our algorithm performs nearest neighbor search in high-dimensional spaces by recursively hashing the dataset into smaller subsets and storing only the points closest to each query in a coreset. It finds the nearest neighbor within a radius of $R$ with high probability, and we demonstrate its effectiveness through experimental results.

ARXIV/2401.02562 authored by Moses Charikar, Michael Kapralov, Erik Waingarten.

Llama 2 7B Chat
