In this paper, we investigate the sample complexity of learning Mahalanobis distance metrics with contrastive learning. We provide upper and lower bounds on the number of samples required to learn a good representation, stated in terms of the representation dimension d and the number of samples n. Our main result is that, when the representation dimension is high, the sample complexity of contrastive learning is O(n^2/α^2), where α is the ratio of positive to negative samples.
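To make the setting concrete, the sketch below shows one common way to parameterize a Mahalanobis metric, d_M(x, y) = (x − y)^T M (x − y) with M = Lᵀ L, and to train it with a contrastive pairwise loss. This is an illustrative sketch, not the algorithm analyzed in the paper; the loss form, margin, learning rate, and synthetic pair labels are all assumptions made for the example.

```python
# Minimal sketch (not the paper's algorithm): learn a Mahalanobis metric
# d_M(x, y) = (x - y)^T M (x - y), with M = L^T L so that M stays PSD,
# using a contrastive hinge loss on labeled positive/negative pairs.
# Margin, learning rate, and the synthetic labels are illustrative choices.
import torch

def mahalanobis_sq(L, x, y):
    """Squared Mahalanobis distance with M = L^T L."""
    diff = (x - y) @ L.T                 # project the difference through L
    return (diff ** 2).sum(dim=-1)

def contrastive_loss(L, x1, x2, same, margin=1.0):
    """Pull positive pairs together, push negative pairs past the margin."""
    d2 = mahalanobis_sq(L, x1, x2)
    pos = same * d2                                        # positives: shrink distance
    neg = (1 - same) * torch.clamp(margin - d2, min=0.0)   # negatives: hinge penalty
    return (pos + neg).mean()

# Toy data: n points in dimension d, with synthetic pair labels.
torch.manual_seed(0)
n, d = 200, 10
X = torch.randn(n, d)
i, j = torch.randint(0, n, (500,)), torch.randint(0, n, (500,))
same = (torch.linalg.norm(X[i] - X[j], dim=1) < 3.0).float()  # placeholder labels

L = torch.eye(d, requires_grad=True)      # start from the identity metric
opt = torch.optim.SGD([L], lr=0.01)
for step in range(200):
    opt.zero_grad()
    loss = contrastive_loss(L, X[i], X[j], same)
    loss.backward()
    opt.step()
```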
To build intuition for this result, imagine a crowd of people, each labeled with a distance from a reference point x. The goal of contrastive learning is to learn a representation that accurately predicts the distance between any two people in the crowd from their proximity to this reference point; the sketch below makes this anchor-based pair construction concrete. Just as collecting labeled data takes time and effort, learning a good representation requires a sufficient number of samples, and our bounds show how that number depends on the representation dimension and the ratio of positive to negative samples.
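The snippet below illustrates this anchor-based view under assumed choices (synthetic data and a hypothetical distance cutoff, neither of which comes from the paper): points near the anchor play the role of positives, points far from it play the role of negatives, and α is simply the resulting positive-to-negative ratio.

```python
# Illustrative anchor-based pair construction (not from the paper):
# points close to the anchor x are treated as positives, the rest as
# negatives, and alpha is the resulting positive-to-negative ratio.
# The data and the 10% distance cutoff are hypothetical choices.
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 10
X = rng.standard_normal((n, d))

anchor = X[0]
dist = np.linalg.norm(X[1:] - anchor, axis=1)   # distances to the anchor
threshold = np.quantile(dist, 0.1)              # hypothetical cutoff
positives = X[1:][dist <= threshold]            # "close to the anchor"
negatives = X[1:][dist > threshold]             # "far from the anchor"

alpha = len(positives) / len(negatives)         # positive-to-negative ratio
print(f"alpha = {alpha:.3f}")                   # roughly 0.11 for the 10% cutoff
```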
We also consider the case where the sample complexity corresponds directly to the cost of labeling, so that contrastive learning becomes a directly supervised process. In this scenario the sample complexity is O(n log n), a simpler bound than the previous one, though it still depends on the number of samples and the dimension of the representation.
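As a rough back-of-the-envelope comparison of the two regimes (asymptotic forms only, ignoring constants and any dependence hidden in them), the values of n and α below are hypothetical and chosen purely for illustration.

```python
# Order-of-magnitude comparison of the two bounds (constants ignored;
# the values of n and alpha are hypothetical, for illustration only).
import math

n, alpha = 1_000, 0.1                   # n samples, alpha = positive-to-negative ratio
bound_contrastive = n**2 / alpha**2     # O(n^2 / alpha^2) regime
bound_supervised = n * math.log(n)      # O(n log n) regime

print(f"O(n^2/alpha^2) ~ {bound_contrastive:.1e}")   # ~1.0e+08
print(f"O(n log n)     ~ {bound_supervised:.1e}")    # ~6.9e+03
```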
Overall, our results give a clearer picture of the sample complexity of contrastive learning and how it depends on the representation dimension and the ratio of positive to negative samples. By explaining these concepts in plain language and through analogies, we hope to make contrastive learning more accessible to researchers and practitioners in the field.