In this paper, the authors propose a novel approach to utilizing unlabeled data in a partially labeled setting, where only a small fraction of the incoming data has labels. The main challenge is how to effectively use the newer, unlabeled data to improve on the Naive approach, which simply waits for labels to arrive before updating its parameters.
The authors experiment with several families of Self-Supervised Learning (SSL) methods: the Deep Metric Learning family (MoCo, SimCLR, and NNCLR), Self-Distillation (BYOL and SimSiam), and Canonical Correlation Analysis (VICReg, Barlow Twins, SwAV, and W-MSE). They find that the most effective strategy is to alternate between SSL and Naive updates in an iterative manner, optimizing the contrastive loss on the unlabeled data and the supervised loss on the labeled data.
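As a rough illustration, here is a minimal PyTorch sketch of such an alternating scheme: even steps take a contrastive (SimCLR-style) update on unlabeled data, odd steps a supervised update on labeled data. The toy encoder, heads, `augment` function, and random batches are hypothetical placeholders, not the authors' actual architecture or pipeline.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 64))
proj_head = nn.Linear(64, 32)   # projection head for the contrastive loss
cls_head = nn.Linear(64, 10)    # classifier head for the supervised loss
opt = torch.optim.SGD(
    list(encoder.parameters()) + list(proj_head.parameters())
    + list(cls_head.parameters()),
    lr=0.1,
)

def augment(x):
    # Placeholder augmentation: additive noise stands in for crops/flips.
    return x + 0.1 * torch.randn_like(x)

def nt_xent(z1, z2, tau=0.5):
    # SimCLR-style contrastive loss: each view's positive is its counterpart.
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2]), dim=1)
    sim = (z @ z.t()) / tau
    sim = sim.masked_fill(torch.eye(2 * n, dtype=torch.bool), float('-inf'))
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

for step in range(100):
    opt.zero_grad()
    if step % 2 == 0:
        # SSL step: contrastive loss on two augmented views of unlabeled data.
        x_u = torch.randn(16, 32)                       # toy unlabeled batch
        loss = nt_xent(proj_head(encoder(augment(x_u))),
                       proj_head(encoder(augment(x_u))))
    else:
        # Naive step: standard supervised loss on the labeled subset.
        x_l, y_l = torch.randn(8, 32), torch.randint(0, 10, (8,))
        loss = F.cross_entropy(cls_head(encoder(x_l)), y_l)
    loss.backward()
    opt.step()
```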
To make the idea concrete, the authors use the analogy of a chef preparing a meal. The partially labeled data is a recipe with only some ingredients already added (the supervised labels). The Naive approach is a chef who waits for more ingredients to arrive before continuing to cook (i.e., before updating the model parameters). The proposed approach instead uses an SSL method to keep cooking with what is already on hand (the unlabeled data), then iteratively refines the dish with the Naive approach as new labels arrive.
The authors also tune hyperparameters on the first 10K iterations and find that the best combination yields an average online accuracy of 9.2% on the CLOC benchmark. They further show that performance improves with more iterations, reaching an average online accuracy of 9.6% after 100K iterations.
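For context, average online accuracy in this kind of streaming setting is typically computed by evaluating the model on each incoming batch before training on it, then averaging over the stream. A minimal sketch, with `model`, `stream`, and `update_fn` as hypothetical placeholders rather than the authors' exact evaluation code:

```python
import torch

def average_online_accuracy(model, stream, update_fn):
    # `stream` yields (x, y) batches in temporal order; `update_fn` performs
    # the learner's training step. Both are hypothetical placeholders.
    correct, total = 0, 0
    for x, y in stream:
        with torch.no_grad():
            preds = model(x).argmax(dim=1)    # predict before training
        correct += (preds == y).sum().item()
        total += y.numel()
        update_fn(model, x, y)                # then update on this batch
    return correct / total
```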
In summary, the authors propose utilizing unlabeled data in a partially labeled setting by iteratively alternating between SSL and Naive updates. Experiments across several SSL methods demonstrate that this approach outperforms the Naive approach alone.