In this article, we explore a novel approach to addressing the challenges of model variance in deep neural networks. Model variance refers to the differences in performance across different models or instances of the same model, even when they are trained on the same data. This problem is particularly pronounced in long-tailed recognition tasks, where the minority classes have limited training examples.
To tackle this issue, we propose a new method called Consistency Self-distillation (CS). Our approach distills richer knowledge from a normal image to a distorted version of the same image. Given an original image, we first generate two views of it using weak and strong augmentations, respectively. We then apply a diversity softmax to compute class probabilities for each view and enforce a consistency loss between them.
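The following is a minimal sketch of this idea in PyTorch. It is not the paper's implementation: the `diversity_softmax` here is a hypothetical stand-in that simply adds log class priors before a temperature-scaled softmax, and the names `model`, `x_weak`, `x_strong`, and `class_prior` are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def diversity_softmax(logits, class_prior, temperature=1.0):
    # Hypothetical stand-in for the paper's diversity softmax: add log class
    # priors to the logits, then apply a temperature-scaled softmax.
    return F.softmax((logits + torch.log(class_prior)) / temperature, dim=-1)

def consistency_self_distillation_loss(model, x_weak, x_strong, class_prior,
                                       temperature=2.0):
    # Teacher signal: probabilities from the weakly augmented ("normal") view,
    # computed without gradients so knowledge flows one way.
    with torch.no_grad():
        p_weak = diversity_softmax(model(x_weak), class_prior, temperature)
    # Student signal: log-probabilities from the strongly augmented
    # ("distorted") view of the same image.
    log_p_strong = torch.log(
        diversity_softmax(model(x_strong), class_prior, temperature) + 1e-8
    )
    # KL(p_weak || p_strong): penalize the distorted view for drifting away
    # from the prediction on the normal view.
    return F.kl_div(log_p_strong, p_weak, reduction="batchmean")
```

In training, a loss like this would typically be added to the usual classification loss, with its weight and the temperature treated as hyperparameters.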
The key insight behind CS is that the minority classes can help the majority classes learn more robust features. By distilling richer knowledge from the normal image to its distorted version, we improve performance across all classes. We evaluate our method on several benchmark datasets and show that it significantly improves recognition accuracy over existing methods.
In summary, our article introduces a new approach to addressing model variance in deep neural networks, particularly in long-tailed recognition tasks. Our Consistency Self-distillation (CS) method distills richer knowledge from a normal image to a distorted version of the same image, leveraging the minority classes to help the majority classes learn more robust features. We demonstrate the effectiveness of the approach through experiments on several benchmark datasets. By improving recognition accuracy across all classes, CS offers a promising solution to the challenge of model variance in deep learning.