Bridging the gap between complex scientific research and the curious minds eager to explore it.

Computer Science, Computer Vision and Pattern Recognition

Efficient Model Training through Consistency Self-distillation

In this article, we explore a novel approach to addressing the challenge of model variance in deep neural networks. Model variance refers to the spread in predictions or performance across different models, or across training runs of the same model, even when they are trained on the same data. The problem is particularly pronounced in long-tailed recognition tasks, where the minority classes have only a limited number of training examples.
To tackle this issue, we propose a new method called Consistency Self-distillation (CS). Our approach distills richer knowledge from a normal image into a distorted version of the same image. Starting from an original image, we create two views of it: a weakly augmented "normal" view and a strongly augmented "distorted" view. We then apply a diversity softmax to the model's outputs for each view and minimize the consistency loss between the resulting probability distributions.
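To make the pipeline concrete, here is a minimal PyTorch-style sketch of one consistency self-distillation step. It is an illustration under assumptions, not the authors' implementation: the function name and the temperature parameter are hypothetical, and a temperature-scaled softmax stands in for the paper's diversity softmax, whose exact form is not given here.

```python
import torch
import torch.nn.functional as F

def consistency_self_distillation_loss(model, weak_view, strong_view,
                                       temperature=2.0):
    """Sketch of one CS step (illustrative, not the paper's exact code).

    `weak_view` and `strong_view` are two augmentations of the same batch
    of images. The weakly augmented ("normal") view acts as the teacher;
    its outputs are detached so gradients flow only through the strongly
    augmented ("distorted") student branch.
    """
    # Teacher branch: logits from the weakly augmented images, no gradients.
    with torch.no_grad():
        teacher_logits = model(weak_view)

    # Student branch: logits from the strongly augmented images.
    student_logits = model(strong_view)

    # Stand-in for the paper's diversity softmax: an ordinary
    # temperature-scaled softmax (an assumption for this sketch).
    teacher_probs = F.softmax(teacher_logits / temperature, dim=1)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=1)

    # Consistency loss: KL divergence pulling the student's distribution
    # toward the teacher's.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
```

In practice, a loss like this would be added to the usual classification loss on the weak view, so the model keeps learning from labels while the consistency term regularizes its predictions across augmentations.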
The key insight behind CS is that the minority classes can help the majority classes learn more robust features. By distilling this richer knowledge from the normal image into its distorted counterpart, the model improves performance across all classes over the course of training. We evaluate our method on several benchmark datasets and show that it significantly improves recognition accuracy compared to existing methods.
In summary, our article introduces Consistency Self-distillation (CS), a new approach to addressing model variance in deep neural networks, particularly in long-tailed recognition tasks. CS distills richer knowledge from normal images into distorted versions of the same images, leveraging the minority classes to help the majority classes learn more robust features. Experiments on several benchmark datasets demonstrate that CS improves recognition accuracy across all classes, making it a promising solution to the challenges of model variance in deep learning.