In this article, we discuss a central problem in contrastive distillation: how to effectively transfer the teacher's knowledge to the student when designing a contrastive distillation method for semantic segmentation. We highlight two key technical issues that must be addressed: high resource demands and structured knowledge transfer.
High Resource Demands
Contrastive distillation typically involves augmenting samples or storing feature maps in a memory buffer, both of which incur additional cost. For instance, feeding augmented samples into the student model increases its forward-pass computation. Moreover, semantic segmentation produces high-resolution output feature maps, so the memory buffer must hold them at their original resolution, which sharply increases memory consumption, as the rough estimate below illustrates.
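To make this concrete, the following back-of-the-envelope sketch estimates the memory needed to buffer dense output features. The channel count, output stride, buffer size, and precision are illustrative assumptions, not values taken from any particular method.

```python
# Rough estimate of the memory cost of buffering high-resolution feature maps
# for contrastive distillation. All shapes below are illustrative assumptions.

def feature_buffer_bytes(num_images, channels, height, width,
                         output_stride=8, bytes_per_value=4):
    """Bytes needed to keep dense output features for `num_images` images."""
    feat_h, feat_w = height // output_stride, width // output_stride
    return num_images * channels * feat_h * feat_w * bytes_per_value

# Example: 512-channel features for 1024x2048 (Cityscapes-sized) images.
per_image = feature_buffer_bytes(1, 512, 1024, 2048)
print(f"per image: {per_image / 2**20:.0f} MiB")  # 64 MiB
print(f"4096-slot buffer: {feature_buffer_bytes(4096, 512, 1024, 2048) / 2**30:.0f} GiB")  # 256 GiB
```

Even a moderately sized buffer quickly reaches hundreds of gigabytes under these assumptions, which is why keeping full-resolution features around is so costly.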
Structured Knowledge Transfer
To effectively transfer the teacher's knowledge to the student, we need to model relations among pixel-wise, or even finer-grained, representations within each local patch. Capturing this structure is crucial for dense prediction, yet existing contrastive distillation methods have not explicitly addressed it. We propose to tackle this problem with Af-DCD (Augmented Feature Distillation with Dense and Structured Contrastive Distillation), which combines augmentation with dense contrastive distillation to improve the student model's performance on semantic segmentation; a sketch of intra-patch relation modeling is given below.
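As a minimal sketch of what modeling relations within each local patch can look like, the snippet below partitions teacher and student feature maps into non-overlapping patches and asks the student to reproduce the teacher's pairwise pixel similarities inside each patch. The patch size and the MSE relation-matching objective are our illustrative choices, not Af-DCD's exact formulation.

```python
import torch
import torch.nn.functional as F

def patch_relation_loss(f_s, f_t, patch=4):
    """Match teacher/student pairwise pixel similarities within local patches.

    f_s, f_t: (B, C, H, W) student / teacher feature maps of matching shape
    (project the student beforehand if channel counts differ). `patch` is an
    illustrative size; H and W are assumed divisible by it.
    """
    B, C, H, W = f_s.shape
    f_t = f_t.detach()  # the teacher only provides targets

    def to_patches(x):
        # (B, C*p*p, L) columns, one per non-overlapping patch, L = (H/p)*(W/p)
        cols = F.unfold(x, kernel_size=patch, stride=patch)
        L = cols.shape[-1]
        # -> (B, L, p*p, C): one row per pixel inside a patch
        return cols.view(B, C, patch * patch, L).permute(0, 3, 2, 1)

    ps = to_patches(F.normalize(f_s, dim=1))  # unit-norm per-pixel features
    pt = to_patches(F.normalize(f_t, dim=1))
    rel_s = ps @ ps.transpose(-1, -2)  # (B, L, p*p, p*p) cosine similarities
    rel_t = pt @ pt.transpose(-1, -2)
    return F.mse_loss(rel_s, rel_t)
```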
Our proposed method, Af-DCD, consists of two stages: (1) feature augmentation, in which random transformations are applied to the input images, and (2) dense contrastive distillation, in which a contrastive loss encourages the student's features to match the teacher's at every spatial location (see the sketch after this paragraph). By combining these two stages, Af-DCD enables the student model to learn more robust and accurate features.
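The dense contrastive term can be sketched as a per-pixel InfoNCE loss in which, for every spatial location, the teacher feature at the same location is the positive and teacher features at all other locations serve as negatives. This is a simplified stand-in for the actual loss; the temperature and the choice of negatives are assumptions.

```python
import torch
import torch.nn.functional as F

def dense_contrastive_kd_loss(f_s, f_t, temperature=0.1):
    """Per-pixel InfoNCE between student and teacher feature maps.

    f_s, f_t: (B, C, H, W) with matching shapes (project the student first if
    channel counts differ). The positive for each student pixel is the teacher
    feature at the same location; all other teacher pixels act as negatives.
    """
    B, C, H, W = f_s.shape
    s = F.normalize(f_s.flatten(2).transpose(1, 2), dim=-1)            # (B, HW, C)
    t = F.normalize(f_t.detach().flatten(2).transpose(1, 2), dim=-1)   # (B, HW, C)
    logits = s @ t.transpose(1, 2) / temperature                        # (B, HW, HW)
    targets = torch.arange(H * W, device=f_s.device).repeat(B)          # (B*HW,)
    return F.cross_entropy(logits.reshape(B * H * W, H * W), targets)
```

In practice the HW x HW logit matrix becomes very large for segmentation-sized feature maps, so negatives are usually restricted to a sampled subset or a local neighborhood to keep the loss tractable.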
We evaluate Af-DCD on several benchmarks, including Cityscapes, PASCAL VOC, and COCO. Our results show that Af-DCD outperforms state-of-the-art contrastive distillation methods on semantic segmentation, indicating that it effectively addresses the two technical issues described above.
In summary, this article presents a novel approach to contrastive distillation, Af-DCD, that addresses two key challenges in transferring the teacher's knowledge to the student: high resource demands and structured knowledge transfer. By combining augmentation with dense contrastive distillation, Af-DCD enables the student model to learn more robust and accurate features, leading to improved performance on semantic segmentation tasks.