In this paper, we propose a framework for efficient promptable segmentation using a lightweight student network trained by full-stage knowledge distillation from a heavy teacher network. The goal is a highly efficient Segment Anything Model (SAM) that incurs minimal computational cost and memory usage while maintaining high accuracy.
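To make the full-stage objective concrete, the following is a minimal PyTorch sketch of a distillation loss that supervises the student at two stages, intermediate encoder features and final mask logits. The tensor shapes, loss choices, and weighting are illustrative assumptions, not the exact implementation.

```python
import torch.nn.functional as F

def full_stage_distillation_loss(student_feats, teacher_feats,
                                 student_masks, teacher_masks,
                                 feat_weight=1.0, mask_weight=1.0):
    """Distill at multiple stages: encoder features and mask logits.

    Shapes and loss weights are illustrative assumptions; student_feats
    and teacher_feats are matched lists of feature maps.
    """
    # Stage 1: match intermediate image-encoder features (MSE).
    feat_loss = sum(
        F.mse_loss(s, t.detach()) for s, t in zip(student_feats, teacher_feats)
    ) / len(student_feats)

    # Stage 2: match the final predicted mask logits (MSE on logits).
    mask_loss = F.mse_loss(student_masks, teacher_masks.detach())

    return feat_weight * feat_loss + mask_weight * mask_loss
```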
To this end, we introduce an online hard prompt sampling method that mines hard knowledge from the teacher network for the student network. This keeps the distillation process active and ensures that the student learns the most informative features from the teacher.
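One possible realization of this sampling step is sketched below: a pool of candidate point prompts is scored by the student-teacher discrepancy of the resulting masks, and only the k hardest prompts are kept for the distillation loss. The decoder callables and prompt format are assumed interfaces for illustration.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def sample_hard_prompts(candidate_prompts, student_decode, teacher_decode, k):
    """Keep the k prompts on which the student disagrees most with the teacher.

    candidate_prompts: (N, 2) tensor of point prompts (illustrative format).
    student_decode / teacher_decode: callables mapping a prompt batch to
    mask logits of identical shape (assumed interface).
    """
    student_logits = student_decode(candidate_prompts)   # (N, H, W)
    teacher_logits = teacher_decode(candidate_prompts)   # (N, H, W)

    # Per-prompt discrepancy: mean squared error between mask logits.
    per_prompt_loss = F.mse_loss(
        student_logits, teacher_logits, reduction="none"
    ).flatten(1).mean(dim=1)                             # (N,)

    # "Hard" prompts are those with the largest discrepancy; the selected
    # prompts would then be re-run with gradients for the distillation loss.
    hard_idx = per_prompt_loss.topk(k).indices
    return candidate_prompts[hard_idx]
```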
We also adapt a post-training quantization method to the promptable segmentation task, reducing the precision of the weights and activations of the student network with negligible loss in accuracy. This further cuts the computational cost and memory footprint of the lightweight student network.
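As a rough illustration of this step, the sketch below performs symmetric 8-bit per-tensor quantization of a weight tensor and calibrates an activation scale from sample inputs; the bit-width, symmetric scheme, and max-based calibration are generic post-training quantization choices assumed here, not necessarily the exact scheme adapted in this work.

```python
import torch

def quantize_tensor_symmetric(x, num_bits=8):
    """Symmetric per-tensor quantization: x is approximated as scale * q."""
    qmax = 2 ** (num_bits - 1) - 1               # e.g. 127 for int8
    scale = x.abs().max().clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(x / scale), -qmax, qmax)
    return q.to(torch.int8), scale

def dequantize(q, scale):
    """Recover an approximate float tensor from integer codes."""
    return q.float() * scale

@torch.no_grad()
def calibrate_activation_scale(layer_inputs, num_bits=8):
    """Estimate an activation scale from calibration batches (assumed workflow)."""
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = max(x.abs().max().item() for x in layer_inputs)
    return max_abs / qmax
```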
Finally, we propose a hierarchical everything inference mode that lets the lightweight student network avoid redundant computation by performing segmentation only for the objects that are actually present in the image. This yields a significant inference speedup without compromising accuracy.
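One way to realize such a mode is a coarse-to-fine prompting loop, sketched below: a sparse point grid is segmented first, and denser prompts are issued only in regions not yet covered by a confident mask, so prompts falling on already-segmented objects are skipped. The grid sizes, confidence threshold, and predict_masks interface are illustrative assumptions.

```python
import torch

def make_point_grid(n, h, w):
    """n x n grid of (x, y) point prompts over an h x w image."""
    ys = torch.linspace(0, h - 1, n)
    xs = torch.linspace(0, w - 1, n)
    return torch.stack(torch.meshgrid(xs, ys, indexing="xy"), dim=-1).reshape(-1, 2)

@torch.no_grad()
def hierarchical_everything(predict_masks, h, w,
                            coarse_n=8, fine_n=32, conf_thresh=0.9):
    """Two-level everything inference (illustrative sketch).

    predict_masks: callable mapping (N, 2) points to (masks, scores), where
    masks is (N, H, W) bool and scores is (N,) -- assumed interface.
    """
    covered = torch.zeros(h, w, dtype=torch.bool)
    all_masks = []

    # Pass 1: sparse grid over the whole image.
    masks, scores = predict_masks(make_point_grid(coarse_n, h, w))
    for m, s in zip(masks, scores):
        if s >= conf_thresh:
            all_masks.append(m)
            covered |= m

    # Pass 2: dense prompts only where no confident mask exists yet.
    fine_points = make_point_grid(fine_n, h, w)
    keep = ~covered[fine_points[:, 1].long(), fine_points[:, 0].long()]
    remaining = fine_points[keep]
    if remaining.numel():
        masks, scores = predict_masks(remaining)
        all_masks.extend(m for m, s in zip(masks, scores) if s >= conf_thresh)
    return all_masks
```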
In summary, our framework yields a highly efficient SAM: a lightweight student network trained with full-stage knowledge distillation and further compressed by post-training quantization. The hierarchical everything inference mode additionally accelerates inference without sacrificing accuracy, making the model well suited to real-world applications where computational resources are limited.