In this article, the authors propose a new approach to semantic image segmentation called Rethinking Atrous Convolution (RAC). Their aim is to improve the accuracy of convolutional neural networks (CNNs) on this task by revisiting how atrous convolution is traditionally applied.
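Atrous (dilated) convolution is a technique used in CNNs to enlarge the receptive field of a filter without reducing the resolution of the feature maps or adding parameters. It does so by inserting gaps between the filter taps, so a 3-tap filter with dilation rate 2 spans 5 input positions. However, it has some limitations, such as losing spatial information and increasing the computational cost. The authors propose to address these limitations with a new method called multi-scale dilated convolution (MSDC).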
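To make the idea of a dilation rate concrete, here is a minimal one-dimensional sketch in plain Python. It is an illustration only, not the authors' implementation: the function names `atrous_conv1d` and `receptive_field`, the toy signal, and the kernel are all assumptions chosen for this example, and real networks apply the same idea to 2D feature maps with learned filters.

```python
def receptive_field(kernel_size, rate):
    """Number of input positions spanned by a kernel at a given dilation rate."""
    return (kernel_size - 1) * rate + 1


def atrous_conv1d(signal, kernel, rate=1):
    """1D atrous (dilated) convolution, 'valid' mode (no padding).

    rate=1 is an ordinary convolution (in the cross-correlation form used
    by CNNs); rate=r leaves r-1 skipped positions between kernel taps.
    """
    span = receptive_field(len(kernel), rate)
    out = []
    for start in range(len(signal) - span + 1):
        # Each tap j reads the input at an offset of j * rate.
        out.append(sum(w * signal[start + j * rate] for j, w in enumerate(kernel)))
    return out


# Hypothetical toy data: a ramp and a simple difference filter.
signal = [1, 2, 3, 4, 5, 6, 7, 8, 9]
kernel = [1, 0, -1]

dense = atrous_conv1d(signal, kernel, rate=1)    # each output compares positions 2 apart
dilated = atrous_conv1d(signal, kernel, rate=2)  # each output compares positions 4 apart
```

The same three weights cover 3 input positions at rate 1 and 5 positions at rate 2, which is why dilation widens the context a filter sees without adding parameters.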
MSDC combines several dilated convolutions with different dilation rates to capture features at different scales. This allows the network to learn more robust representations of the image, which improves the accuracy of semantic segmentation. The authors show that MSDC outperforms traditional atrous convolution in several experiments.
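The multi-scale idea can be sketched as several dilated branches applied to the same input and then fused. The summary does not say how the branches are combined, so the element-wise mean below is purely an assumption, as are the function names `dilated_same` and `multi_scale_dilated`, the rates `(1, 2, 4)`, and the toy input.

```python
def dilated_same(signal, kernel, rate):
    """Dilated 1D convolution with zero padding, so output length equals input length.

    Assumes an odd kernel length so the kernel has a well-defined centre tap.
    """
    n, k = len(signal), len(kernel)
    half = (k // 2) * rate  # distance from the centre tap to the outermost tap
    out = []
    for i in range(n):
        acc = 0.0
        for j in range(k):
            idx = i - half + j * rate
            if 0 <= idx < n:  # positions outside the signal count as zero
                acc += kernel[j] * signal[idx]
        out.append(acc)
    return out


def multi_scale_dilated(signal, kernel, rates=(1, 2, 4)):
    """Run one branch per dilation rate and fuse the branches.

    Fusion by element-wise mean is an assumption made for this sketch.
    """
    branches = [dilated_same(signal, kernel, r) for r in rates]
    return [sum(values) / len(rates) for values in zip(*branches)]


# Hypothetical toy input: a single impulse in the middle of the signal.
signal = [0, 0, 0, 0, 1, 0, 0, 0, 0]
kernel = [1, 1, 1]
fused = multi_scale_dilated(signal, kernel)
```

With an impulse as input, each branch responds at the centre and at a distance equal to its own rate, so the fused output shows how the larger rates pull in context from farther away while the output keeps the input's length.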
Another contribution of the article is a new metric called the spatial attention score (SAS). SAS measures how strongly a feature map attends to the target class, which helps identify the image regions most relevant to semantic segmentation. The authors show that using SAS together with MSDC improves segmentation performance.
The article also discusses other recent techniques for improving semantic segmentation, such as metric learning and prototype-based learning. The authors demonstrate that these techniques are related to MSDC and can be seen as special cases of their proposal.
In summary, the article proposes a new approach to semantic image segmentation called RAC, which combines multi-scale dilated convolution with the spatial attention score. This approach improves the performance of CNNs on this task by addressing the limitations of traditional atrous convolution. The article also relates other recent segmentation techniques to the proposed method.
Computer Science, Computer Vision and Pattern Recognition