In this article, we explore a new approach to segmenting large-scale point clouds into meaningful regions using weak supervision. Traditionally, semantic segmentation requires detailed per-point annotations, which are time-consuming and expensive to obtain. Our proposed method leverages the Transformer architecture to capture long-range dependencies and to complete object segments even when only a fraction of the points are labeled.
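To make the long-range dependency claim concrete, the following is a minimal sketch of single-head self-attention over point features. It is not the paper's exact architecture; the function name, feature dimensions, and random (rather than learned) projection weights are all illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(points, d_k=16, rng=None):
    """Single-head self-attention over N point features of shape (N, d).

    Every point attends to every other point, so evidence from sparse
    labels can propagate across the whole cloud regardless of spatial
    distance. Projection weights are random here for illustration; in
    practice they are learned.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    n, d = points.shape
    w_q = rng.normal(size=(d, d_k))
    w_k = rng.normal(size=(d, d_k))
    w_v = rng.normal(size=(d, d_k))
    q, k, v = points @ w_q, points @ w_k, points @ w_v
    attn = softmax(q @ k.T / np.sqrt(d_k))   # (N, N) pairwise attention weights
    return attn @ v                          # (N, d_k) globally mixed features

feats = np.random.default_rng(1).normal(size=(100, 8))  # 100 points, 8-dim features
out = self_attention(feats)
```

Because the attention matrix is dense over all point pairs, a partially labeled object can borrow features from distant labeled points of the same object, which is the intuition behind the completion behavior described above.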
To address the challenge of weak supervision, we introduce a novel approach that combines scene-level labels with point-wise pseudo labels generated using Class Activation Maps (CAM) or Multiple Instance Learning (MIL). These pseudo labels are used to train a self-attention module in the Transformer, enabling it to propagate the sparse supervision across the scene and improve object completion.
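One common way to derive such pseudo labels under the MIL view is sketched below: classes absent from the scene label are masked out, and only confidently scored points receive a pseudo label. This is a generic illustration under assumed names and a hypothetical confidence threshold, not the paper's exact procedure.

```python
import numpy as np

def mil_pseudo_labels(point_scores, scene_labels, thresh=0.7):
    """Derive point-wise pseudo labels from scene-level labels (MIL view).

    point_scores : (N, C) per-point class probabilities from the network.
    scene_labels : (C,) binary multi-hot vector of classes present in the scene.

    A point gets pseudo label c only if class c is present in the scene
    AND the point's score for c is confidently high; otherwise it is
    marked -1 and ignored during training.
    """
    n, _ = point_scores.shape
    masked = point_scores * scene_labels          # zero out absent classes
    best = masked.argmax(axis=1)                  # most likely present class
    conf = masked[np.arange(n), best]
    return np.where(conf >= thresh, best, -1)     # -1 = unlabeled / ignored

# Hypothetical scene with classes 0 and 2 present, class 1 absent.
scores = np.array([[0.9, 0.05, 0.05],
                   [0.1, 0.80, 0.10],
                   [0.4, 0.30, 0.30]])
pseudo = mil_pseudo_labels(scores, np.array([1, 0, 1]))
```

Note that the second point, despite a high score for class 1, receives no pseudo label because class 1 is not in the scene-level label set.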
Our proposed method, called AnMIL-DerivedTransformer, consists of two main components: a scene-level MIL loss and a self-training process driven by pseudo labels. The MIL loss treats each scene as a bag of points, encouraging predictions that are consistent with the classes known to be present in the scene, while self-training iteratively refines the point-wise predictions by leveraging the pseudo labels.
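A scene-level MIL loss of this kind can be sketched as follows: point predictions are pooled into a per-class scene score and compared against the multi-hot scene label with binary cross-entropy. The max-pooling choice and all names here are assumptions for illustration; the paper's exact pooling and loss formulation may differ.

```python
import numpy as np

def scene_mil_loss(point_logits, scene_labels):
    """Scene-level MIL loss: treat the scene as a bag of points.

    point_logits : (N, C) raw per-point class logits.
    scene_labels : (C,) binary multi-hot vector of classes in the scene.

    Per-class scene scores are obtained by max-pooling point probabilities
    over all points; binary cross-entropy then compares them with the
    scene labels. Max pooling is one choice; log-sum-exp or attention
    pooling are common alternatives.
    """
    probs = 1.0 / (1.0 + np.exp(-point_logits))   # (N, C) point probabilities
    scene_probs = probs.max(axis=0)               # (C,) bag-level scores
    eps = 1e-7
    scene_probs = np.clip(scene_probs, eps, 1 - eps)
    bce = -(scene_labels * np.log(scene_probs)
            + (1 - scene_labels) * np.log(1 - scene_probs))
    return bce.mean()

# Two points, two classes; only class 0 is present in the scene.
loss = scene_mil_loss(np.array([[5.0, -5.0],
                                [4.0, -6.0]]),
                      np.array([1, 0]))
```

When the network confidently predicts the present class somewhere in the scene and suppresses absent classes everywhere, the loss approaches zero, which is the supervision signal that makes scene-level labels sufficient for training.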
We evaluate our method on several benchmark datasets and show consistent gains in segmentation accuracy over existing weakly supervised methods. Our results show that AnMIL-DerivedTransformer produces high-quality semantic segments even with only partial annotations, making it a practical tool for large-scale point cloud segmentation.
In summary, this article presents a novel approach to weakly supervised semantic segmentation of large-scale point clouds using the Transformer architecture and a combination of scene-level labels and pseudo labels. Our proposed method demonstrates improved accuracy compared to existing methods and has the potential to significantly reduce annotation costs in point cloud segmentation tasks.
Computer Science, Computer Vision and Pattern Recognition