In this paper, the authors propose a novel approach to 3D object detection using sparse convolutional neural networks (CNNs). The proposed method is designed to handle objects with varying shapes and sizes within their bounding boxes, as well as smaller objects at long ranges characterized by highly incomplete shapes and structures.
To address these challenges, the authors use a pre-trained semantic segmenter, DeepLabV3, to encode the raw image into an image feature map. A 3D sparse convolution operation then generates a sparse encoding map, which consists of voxel features together with indices that give the locations of the occupied voxels, and hence of the objects, in the point cloud coordinate system.
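The "voxel features and indices" layout can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name voxelize, the voxel_size parameter, and the choice of the per-voxel mean point as the feature are all assumptions made for illustration.

```python
from collections import defaultdict
import math


def voxelize(points, voxel_size):
    """Group 3D points into voxels and return (indices, features).

    indices:  sorted list of integer (i, j, k) coordinates of non-empty voxels.
    features: for each index, the mean of the points that fall in that voxel
              (an assumed stand-in for a learned voxel feature).
    """
    buckets = defaultdict(list)
    for x, y, z in points:
        key = (
            math.floor(x / voxel_size),
            math.floor(y / voxel_size),
            math.floor(z / voxel_size),
        )
        buckets[key].append((x, y, z))

    indices = sorted(buckets)
    features = []
    for key in indices:
        pts = buckets[key]
        n = len(pts)
        features.append(tuple(sum(p[d] for p in pts) / n for d in range(3)))
    return indices, features


if __name__ == "__main__":
    cloud = [(0.1, 0.1, 0.1), (0.3, 0.1, 0.1), (1.2, 0.0, 0.0)]
    idx, feats = voxelize(cloud, voxel_size=0.5)
    for i, f in zip(idx, feats):
        print(i, f)
```

Only voxels that contain at least one point appear in the output, so memory scales with the number of occupied voxels rather than with the volume of the scene.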
The authors build on a technique called submanifold sparse convolution, which enables the network to learn a compact representation of objects while reducing computational complexity. The input is first converted into a sparse voxel representation and then passed through a series of convolutional layers that operate only on non-empty voxels. Because outputs are computed only at sites that were already occupied in the input, the set of active voxels does not grow from layer to layer.
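The idea of convolving only over non-empty voxels can be sketched in plain Python. This is an illustrative toy under stated assumptions, not the authors' implementation: the function name submanifold_conv3d, the dictionary-based storage, and the weight layout (one c_out x c_in matrix per kernel offset) are all hypothetical, and practical systems use optimized sparse convolution libraries instead.

```python
from itertools import product


def submanifold_conv3d(active, weights, kernel_size=3):
    """Toy submanifold sparse 3D convolution.

    active:  dict mapping (i, j, k) -> list of c_in floats (non-empty voxels).
    weights: dict mapping each kernel offset (di, dj, dk) -> c_out x c_in
             matrix given as a list of lists (assumed layout).
    Returns a dict with exactly the same keys as `active`, so the sparsity
    pattern is preserved.
    """
    r = kernel_size // 2
    offsets = list(product(range(-r, r + 1), repeat=3))
    c_out = len(next(iter(weights.values())))

    out = {}
    for (i, j, k) in active:
        acc = [0.0] * c_out
        for off in offsets:
            neighbor = (i + off[0], j + off[1], k + off[2])
            feat = active.get(neighbor)
            if feat is None:
                continue  # empty voxels contribute nothing and cost nothing
            w = weights[off]
            for o in range(c_out):
                acc[o] += sum(w[o][c] * feat[c] for c in range(len(feat)))
        out[(i, j, k)] = acc
    return out


if __name__ == "__main__":
    voxels = {(0, 0, 0): [1.0], (1, 0, 0): [2.0], (5, 5, 5): [3.0]}
    ones = {off: [[1.0]] for off in product(range(-1, 2), repeat=3)}
    print(submanifold_conv3d(voxels, ones))
```

The work per layer is proportional to the number of active voxels times the kernel volume, rather than to the size of the full dense grid, which is where the reduction in computational cost comes from.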
The proposed method is evaluated on several benchmark datasets, including NYUv2 and ModelNet40, and shows superior performance compared to existing methods. The authors also demonstrate the effectiveness of their approach in real-world applications such as autonomous driving and robotics.
In summary, this paper introduces an approach to 3D object detection based on sparse convolutional neural networks that learns a compact object representation at reduced computational cost. The authors report superior performance compared to existing methods and point to promising applications in real-world scenarios.
Computer Science, Computer Vision and Pattern Recognition