Incorporating Domain Knowledge for Efficient Object Detection
Object detection is a fundamental task in computer vision, and efficient algorithms are crucial for real-world applications. Interest in using domain knowledge to improve detection performance has grown recently. This article presents a novel approach that uses domain knowledge to omit intervals shorter than 2 seconds, which account for only 2.8% of the annotations, and thereby halves the processing time.
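The 2-second rule can be illustrated with a minimal sketch. It assumes intervals are stored as (start, end) pairs in seconds; the function name and the sample data are hypothetical and are not taken from the article.

```python
def drop_short_intervals(intervals, min_duration=2.0):
    """Keep only (start, end) intervals that last at least min_duration seconds."""
    return [(start, end) for start, end in intervals if end - start >= min_duration]


# Hypothetical annotations in seconds (illustrative values only).
annotations = [(0.0, 1.5), (3.0, 8.0), (10.0, 11.9), (12.0, 14.0)]
kept = drop_short_intervals(annotations)
print(kept)
```

Only the intervals lasting 2 seconds or longer remain for further processing.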
The authors propose a three-stage pipeline:
- Proposal Generation: The input event representations (histograms or time maps) are resized to 224×224 pixels and passed through a CNN to generate proposals. The authors compute a robust minimum and maximum, defined as the p-th and (100-p)-th percentiles, and clip r(t) to that range.
- Interval Merging: Intervals of high event rate (i.e., activity) are extracted as proposals with the watershed algorithm. Regions above the threshold λ become proposals, which are then merged iteratively according to a merging threshold µ.
- Proposal Prediction: The proposals are augmented with "start" and "end" stages, and the ATSN (Augmentation-based Temporal Segmentation Network) operates on the augmented proposals to produce the final prediction for each one.
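The percentile clipping and the threshold-then-merge idea can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: a single threshold λ stands in for the full watershed procedure, r(t) is taken to be a 1-D array of samples, and the merge criterion (merge neighbours while the share of above-threshold samples in the merged span stays at or above µ) is an assumption, since the article does not spell it out. All function names are hypothetical.

```python
import numpy as np


def robust_clip(rate, p=1.0):
    """Clip a signal to its p-th and (100 - p)-th percentiles."""
    low, high = np.percentile(rate, [p, 100.0 - p])
    return np.clip(rate, low, high)


def threshold_proposals(rate, lam):
    """Return half-open [start, end) index pairs of contiguous runs where rate > lam."""
    above = np.concatenate(([False], rate > lam, [False]))
    # Edges alternate between run starts and run ends because of the padding.
    edges = np.flatnonzero(above[1:] != above[:-1])
    return [(int(s), int(e)) for s, e in zip(edges[::2], edges[1::2])]


def merge_proposals(proposals, mu):
    """Greedily merge neighbouring proposals (assumed criterion, see lead-in)."""
    if not proposals:
        return []
    merged = []
    start, end = proposals[0]
    active = end - start  # number of above-threshold samples in the current span
    for s, e in proposals[1:]:
        new_active = active + (e - s)
        if new_active / (e - start) >= mu:
            end, active = e, new_active
        else:
            merged.append((start, end))
            start, end, active = s, e, e - s
    merged.append((start, end))
    return merged


# Hypothetical signal, one sample per time step (illustrative values only).
rate = np.array([0, 0, 5, 6, 0, 7, 8, 0, 0, 0, 0, 9, 0], dtype=float)
clipped = robust_clip(rate, p=5.0)
proposals = threshold_proposals(clipped, lam=1.0)
merged = merge_proposals(proposals, mu=0.7)
print(proposals, merged)
```

In this toy signal the first two active runs are separated by a single quiet sample and are merged, while the isolated run near the end stays a separate proposal. A lower µ merges more aggressively; a higher µ keeps proposals tighter.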
The authors run independent experiments for the proposal stage and for the whole pipeline. They report the average recall (AR) at tIoU ∈ {0.1, 0.3, 0.5, 0.7} using the best Np proposals per recording and nest, with Np ∈ {20, 30, 50}, and the temporal mean average precision (mAP) at the same tIoU values.
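For readers unfamiliar with these metrics, the sketch below shows how temporal IoU and recall over the top-Np proposals are commonly computed. It assumes proposals are (start, end, score) triples, ground-truth intervals are (start, end) pairs, and AR is the mean of the recall values over the listed tIoU thresholds; the article does not state the exact averaging, so treat that as an assumption. The names and the data are hypothetical.

```python
def tiou(a, b):
    """Temporal intersection over union of two (start, end) intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at(ground_truth, spans, threshold):
    """Fraction of ground-truth intervals matched by any span at the given tIoU."""
    if not ground_truth:
        return 0.0
    hits = sum(any(tiou(g, s) >= threshold for s in spans) for g in ground_truth)
    return hits / len(ground_truth)


def average_recall(ground_truth, proposals, n_p, thresholds=(0.1, 0.3, 0.5, 0.7)):
    """Mean recall over thresholds, using the n_p highest-scoring proposals."""
    top = sorted(proposals, key=lambda p: p[2], reverse=True)[:n_p]
    spans = [(start, end) for start, end, _ in top]
    return sum(recall_at(ground_truth, spans, t) for t in thresholds) / len(thresholds)


# Hypothetical data for one recording (illustrative values only).
ground_truth = [(0, 10), (20, 30)]
proposals = [(0, 8, 0.9), (21, 40, 0.8), (50, 60, 0.1)]
print(average_recall(ground_truth, proposals, n_p=20))
```

Here the first ground-truth interval is matched with tIoU 0.8 and the second with tIoU 0.45, so recall is 1.0 at thresholds 0.1 and 0.3 and 0.5 at thresholds 0.5 and 0.7.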
The results show that the approach outperforms the state-of-the-art method [47] in both recall and processing time. The authors conclude that incorporating domain knowledge can significantly improve object detection performance while reducing computational cost.
In conclusion, this article presents a novel approach to efficient object detection that uses domain knowledge to omit irrelevant intervals shorter than 2 seconds. With a three-stage pipeline and proposals augmented with "start" and "end" stages, the authors achieve better performance and lower processing time. The work matters for real-world applications where efficiency is crucial, such as autonomous driving, surveillance, and robotics.